We need an objective measure of quality to see the impact of various code changes. A decent measure would be the combination of precision, recall, and F-measure obtained by comparing the segmenter's output to gold standard data.
To do this, though, we first need the gold standard data. Someone has to create them by manually annotating morph boundaries in a piece of text.
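
A minimal sketch of how such a boundary-level comparison could be computed is shown here. It assumes, purely for illustration, that both the segmenter output and the gold annotation are lists of words with morph boundaries marked by a hyphen, and that both sides contain the same surface characters. The function names `boundaries` and `evaluate` are hypothetical and do not come from the existing code.

```python
def boundaries(segmented, sep="-"):
    """Return the set of character offsets where a morph boundary occurs.

    Assumed format (hypothetical): morphs joined by `sep`, e.g. "cat-s".
    """
    positions = set()
    offset = 0
    # Every morph except the last one is followed by a boundary.
    for morph in segmented.split(sep)[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def evaluate(predicted, gold, sep="-"):
    """Compute boundary precision, recall and F-measure over a word list."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    tp = fp = fn = 0
    for pred_word, gold_word in zip(predicted, gold):
        # Both segmentations must describe the same surface word.
        if pred_word.replace(sep, "") != gold_word.replace(sep, ""):
            raise ValueError(f"word mismatch: {pred_word!r} vs {gold_word!r}")
        pred_b = boundaries(pred_word, sep)
        gold_b = boundaries(gold_word, sep)
        tp += len(pred_b & gold_b)   # boundaries found in both
        fp += len(pred_b - gold_b)   # boundaries only the segmenter proposed
        fn += len(gold_b - pred_b)   # gold boundaries the segmenter missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Made-up example data, not real annotations.
gold = ["un-happi-ness", "cat-s"]
predicted = ["un-happiness", "ca-ts"]
precision, recall, f_measure = evaluate(predicted, gold)
print(precision, recall, f_measure)
```

Counting boundaries rather than whole morphs gives partial credit to a segmentation that gets some of the splits in a word right, which seems useful when comparing small code changes. A stricter morph-level or word-level score would be an equally valid choice.
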
If we obtain a larger amount of data, we could also use them to learn how to segment; for example, the probabilities of various phonemic changes could be estimated from the annotations.