We want to add corpus coverage as an evaluation for our language models. It is perhaps the best evaluation measure we have.
I suggest doing the following for FSTs:
- Count the number of words or tokens (probably tokens) in the open corpus (possibly both open and closed, but open is more transparent)
- Run them through the descriptive analyser and count how many receive an analysis
- In the table under Language Models, add the coverage as a percentage
- On the documentation page (next to the map), add the same percentage, and also the size of the corpus
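
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the actual pipeline: `corpus_coverage` and `has_analysis` are hypothetical names, and `has_analysis` stands in for whatever call wraps the descriptive analyser and reports whether a token received at least one analysis.

```python
from typing import Callable, Iterable, Tuple


def corpus_coverage(
    tokens: Iterable[str],
    has_analysis: Callable[[str], bool],
) -> Tuple[int, int, float]:
    """Return (analysed, total, percentage) for a stream of corpus tokens.

    `has_analysis` is an assumed wrapper around the descriptive analyser;
    it should return True when the token gets at least one analysis.
    """
    total = 0
    analysed = 0
    for token in tokens:
        total += 1
        if has_analysis(token):
            analysed += 1
    # Avoid division by zero on an empty corpus.
    percentage = 100.0 * analysed / total if total else 0.0
    return analysed, total, percentage


# Toy stand-in for the analyser: a set of "known" forms (illustrative only).
known_forms = {"a", "b", "c"}
analysed, total, pct = corpus_coverage(["a", "b", "c", "d"], lambda t: t in known_forms)
print(f"{pct:.1f}% of {total} tokens")  # prints "75.0% of 4 tokens"
```

Returning the raw counts together with the percentage means the same result can feed both the table (percentage only) and the documentation page (percentage plus corpus size).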
The table under Language Models is crowded. To save space, I suggest the following:
- Remove the version number column from the table and keep the version number only next to the map
- If needed: publish the corpus coverage in the table only as a percentage, not as both percentage and number of words in the corpus
- On the page for each language there is more space, so there we may have both the percentage and the number