New metrics for Language models: corpus coverage #44

We want to add corpus coverage as an evaluation metric for our language models; it is perhaps the best evaluation measure we have.

I suggest doing the following for the FSTs:

1. Count the number of words or tokens (probably tokens) in the open corpus (possibly both open and closed, but open is more transparent).
2. Run the tokens through the descriptive analyser.
3. In the table under Language Models, add the coverage as a percentage.
4. On the documentation page (next to the map), add the same percentage, and also the size of the corpus.
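
The steps above could be sketched roughly as follows. This is only an illustration of the arithmetic, not a proposal for the actual tooling: `corpus_coverage`, `Coverage` and `has_analysis` are hypothetical names, and `has_analysis` stands in for whatever call wraps the descriptive analyser and reports whether a token received at least one analysis. Tokenisation is assumed to have happened already.

```python
from typing import Callable, Iterable, NamedTuple


class Coverage(NamedTuple):
    total: int     # number of tokens in the corpus (step 1)
    analysed: int  # tokens that received at least one analysis (step 2)

    @property
    def percent(self) -> float:
        # Coverage as a percentage (steps 3 and 4); 0.0 for an empty corpus.
        return 100.0 * self.analysed / self.total if self.total else 0.0


def corpus_coverage(tokens: Iterable[str],
                    has_analysis: Callable[[str], bool]) -> Coverage:
    """Count all tokens and those the analyser recognises.

    `has_analysis` is a hypothetical wrapper around the descriptive
    analyser; it must return True when the token gets any analysis.
    """
    total = analysed = 0
    for token in tokens:
        total += 1
        if has_analysis(token):
            analysed += 1
    return Coverage(total, analysed)


# Toy stand-in for the analyser: a token counts as analysed if it is in a set.
toy_lexicon = {"a", "b", "c"}
result = corpus_coverage(["a", "b", "x", "a"], toy_lexicon.__contains__)
print(f"{result.percent:.1f} % of {result.total} tokens")  # 75.0 % of 4 tokens
```

Counting tokens rather than word types means frequent words weigh more, which matches the "probably tokens" choice in step 1; passing a set of unique words instead would give type coverage.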

The table under Language Models is crowded. To save space, I suggest the following:

1. Remove the version number column from the table and keep the version number only next to the map.
2. If needed, publish the corpus coverage in the table only as a percentage, not as both a percentage and the number of words in the corpus.
3. On the page for each language there is more space, so there we can show both the percentage and the number.
