Add language translations to card templates #725
Conversation
Automatically generate the card templates for each language translation. Uses a base template and a translation.csv file to generate the Anki card templates.
Run the generator in a separate commit so as not to clutter the previous one.
Wow! Thanks very much for looking into this! This looks great! (The mass of new, generated files is messy, but inevitable.) Unfortunately, we also need to have the […]. On a cursory look, other than the id issue I don't see anything crucial.
Yes, makes sense!
Keeping the experimental note models untranslated (at least for now?) is IMO fine.
Each deck template now gets its own uuid, CR from aplaice.
Thank you! I generated a bunch of UUIDs for the different languages, and tested it on my Anki and it works. The only thing to keep in mind is that if you're like me and already have a deck, you'll get a new notetype with some suffix, since the last UUID was the English (and only) one. But you can rename it if you go to Tools > Manage Note Types.
I've opened a PR to your fork adding the remaining translations. Unfortunately, it was non-trivial, especially for "flag similar to", because our current flag similarity field is in the nominative case (for languages that have cases), without any prepositions, so often the most natural translation didn't work. I've worked around this for some of the translations by slightly rephrasing and using a colon. I've also added a "full stop" field to handle the different full stop in Chinese. It would be useful, though, if these were checked by our translators. TBH, though, I feel awkward pinging a dozen people for a single line each… Unfortunately, we don't have a good process for incrementally having our translations verified.
I think that this only occurred because you already had a notetype with the same name.

Nitpicks

- Maybe we should rename the current/"base" Ultimate Geography note templates to include the language.
- The raw text replacement system feels brittle (it relies on the tokens not occurring anywhere else in the templates and not being subsets of each other). OTOH it's simple and works (given the full regeneration of the English decks). Probably, if we ever expand the templates, it might need replacing, but for now I think it's OK.
- Should we split up the generated templates in […]?

Thanks again!
Awesome! I'll reply about the stuff relating to the changes in that PR thread, especially about the "flag similar to" case, I think there's a nice way to solve it.
Do you guys have a discord or something?
I agree with this completely, but I didn't want to be the one to take a decision about that since I'm not a maintainer.
One fast and easy improvement would be to have the Python script actually check the tags to make sure tokens aren't subsets of each other. If you think that sounds like a good idea I can add it.
You're right, I can fix that tomorrow!
No, we handle everything here.
Yes, good idea! It should be easy to implement and might save someone some time in the future if/when we expand the template and aren't careful… One more thing that I forgot is that I think it'd be cleaner for the script to be in `utils/`.
For ease of comparison this doesn't yet change the templates themselves for better handling of colons.
This is primarily for the Chinese colon and full stop.
Also, change the error type.
This causes {{Flag similarity}} to be repeated, but is clearer.
The generated templates are unchanged.
And move the script to utils
Done! I merged all your changes and fixed the things in the thread - see what you think.
aplaice left a comment
Thanks very much for the update! I also greatly appreciate the split into self-contained commits!
`utils/generate_templates.py` (outdated)
```python
fieldnames, rows = load_translations(translations_path)
base_files = [path for path in base_dir.iterdir()]
output_dir.mkdir(parents=True, exist_ok=True)
english_base_case = [
```
Thanks for pro-actively changing this, but unfortunately we need to restore your fallback code for the English case!
Sorry, this is my fault for being unclear — my question about renaming the English note models had been intended as the start of a discussion rather than an immediate suggestion.
I've now tested import of the deck with a changed note model name (but same uuid) with:
1. CrowdAnki,
2. an APKG directly into Anki.
While 1, as expected, works seamlessly (i.e. without a migration pop-up), 2 unfortunately doesn't — you can get it to work (by choosing to merge the note models, as an option before running the import), but it's not quite straightforward.
Hence, given that renaming the note models is only a very mild benefit to people who have several language versions in parallel, and of little to no benefit for people who only have the English deck, but at the very least a minor annoyance for users of AUG on AnkiWeb, I think we should keep the old names for the English note models, for now. (We might end up with some "breaking change" at some point and we can do the rename then.)
(We could just rename the extended version, since it's not imported via APKG/AnkiWeb anyway, but IMO that'd be just confusing.)
Sorry, I got ahead of myself there; we should probably open a separate issue about it.
```diff
@@ -2,18 +2,28 @@
 from pathlib import Path


 def validate_subtokens(tokens):
```
Thanks for adding the validation, but I don't think it works in all cases — it'll only check whether one is a prefix of another, but not whether one is a subset of the other. (e.g. it wouldn't catch _location_translation and _capital_location_translation.)
Maybe something like:

```python
length_sorted_tokens = sorted(tokens, key=len)
for i, a in enumerate(length_sorted_tokens):
    for b in length_sorted_tokens[i + 1:]:
        if b.find(a) >= 0:
            raise ValueError(f"Translation key '{a}' is a subset of key '{b}'!")
```
You're right, I blame my tired brain
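For concreteness, a self-contained version of the substring check being discussed might look like this (`validate_subtokens` is the name from the diff above; this is a sketch, not the project's actual code):

```python
def validate_subtokens(tokens):
    """Raise if any token is a substring of another, which would make
    naive search-and-replace over the templates ambiguous."""
    length_sorted = sorted(tokens, key=len)
    for i, a in enumerate(length_sorted):
        # Only tokens at least as long as `a` can contain it.
        for b in length_sorted[i + 1:]:
            if a in b:
                raise ValueError(
                    f"Translation key '{a}' is a subset of key '{b}'!"
                )
```

A prefix-only check would miss the pair `_location_translation` / `_capital_location_translation`; the substring check catches it.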
And move them to their respective folders
aplaice left a comment
Thanks for the changes! I'll leave this open for a while in case anybody has any other thoughts.
Regarding checking the translations, we'll probably have them verified later in a batch. (In the worst case scenario users will complain :))
This looks really cool, thanks!! I've only had a quick glance at the discussions you've had until this point, so apologies if I suggest something that you've already discussed:
- I would strongly recommend adding the `generated` folder to `.gitignore`.
- Building the decks now requires running the new `generate_templates` script, so `pipenv run build` (and `build_experimental`) should be updated to run this new script first.
- The `generate_templates` script should also be mentioned somehow in the Getting started section of the `CONTRIBUTING` guide.
- It might be worth documenting this new templates translation set-up a little bit somewhere in the `CONTRIBUTING` guide as well.
- Very minor: in `translation.csv`, I would maybe put the `_deck_id` and `_deck_extended_id` columns in second and third position respectively (right after the language code), since they have fixed-length values. I would also rename `language-tag` to `language_tag` for consistency.
- Very minor also: for the tokens in the base templates, like `_capital_translation`, would it be possible to make them a bit more distinctive, like `%%CAPITAL_TRANSLATION%%`? The column headers in `translation.csv` can still be lower-case and without the special prefix/suffix (e.g. `capital_translation`) — the conversion from `capital_translation` to `%%CAPITAL_TRANSLATION%%` could be done dynamically by your script.
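The dynamic conversion suggested in the last point could look roughly like this (`%%…%%` token style as proposed in the comment; the helper names are hypothetical, not the project's actual code):

```python
def column_to_token(column: str) -> str:
    """Map a translation.csv header like 'capital_translation' to the
    distinctive template token '%%CAPITAL_TRANSLATION%%'."""
    return f"%%{column.upper()}%%"

def translate(template: str, row: dict) -> str:
    """Substitute every translation in `row` into a base template."""
    for column, translation in row.items():
        template = template.replace(column_to_token(column), translation)
    return template
```

This keeps the CSV headers readable while making the in-template tokens unlikely to collide with ordinary template text.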
Since pipenv doesn't have a good cross-platform way of running multiple commands in sequence (such as simply &&), we'll have generate_and_build.py be our main entrypoint.
And delete them from git.
Thanks for the feedback! The main thing I want to comment on is that I tried the naive way of just changing build to run generate_templates before brain_brew, and then I found out that pipenv is really bad at chaining commands in a cross-platform way. Even though generate_templates would return 0, brain_brew wouldn't run - and we still want to run it only when we successfully generate the templates. An option would be to wrap it in a call to bash, but that wouldn't work on Windows, for example. What I thought was a better idea was to rename the script to generate_and_build.py and have it generate the templates and make calls to brain_brew. It also has the benefit of being a lot more extendable for anything we want to add in the future.

I thought I'd wait with changing the documentation until you guys approved of this idea, since that also has to be added to the docs.

Regarding the tokens, I hope you're happy with the `__DUNDER_AND_CAPS__` style - I think it looks a lot better now. I also reorganized the columns, but kept the rows in the order the languages were added in, since that's how they show up everywhere else in the codebase.
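The chaining behaviour described above (brain_brew only runs if template generation exits 0) could be sketched like this; a hypothetical outline of generate_and_build.py, with the actual generate_templates and brain_brew commands left out:

```python
import subprocess

def run_chain(steps):
    """Run commands in sequence, stopping at the first failure.

    This emulates the shell's `&&` in a cross-platform way, which is
    what pipenv scripts can't express directly.
    """
    for step in steps:
        result = subprocess.run(step)
        if result.returncode != 0:
            # e.g. don't run brain_brew if template generation failed
            return result.returncode
    return 0
```

In the real script, `run_chain` would be called with the template-generation command followed by the brain_brew build command.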
Awesome, thanks a lot for the updates! 😊 No worries about the […]. Happy with […]!
Sorry for being unclear, I already changed the build script; it's in this commit 0c1347c. If you don't like it I can undo it, but I think it works pretty well now. Another thing I should mention is that if we don't have a build script that does both steps, then we'd need to track the generated files in git and couldn't .gitignore them - otherwise the GitHub Actions runner would fail the build.
Oh great, then!
Write them in caps and with __dunder__. Also reorganize the columns.
And update the section about adding a new translation, as well as how the builds are generated.
I fixed the last step now, which was to add all the new info about generating templates, building, and adding a new language to CONTRIBUTING. When reading through it I realized the proper name for the language tags should be language codes, so I amended that commit. Other than that, I think that's all for this PR, unless there's more feedback from the maintainers or I missed something.
Thanks very much for the changes and for the updated docs! The dunder caps are indeed much clearer! I don't have any other comments!
Fix #724 and discussion #709 (reply in thread).
This commit adds language translations to card templates to match the language you're using.
It works by adding the translations to src/note_models/translations.csv and then generating the template files from the base template with src/note_models/generate_templates.py. Right now I only have Swedish and English, but we should probably add more translations before we merge this.
I tested the builds and everything works, although there aren't any visible changes for languages not in the csv file. I also didn't touch the experimental builds; they are not affected.
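Putting the pieces together, the generation step described in this PR could be sketched roughly as follows (a minimal sketch: the function signature, the `language_code` column name, and the `__CAPS__` token derivation are assumptions, not the actual script):

```python
import csv
from pathlib import Path

def generate_templates(base_dir: Path, translations_path: Path, output_dir: Path):
    """For each language row in translations.csv, copy every base template
    and substitute its translation tokens (e.g. __CAPITAL_TRANSLATION__)."""
    with open(translations_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    output_dir.mkdir(parents=True, exist_ok=True)
    for base_file in base_dir.iterdir():
        base = base_file.read_text(encoding="utf-8")
        for row in rows:
            text = base
            for column, translation in row.items():
                # Column 'capital_translation' -> token '__CAPITAL_TRANSLATION__'
                text = text.replace(f"__{column.upper()}__", translation)
            out = output_dir / row["language_code"] / base_file.name
            out.parent.mkdir(exist_ok=True)
            out.write_text(text, encoding="utf-8")
```

Each language then gets its own copy of every base template, with only the token substitutions differing.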