Incorrectly encoding text when backFormat is text #68

When performing backtranslation with file2brl, if the configuration has backFormat set to text and the backtranslated text contains Unicode characters outside the ASCII range, those characters are encoded incorrectly.
As an example, using en-ueb-g2.ctb as the translation table, try backtranslating a word containing an apostrophe (e.g. I'M, CAN'T). The apostrophe is output as the single byte 0x19.
Having tested file2brl with backFormat set to html, it appears that in this example the apostrophe is backtranslated to the Unicode character U+2019 (right single quotation mark). I therefore suspect that file2brl is simply dropping the high byte of each Unicode character when backFormat is set to text.
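
To illustrate the suspicion above, here is a minimal Python sketch. It is not file2brl's code; the function names are hypothetical and it only models the suspected behaviour (keeping the low 8 bits of each code point) next to what a UTF-8 text output would be expected to contain.

```python
def truncate_to_low_byte(text: str) -> bytes:
    """Model of the suspected bug: keep only the low 8 bits of each code point."""
    return bytes(ord(ch) & 0xFF for ch in text)


def encode_utf8(text: str) -> bytes:
    """Expected behaviour, assuming the text output should be UTF-8."""
    return text.encode("utf-8")


# "I'M" as backtranslated, with U+2019 as the apostrophe.
word = "I\u2019M"

print(truncate_to_low_byte(word).hex(" "))  # 49 19 4d
print(encode_utf8(word).hex(" "))           # 49 e2 80 99 4d
```

The first line reproduces the 0x19 byte seen in the text output, which is consistent with U+2019 losing its high byte (0x20) rather than being encoded as a multi-byte sequence.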
