Encoding issue with mimic3-server (Latin-1 vs UTF-8)  #52

@sulivanShu

Hi!

I have an encoding issue with mimic3-server:

$ mimic3 --remote --voice 'en_UK/apope_low' "I don’t speak English" | aplay --quiet
Reading text from stdin...
Traceback (most recent call last):
  File "mimic3.py", line 40, in <module>
  File "mimic3_tts/__main__.py", line 129, in main
  File "mimic3_tts/__main__.py", line 450, in process_lines
  File "mimic3_tts/__main__.py", line 397, in process_line
  File "mimic3_tts/__main__.py", line 587, in get_remote_wav_bytes
  File "requests/api.py", line 115, in post
  File "requests/api.py", line 59, in request
  File "requests/sessions.py", line 587, in request
  File "requests/sessions.py", line 701, in send
  File "requests/adapters.py", line 489, in send
  File "urllib3/connectionpool.py", line 703, in urlopen
  File "urllib3/connectionpool.py", line 398, in _make_request
  File "urllib3/connection.py", line 239, in request
  File "http/client.py", line 1255, in request
  File "http/client.py", line 1300, in _send_request
  File "http/client.py", line 164, in _encode
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 5: Body ('’') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
[582387] Failed to execute script 'mimic3' due to unhandled exception!
aplay: read_header:2931: erreur de lecture

However, the same command works when mimic3 is run without --remote:

$ mimic3 --voice 'en_UK/apope_low' "I don’t speak English" | aplay --quiet
Reading text from stdin...
INFO:mimic3_tts.tts:Loaded voice from /usr/share/mycroft/mimic3/voices/en_UK/apope_low

The error message says “Use body.encode('utf-8') if you want to send it encoded in UTF-8”, but I don’t know how to do this. I simply run the server with the command:

$ mimic3-server --num-threads 6

I couldn’t find an option to tell the server that the input is UTF-8 encoded. Here are the versions of mimic3 and mimic3-server:

$ mimic3 --version
0.2.3
$ mimic3-server --version
0.1.1

Here are my locales and system:

$ env | grep LANG
LANG=fr_FR.utf8
GDM_LANG=fr_FR.utf8
$ lsb_release -a
LSB Version:    n/a
Distributor ID: Manjaro-ARM
Description:    Manjaro ARM Linux
Release:        23.02
Codename:       n/a

Here is a workaround:

echo "I don’t speak English" | iconv -f UTF-8 -t ISO-8859-1//TRANSLIT | mimic3 --remote --voice 'en_UK/apope_low' | aplay --quiet

This converts UTF-8 text to ISO-8859-1 (i.e. Latin-1) and transliterates characters that have no Latin-1 equivalent, such as "’", to the closest available character.
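For anyone who prefers to do this preprocessing in Python, here is a rough sketch of the same idea. It does not reproduce iconv's transliteration table: the `PUNCTUATION` mapping and the `to_latin1` helper are hypothetical names covering only a few common typographic characters, and anything else outside Latin-1 is replaced with "?".

```python
# Hypothetical mapping: a few typographic characters and their ASCII stand-ins.
# This is NOT iconv's //TRANSLIT table, only an illustration of the idea.
PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (the one in "don’t")
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_latin1(text: str) -> bytes:
    """Replace known typographic characters, then encode as Latin-1.

    Characters that still have no Latin-1 equivalent become b"?".
    """
    return text.translate(PUNCTUATION).encode("latin-1", errors="replace")


if __name__ == "__main__":
    print(to_latin1("I don\u2019t speak English"))
```

Characters that Latin-1 does support, such as "é", pass through unchanged, so only the typographic punctuation is affected.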

I think this is a bug, because mimic3-server should accept UTF-8 encoding, as mimic3 does without problem.
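For what it is worth, the traceback ends in http/client.py while the request is still being built on the client side, which suggests the text is handed to the HTTP layer as a str and implicitly encoded as Latin-1. Below is a minimal sketch of what the error message recommends, using only the standard library and not mimic3's actual code. The URL, the Content-Type header, and the `build_tts_request` helper are assumptions for illustration.

```python
import urllib.request

TEXT = "I don\u2019t speak English"  # contains U+2019, which Latin-1 lacks


def latin1_fails(text: str) -> bool:
    """Return True if the text cannot be encoded as Latin-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return True
    return False


def build_tts_request(url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request whose body is UTF-8 bytes.

    Encoding explicitly means the HTTP layer never has to guess a codec.
    """
    body = text.encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/plain; charset=utf-8"},  # assumed header
        method="POST",
    )


if __name__ == "__main__":
    print(latin1_fails(TEXT))
    # Assumed address of a locally running mimic3-server.
    request = build_tts_request("http://localhost:59125/api/tts", TEXT)
    print(request.data)
```

If this reading is right, the fix would belong in the client's request code rather than in a server option, since the exception is raised before anything is sent.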

Labels: bug (Something isn't working)
