Description
HoneyBadger alert:
Excon::Error::Socket: Mysql2::Error: Incorrect string value: '\xF0\x9F\x94\x8D G...' for column 'text' at row 1 (ActiveRecord::StatementInvalid)
Backtrace:
line 63 of [PROJECT_ROOT]/app/lib/morph/runner.rb: log
line 45 of [PROJECT_ROOT]/app/lib/morph/runner.rb: block in synch_and_go_with_logging!
line 194 of [PROJECT_ROOT]/app/lib/morph/docker_runner.rb: block in attach_to_run
The error is in the code that logs output from a scraper. The output contains a 4-byte UTF-8 character, confirmed with a hex dump:
morph (main)$ hd ,utf
00000000 f0 9f 94 8d 20 47 2e 2e 2e 0a |.... G....|
0000000a
morph (main)$ cat ,utf
🔍 G...
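The same check can be made from Ruby. This is a minimal sketch that uses only the four leading bytes from the hex dump; the variable names `bytes` and `char` are illustrative and do not come from the morph codebase.

```ruby
# Bytes copied from the hex dump (the first four bytes of the logged line).
bytes = [0xF0, 0x9F, 0x94, 0x8D]

# Pack into a binary string, then tag it as UTF-8 so Ruby decodes it.
char = bytes.pack("C*").force_encoding(Encoding::UTF_8)

puts char.valid_encoding?        # whether the bytes form valid UTF-8
puts char.length                 # number of characters
puts char.bytesize               # number of bytes used by that character
puts format("U+%04X", char.ord)  # the Unicode code point
```

Any code point above U+FFFF takes four bytes in UTF-8, which is exactly what a utf8mb3 column cannot store.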
Describe the solution you'd like
Convert the database to full Unicode (utf8mb4):
- Update the default character set and collation in the app's database config
- Create a new database with the new config
- Update the app to use the new database name
- Deploy (site goes down behind the maintenance page)
- Migrate the data from the old database to the new one
- Remove the maintenance page file
- Check the site
- Remove the old database
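As a rough illustration of the "create a new database" step, the sketch below builds the SQL statement in Ruby. The helper `create_database_sql` and the database name `morph_utf8mb4` are hypothetical. The `utf8mb4_unicode_ci` default is an assumption, chosen to mirror the existing `utf8mb3_unicode_ci` collation; a different utf8mb4 collation may suit the app better.

```ruby
# Hypothetical helper: builds the SQL to create the replacement database.
# The default collation mirrors the current utf8mb3_unicode_ci (an assumption).
def create_database_sql(name, charset: "utf8mb4", collation: "utf8mb4_unicode_ci")
  # Only allow simple identifiers, because the name is interpolated into SQL.
  unless name.match?(/\A[A-Za-z0-9_]+\z/)
    raise ArgumentError, "unsafe database name: #{name.inspect}"
  end

  "CREATE DATABASE `#{name}` CHARACTER SET #{charset} COLLATE #{collation};"
end

# "morph_utf8mb4" is a placeholder name, not the real database name.
puts create_database_sql("morph_utf8mb4")
```

Creating the database with an explicit character set and collation means tables loaded into it default to utf8mb4, unless their own definitions still name utf8mb3.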
Describe alternatives you've considered
Remove the 4-byte emoji from scraper_utils, assuming it's the only place the issue occurs.
Update the scrapers that use it.
This is probably the right time to move the repo across for scraper_utils as well.
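To test the assumption that the emoji is the only offender, log strings could be scanned for characters that need four bytes. This is a minimal sketch. The helper `four_byte_chars` is hypothetical and not part of scraper_utils, and it assumes the input is valid UTF-8.

```ruby
# Hypothetical helper: returns the distinct characters in +text+ that need
# four bytes in UTF-8 (code points above U+FFFF), which utf8mb3 rejects.
# Assumes +text+ is valid UTF-8; String#ord raises on invalid byte sequences.
def four_byte_chars(text)
  text.each_char.select { |c| c.ord > 0xFFFF }.uniq
end

# example.com is a placeholder URL.
sample = "\u{1F50D} GET https://example.com"
p four_byte_chars(sample)
p four_byte_chars("plain ASCII and \u00E9")
```

Characters of up to three bytes, such as accented Latin letters, pass through unflagged, so only strings that would actually fail on insert are reported.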
Additional context
This is typical for databases created on older MySQL versions. Before MySQL 5.5.3, the only UTF-8 character set was utf8 (now called utf8mb3), which stores at most 3 bytes per character. utf8mb4 was added in MySQL 5.5.3 and only became the default character set in MySQL 8.0.
The character possibly comes from a debug message in
scraper_utils/lib/scraper_utils/debug_utils.rb
64: LogUtils.log "🔍 #{http_method.upcase} #{url}"
I confirmed that the MySQL database doesn't currently support 4-byte UTF-8 (the table uses utf8mb3):
create_table "log_lines", id: :integer, options: "ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci", force: :cascade do |t|
Related doc: https://dev.mysql.com/doc/refman/8.4/en/charset-unicode-conversion.html