Scraping does not seem to use Unicode/UTF-8, Japanese gets garbled #2318

@vermeeren

Describe the bug

Bookmark scraping does not appear to handle Unicode (UTF-8) correctly, so non-ASCII text such as Japanese ends up garbled.

PHP's default_charset is set to UTF-8, as in the Debian default configuration. Japanese works fine with file syncing and in other parts of Nextcloud.

To Reproduce

Add bookmark https://www.youtube.com/watch?v=OoM0ikOi1v4 with web scraping turned on.

�Blender�������Blender����座��簡������������������������

The values appear to be garbled already when they are inserted into the database.

# from psql command line
nextcloud=# select title from oc_bookmarks;

ã\u0080\u0090Blenderã\u0080\u0091å\u0088\u009Då¿\u0083è\u0080\u0085å\u0090\u0091ã\u0081\u0091ï¼\u0081Blenderè¶\u0085å\u0085¥é\u0096\u0080è¬\u009B座ã\u0080\u0080ï½\u009Eç°¡å\u008D\u0098ã\u0081ªã\u0082»ã\u0083«ã\u0083«ã\u0083\u0083ã\u0082¯ã\u0081®ã\u0081\u0086ã\u0081\u0095ã\u0081\u008Eã\u0081®ã\u0082­ã\u0083£ã\u0083©ã\u0082¯ã\u0082¿ã\u0083¼ã\u0082\u0092ä½\u009Cã\u0082\u008Dã\u0081\u0086ï¼\u0081ï½\u009E

The PostgreSQL database uses UTF8 encoding, with en_US.UTF-8 for collate and ctype.
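The stored title looks like UTF-8 bytes that were decoded one byte at a time as ISO-8859-1 (Latin-1) before being saved. This is only a guess based on the escaped output above, not a confirmed diagnosis of the scraper. The following minimal Python sketch is not code from the Bookmarks app, and its variable names are made up for illustration. It takes the first few characters of the stored value and shows that reversing that mis-decoding recovers the expected text:

```python
# Illustration only: the names here are hypothetical, not from the app.

# First part of the title as stored in oc_bookmarks (copied from the psql output).
stored = (
    "\u00e3\u0080\u0090Blender\u00e3\u0080\u0091"
    "\u00e5\u0088\u009d\u00e5\u00bf\u0083\u00e8\u0080\u0085"
)

# Assumption: each stored character is really one byte of the original UTF-8 text.
# Turning the characters back into bytes (Latin-1) and decoding as UTF-8 undoes that.
recovered = stored.encode("latin-1").decode("utf-8")
print(recovered)

# The forward direction: UTF-8 bytes wrongly decoded as Latin-1 give the stored form.
expected = "【Blender】初心者"
garbled = expected.encode("utf-8").decode("latin-1")

assert recovered == expected
assert garbled == stored
```

If this assumption holds, the bytes fetched from the page were intact and the wrong charset was applied somewhere between fetching and inserting.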

Expected behavior

【Blender】初心者向け!Blender超入門講座 ~簡単なセルルックのうさぎのキャラクターを作ろう!~ 

Screenshots

Rendering from the Bookmarks UI in Firefox (screenshot not included here).

Desktop (please complete the following information):

  • OS: Debian Linux
  • Browser: Firefox
  • Version: ESR 128

Server (please complete the following information):

Additional context

Web server error log

Nothing shows up in logs.

Nextcloud log (nextcloud/data/nextcloud.log)

Nothing shows up in logs.

Browser log

Not sure about this.
