
Fix emoji_keywords not being generated in SQLite export #684

Open
LJSigersmith wants to merge 1 commit into scribe-org:main from LJSigersmith:fix/emoji-keywords-sqlite-generation

@LJSigersmith

Excludes emoji_keywords from the standard data type branch, which expected dict values but emoji_keywords JSON contains lists. Also caps the emoji loop with min() to avoid IndexError when a word has fewer than 3 emojis.

Fixes #683

Contributor checklist


Description

This fixes a bug where emoji_keywords data was not being written to SQLite databases during the convert process.

Root cause: emoji_keywords was falling into the standard data type processing branch in data_to_sqlite.py, which calls .keys() on each JSON entry assuming it's a dict, like nouns/verbs:

    {
        "L1000688": {
            "lastModified": "2025-11-20T04:16:52Z",
            "plural": "anthoxanthins",
            "singular": "anthoxanthin",
            "noun": "anthoxanthin"
        },
...

However, emoji_keywords JSON values are lists of dicts:

    "face": [
        {
            "emoji": "😂",
            "is_base": false,
            "rank": 1
        },
        {
            "emoji": "🤣",
            "is_base": false,
            "rank": 3
        },
...

causing an AttributeError before any table was created.
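
The mismatch can be reproduced in isolation. The following is a minimal sketch, not the code from data_to_sqlite.py: `column_names` is a hypothetical stand-in for a dict-oriented branch, and the sample entries are trimmed copies of the two JSON shapes shown above.

```python
# Trimmed sample entries mirroring the two JSON shapes above (illustrative only).
noun_entry = {"singular": "anthoxanthin", "plural": "anthoxanthins"}
emoji_entry = [
    {"emoji": "😂", "is_base": False, "rank": 1},
    {"emoji": "🤣", "is_base": False, "rank": 3},
]


def column_names(entry):
    """Hypothetical helper: read keys the way a dict-oriented branch would."""
    return list(entry.keys())


# Works for dict-valued data types such as nouns.
noun_cols = column_names(noun_entry)

# Fails for emoji_keywords, whose values are lists and have no .keys().
try:
    column_names(emoji_entry)
    error_name = None
except AttributeError as err:
    error_name = type(err).__name__
```

Because the exception is raised while the entry is still being inspected, nothing after that point in the branch can run for this data type.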

Changes in src/scribe_data/load/data_to_sqlite.py:

  • Excluded emoji_keywords from the standard branch so it correctly falls through to its own elif handler
  • Fixed the emoji loop in that elif branch to use min(len(json_data[row]), len(cols) - 1). Previously, range(len(json_data[row])) would raise an IndexError for any word with fewer than 3 emojis
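
The capped loop described in the second point can be sketched as follows. This is an illustration under assumptions rather than the actual diff: the column names in `cols`, the helper `build_emoji_row`, and the padding of unused slots with empty strings are all hypothetical.

```python
# Assumed table layout: one word column followed by three emoji slots.
cols = ["word", "emoji_1", "emoji_2", "emoji_3"]


def build_emoji_row(word, emojis, cols):
    """Hypothetical helper: build one table row from a word's emoji list."""
    # Never read more emojis than the word has or than the table has slots for.
    n = min(len(emojis), len(cols) - 1)
    row = [word] + [emojis[i]["emoji"] for i in range(n)]
    # Assumption: unused emoji slots are padded with empty strings.
    row += [""] * (len(cols) - len(row))
    return row


short_row = build_emoji_row("face", [{"emoji": "😂"}], cols)
long_row = build_emoji_row(
    "face",
    [{"emoji": "😂"}, {"emoji": "🤣"}, {"emoji": "😊"}, {"emoji": "😀"}],
    cols,
)
```

With the cap in place, every row has exactly len(cols) values regardless of how many emojis a word carries.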

Testing: Ran the full test suite (pytest) and confirmed all 378 tests pass. Manually verified that emoji_keywords tables are now generated in the SQLite output.
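
A manual check like the one described could also be scripted. The sketch below uses Python's built-in sqlite3 module against an in-memory database as a stand-in, since the path of the exported file is not given here; `table_exists` and the table layout are hypothetical.

```python
import sqlite3


def table_exists(conn, table_name):
    """Hypothetical helper: check SQLite's schema table for a table name."""
    query = "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?"
    return conn.execute(query, (table_name,)).fetchone() is not None


# In-memory stand-in for an exported language database (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emoji_keywords (word TEXT, emoji_1 TEXT, emoji_2 TEXT, emoji_3 TEXT)"
)
conn.execute(
    "INSERT INTO emoji_keywords VALUES (?, ?, ?, ?)", ("face", "😂", "🤣", "")
)

has_table = table_exists(conn, "emoji_keywords")
row_count = conn.execute("SELECT COUNT(*) FROM emoji_keywords").fetchone()[0]
```

Pointing `sqlite3.connect` at a real export instead of `:memory:` would run the same check against generated output.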

[Screenshot, 2026-04-10 at 12:39:40 PM]

Related issue

Closes #683

@github-actions
Contributor

Thank you for the pull request! 💙🩵

The Scribe-Data team will do our best to address your contribution as soon as we can. The following are some important points:

  • Those interested in developing their skills and expanding their role in the community should read the mentorship and growth section of the contribution guide
  • If you're not already a member of our public Matrix community, please consider joining!
    • We'd suggest that you use the Element client as well as Element X for a mobile app
    • Join the General and Data rooms once you're in
  • Also consider attending our bi-weekly Saturday developer syncs!
    • Details are shared in the General room on Matrix each Wednesday before the sync
    • It would be great to meet you 😊

Note

Scribe uses Conventional Comments in reviews to make sure that communication is as clear as possible.

@github-actions
Contributor

Maintainer Checklist

The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

  • Tests for changes have been written and the pytest, linting and formatting workflows within the PR checks do not indicate new errors in the files changed

  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

Contributor

@github-actions github-actions bot left a comment

First PR Commit Check

  • The commit messages for the remote branch of a new contributor should be checked to make sure their email is set up correctly so that they receive credit for their contribution
    • The contributor's name and icon in remote commits should be the same as what appears in the PR
    • If there's a mismatch, the contributor needs to make sure that the email they use for GitHub matches what they have for git config user.email in their local Scribe-Data repo (can be set with git config --global user.email "GITHUB_EMAIL")

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Emoji keyword data conversion from JSON to sqlite fails

1 participant