Skip to content

refactor(store): extract _item_to_finding, fix TikTok data loss#74

Open
phjlljp wants to merge 1 commit intomvanhorn:mainfrom
phjlljp:refactor/schema-store-cleanup
Open

refactor(store): extract _item_to_finding, fix TikTok data loss#74
phjlljp wants to merge 1 commit intomvanhorn:mainfrom
phjlljp:refactor/schema-store-cleanup

Conversation

@phjlljp
Copy link
Copy Markdown
Contributor

@phjlljp phjlljp commented Mar 20, 2026

Summary

Bug fix: TikTok and TruthSocial findings silently lost in --store mode

When a user runs --store to persist research findings to SQLite, the code at last30days.py:1941-2021 iterates over each source's deduped items and builds a findings list for store_mod.store_findings(). There are 8 separate for-loops, one per source — but TikTok and TruthSocial were never included. Their deduped items (deduped_tiktok, deduped_ts) simply aren't iterated over, so they're silently dropped.

This means any user relying on --store for watchlist/briefing workflows (the variants/ use case) has been accumulating an incomplete dataset — TikTok and TruthSocial results appear in the CLI output but never make it to the database.

Root cause: The 8 for-loops were likely copy-pasted as new sources were added, and TikTok/TruthSocial were missed when they were introduced. The repetitive structure made it easy to overlook.

Refactor: Extract _item_to_finding() dispatch function

To prevent this class of bug from recurring, this PR replaces the 8 for-loops with:

  1. A single _item_to_finding(item) function that dispatches on isinstance() to map any schema item type to the flat dict format expected by the store module
  2. A single loop: findings = [_item_to_finding(item) for item in all_deduped] where all_deduped explicitly concatenates all 10 source lists

Adding a new source now requires adding one elif branch to _item_to_finding() and one list() entry to all_deduped — rather than copy-pasting an entire for-loop block. The fallback else branch also handles unknown item types gracefully.

Cleanup: Engagement.to_dict() simplified

Replaces 11 manual if self.field is not None checks with {k: v for k, v in asdict(self).items() if v is not None}. This means new engagement fields (like shares, volume, liquidity which were added after the original code) are automatically included in serialization without requiring a corresponding manual check. The asdict() call is safe here because Engagement is a flat dataclass with only scalar fields.

Test plan

  • Existing test suite passes (461/465, 4 pre-existing model-name failures unrelated to this PR)
  • All 10 source types (including TikTok and TruthSocial) produce correct findings via _item_to_finding()
  • Engagement.to_dict() output is identical before/after for all field combinations (all-None, single field, mixed sources, zero values, all fields populated)
  • The else fallback in _item_to_finding() handles unexpected item types without crashing

🤖 Generated with Claude Code

Replace 8 nearly-identical for-loops in the --store persistence path
with a single _item_to_finding() dispatch function. This also:

- Fixes a bug where TikTok findings were silently dropped from SQLite
  storage (the original loops never included deduped_tiktok)
- Adds TruthSocial to the persist path (also previously missing)
- Simplifies Engagement.to_dict() using dataclasses.asdict() so new
  fields are automatically included without manual None-checks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
chidev added a commit to chidev/last30days-skill that referenced this pull request Mar 25, 2026
@mvanhorn
Copy link
Copy Markdown
Owner

See my reply on #76 covering all three of your PRs. Can't commit to anything right now with the v3.0 refactor underway, but I'll consider each one once it lands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants