
perf(catalog): trim highlight.js bundle from ~190 to 35 grammars #4883

Draft: fiskus wants to merge 3 commits into master from worktree-hljs-bundle


fiskus (Member) commented May 8, 2026

Summary

  • Replace the convenience highlight.js default import with a shared app/utils/hljs.ts that imports highlight.js/lib/core and registers a bounded set of 35 languages explicitly. Runtime API (hljs.highlight / hljs.getLanguage) is unchanged at every call site, so 7 files only need the import path swapped.
  • The registered set equals the keys of Text.js's LANGS map — the only enumerated runtime consumer of hljs in the catalog. python / bash used by Code.tsx callers are already in that set.
  • Narrow Code.tsx's hl: string prop to hl: RegisteredLanguage so passing an unregistered language is a compile error.
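A minimal sketch of the narrowed prop, assuming REGISTERED_LANGUAGES is an `as const` tuple from which RegisteredLanguage is derived (the file overview says the array is the type's source). The five-entry list, `CodeProps`, and `isRegisteredLanguage` are hypothetical; the real module lists 35 languages.

```typescript
// Hypothetical five-entry subset; the real list has 35 entries.
const REGISTERED_LANGUAGES = ['bash', 'json', 'python', 'typescript', 'yaml'] as const

// Union of the tuple's element types: 'bash' | 'json' | 'python' | 'typescript' | 'yaml'
type RegisteredLanguage = (typeof REGISTERED_LANGUAGES)[number]

// Hypothetical props shape for Code.tsx: hl no longer accepts arbitrary strings.
interface CodeProps {
  hl: RegisteredLanguage
  children: string
}

// Runtime guard for language names that arrive as plain strings.
function isRegisteredLanguage(lang: string): lang is RegisteredLanguage {
  return (REGISTERED_LANGUAGES as readonly string[]).indexOf(lang) !== -1
}

const ok: CodeProps = { hl: 'python', children: 'print("hi")' }
// const bad: CodeProps = { hl: 'kotlin', children: '' } // compile error: not assignable to RegisteredLanguage
```

The runtime guard is only needed where a language name arrives as an unchecked string; literal call sites such as `hl="python"` are checked at compile time.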

Behavioral change

Drop the hljs.highlightAuto fallback in Markdown.tsx. With only 35 registered grammars instead of ~190, auto-detection accuracy degrades sharply, so unlabeled or unsupported fences now render as plain monospace via Remarkable's default escaping, the same path as the existing lang === 'none' short-circuit. A code comment at the deletion site explains the trade-off, and there's a CHANGELOG entry.
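A sketch of what a Remarkable highlight callback looks like without the auto-detection fallback. Remarkable treats an empty string returned from `highlight` as "apply default escaping". `HljsLike`, `makeHighlight`, and the fake registry are hypothetical, the try/catch is a defensive addition, and the `highlight(code, { language })` call assumes the highlight.js v11 signature; the actual Markdown.tsx code may differ.

```typescript
// Minimal structural type for the two hljs calls used here (v11-style highlight signature assumed).
interface HljsLike {
  getLanguage(name: string): unknown
  highlight(code: string, options: { language: string }): { value: string }
}

// Builds a Remarkable-style highlight callback: (str, lang) => html.
function makeHighlight(hljs: HljsLike) {
  return (str: string, lang: string): string => {
    if (lang && hljs.getLanguage(lang)) {
      try {
        return hljs.highlight(str, { language: lang }).value
      } catch {
        // Defensive: fall through to plain rendering if the grammar throws.
      }
    }
    // No highlightAuto fallback: an empty string asks Remarkable to escape the fence
    // itself, so unlabeled, 'none', and unregistered fences render as plain monospace.
    return ''
  }
}

// Fake hljs with only 'json' registered, for illustration.
const fakeHljs: HljsLike = {
  getLanguage: (name) => (name === 'json' ? {} : undefined),
  highlight: (code, { language }) => ({ value: `<span class="${language}">${code}</span>` }),
}

const highlight = makeHighlight(fakeHljs)
```

Because 'none' is never registered, it falls into the same empty-string branch as any unknown language, which is why a separate short-circuit is not needed.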

What's NOT in this PR

The 35 grammars still ship in one chunk: they're statically imported, not lazily loaded. The chunk topology is the same as before; the chunk is just smaller (~250 KB, versus the full ~190-grammar set previously). A TODO at the top of app/utils/hljs.ts documents the path to per-grammar lazy loading and the blocker: Remarkable's highlight callback and the preview loaders all call hljs.highlight synchronously.
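One possible shape for per-grammar lazy loading, sketched under assumptions rather than taken from the TODO: each grammar sits behind an async loader, and registration is memoized so concurrent requests share one load. `createLazyRegistry`, `ensureLanguage`, and the loader map are hypothetical. This only pays off once callers can await `ensureLanguage(lang)` before the synchronous `hljs.highlight` call, which is exactly the blocker described above.

```typescript
type Loader<T> = () => Promise<T>

// Returns ensureLanguage(name): resolves true once `name` is registered, false if no
// loader exists. Each loader runs at most once, even under concurrent calls.
function createLazyRegistry<T>(
  loaders: Record<string, Loader<T>>,
  register: (name: string, grammar: T) => void,
): (name: string) => Promise<boolean> {
  const pending = new Map<string, Promise<boolean>>()
  return (name) => {
    const existing = pending.get(name)
    if (existing) return existing
    // Own-property check so names like 'toString' don't hit Object.prototype.
    if (!Object.prototype.hasOwnProperty.call(loaders, name)) return Promise.resolve(false)
    const p = loaders[name]().then(
      (grammar) => {
        register(name, grammar)
        return true
      },
      (err) => {
        pending.delete(name) // allow a retry after a failed load
        throw err
      },
    )
    pending.set(name, p)
    return p
  }
}

// In a real build a loader might wrap a dynamic import of one grammar module;
// here a stub string stands in for the grammar.
const registered = new Map<string, string>()
let loadCount = 0
const ensureLanguage = createLazyRegistry<string>(
  {
    python: async () => {
      loadCount++
      return 'python-grammar'
    },
  },
  (name, grammar) => {
    registered.set(name, grammar)
  },
)
```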

Tests

app/utils/hljs.spec.ts adds two assertions:

  • Scope tripwire: REGISTERED_LANGUAGES must equal an explicit expected list, so any change to the supported set has to surface in code review.
  • Text-loader contract: every key of Text.js's LANGS resolves via hljs.getLanguage(...) from the configured module, catching the case where someone adds an extension to LANGS without registering its language.
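Stripped of the test runner, the two assertions reduce to two small checks. A framework-free sketch, assuming the tripwire compares the registered set against a reviewed list and the contract probes `hljs.getLanguage` for each LANGS key; `diffSets`, `findUnresolved`, and the sample data are hypothetical and are not the real 35-entry list or LANGS map.

```typescript
// Tripwire: what was added to or removed from the reviewed list.
function diffSets(actual: readonly string[], expected: readonly string[]) {
  const actualSet = new Set(actual)
  const expectedSet = new Set(expected)
  return {
    added: actual.filter((x) => !expectedSet.has(x)).sort(),
    removed: expected.filter((x) => !actualSet.has(x)).sort(),
  }
}

// Contract: names that do not resolve to a registered grammar.
function findUnresolved(names: readonly string[], getLanguage: (name: string) => unknown): string[] {
  return names.filter((name) => !getLanguage(name))
}

// Illustrative data only.
const registered = ['bash', 'json', 'python', 'yaml']
const expected = ['bash', 'json', 'python', 'yaml']
const LANGS: Record<string, unknown> = { bash: /\.sh$/, json: /\.json$/, kotlin: /\.kts?$/ }
// Stand-in for (name) => hljs.getLanguage(name)
const getLanguage = (name: string) => (registered.indexOf(name) !== -1 ? {} : undefined)
```

In the spec, each check would end in an equality assertion against an empty result, e.g. `expect(findUnresolved(Object.keys(LANGS), (n) => hljs.getLanguage(n))).toEqual([])`. The later check that every REGISTERED_LANGUAGES entry resolves is the same call with REGISTERED_LANGUAGES as `names`.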

Test plan

  • npm run lint
  • npm run test:only — all 99 files / 893 tests pass (incl. the 2 new hljs assertions)
  • npm run build succeeds; verified that kotlin, haxe, and other unregistered grammars no longer appear in the build output, while python / bash / json / yaml / typescript are present in the hljs language chunk
  • Manual smoke test:
    • Markdown README with fenced json / yaml / python / bash
    • Markdown README with an unsupported fence (e.g. kotlin) — should render as plain monospace
    • Preview a .json / .yaml / .py file
    • Bucket Download code-snippet panel

Greptile Summary

Replaces the full highlight.js default bundle (~190 grammars) with a curated utils/hljs.ts singleton that registers exactly 35 languages, reducing the hljs chunk size significantly. Seven files are updated only to swap the import path; Markdown.tsx additionally drops the highlightAuto fallback (intentional and well-documented), and Code.tsx's hl prop is narrowed from string to the new RegisteredLanguage union type.

  • A "scope tripwire" test locks down the registered-language list and a contract test verifies all Text.js LANGS keys resolve via hljs.getLanguage — but there is no assertion that each entry in REGISTERED_LANGUAGES itself is actually registered.
  • The lang === 'none' special case removal in Markdown.tsx is safe: 'none' is not registered, so getLanguage('none') already returned undefined and fell through to return '' in the old else-branch.

Confidence Score: 4/5

Safe to merge; the behavioral change (dropping highlightAuto) is intentional and well-documented, and all existing call sites use languages that are in the registered set.

The implementation is clean and test coverage is good for the primary consumer path (Text.js LANGS). The one gap is that the spec does not assert every entry in REGISTERED_LANGUAGES resolves via hljs.getLanguage, so the exported TypeScript type and the actual runtime registrations can drift silently in future PRs.

catalog/app/utils/hljs.spec.ts — missing assertion that every REGISTERED_LANGUAGES entry is actually registered in the hljs instance.

Important Files Changed

| Filename | Overview |
|---|---|
| catalog/app/utils/hljs.ts | New shared module: imports highlight.js core and registers exactly 35 languages. REGISTERED_LANGUAGES array (source of the RegisteredLanguage type) and registerLanguage calls are maintained in parallel without a guard verifying they stay in sync. |
| catalog/app/utils/hljs.spec.ts | Adds two tests: a snapshot tripwire for REGISTERED_LANGUAGES and a LANGS-key resolution check. Does not verify that every REGISTERED_LANGUAGES entry resolves via hljs.getLanguage, leaving a gap where the array and actual registrations can drift. |
| catalog/app/components/Markdown/Markdown.tsx | Swaps highlight.js default import for utils/hljs; drops highlightAuto fallback and the lang==='none' short-circuit, both intentionally. Behavior for unknown/empty lang is unchanged. |
| catalog/app/containers/Bucket/Download/Code.tsx | hl prop narrowed from string to RegisteredLanguage; all existing callers pass 'python' or 'bash', both in the registered set. |
| catalog/app/components/Preview/loaders/Text.js | LANGS map exported so hljs.spec.ts can use it as the source of truth for the contract test; no other logic changes. |
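The overview notes that the REGISTERED_LANGUAGES array and the registerLanguage calls are kept in parallel. One hypothetical way to make that drift impossible by construction (an alternative, not what this PR does): key a single object by language name, derive the type with `keyof`, and register by iterating the same object. `GRAMMARS`, `defineGrammars`, `Grammar`, and the Map-backed stand-in for hljs are all assumptions so the sketch runs without highlight.js; in the real module the values would be imported grammar functions passed to `hljs.registerLanguage(name, grammar)`.

```typescript
// Stand-in for an imported highlight.js grammar definition.
type Grammar = { name: string }

// Identity helper that preserves the literal keys of the object passed in.
function defineGrammars<K extends string>(grammars: Record<K, Grammar>): Record<K, Grammar> {
  return grammars
}

// Hypothetical single source of truth for names, grammars, and the type.
const GRAMMARS = defineGrammars({
  bash: { name: 'Bash' },
  json: { name: 'JSON' },
  python: { name: 'Python' },
})

type RegisteredLanguage = keyof typeof GRAMMARS // 'bash' | 'json' | 'python'
const REGISTERED_LANGUAGES = Object.keys(GRAMMARS) as RegisteredLanguage[]

// Stand-in registry exposing the two methods the catalog relies on.
const registry = new Map<string, Grammar>()
const fakeHljs = {
  registerLanguage: (name: string, grammar: Grammar) => {
    registry.set(name, grammar)
  },
  getLanguage: (name: string) => registry.get(name),
}

// Registration walks the same object the type is derived from, so they cannot diverge.
for (const name of REGISTERED_LANGUAGES) {
  fakeHljs.registerLanguage(name, GRAMMARS[name])
}
```

The trade-off is that the explicit tripwire list in the spec remains useful: it still forces any change to the supported set to show up as a reviewed diff.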

Reviews (1): Last reviewed commit: "Merge remote-tracking branch 'origin/mas..."

Greptile also left 1 inline comment on this PR.

fiskus added 2 commits May 8, 2026 18:01
Replace the convenience `highlight.js` default import with a single shared
module (`utils/hljs`) that uses `highlight.js/lib/core` plus explicit
per-language registrations. Runtime API (`hljs.highlight`, `hljs.getLanguage`)
is unchanged at every call site.

The registered set is bounded by the keys of `Text.js`'s `LANGS` map (the
only enumerated runtime consumer); `python` / `bash` used by `Code.tsx`
callers are already in that set.

Behavioral change: drop `hljs.highlightAuto` fallback in `Markdown.tsx`.
With ~35 registered grammars instead of ~190, auto-detection accuracy
degrades sharply, so unlabeled / unsupported fences now render as plain
monospace via Remarkable's default escaping — same path as the existing
`lang === 'none'` short-circuit.

`Code.tsx`'s `hl` prop is narrowed from `string` to `RegisteredLanguage`,
so passing an unregistered language is a compile error.

Tests: `app/utils/hljs.spec.ts` adds a scope tripwire (registered set
matches an explicit expected list) and a contract test (every key of
`Text.js`'s `LANGS` map resolves via `hljs.getLanguage`).

A TODO at the top of `utils/hljs.ts` notes that the 35 grammars still
ship in one chunk; per-grammar lazy loading is left as a follow-up
because every call site invokes `hljs.highlight` synchronously.
fiskus marked this pull request as ready for review May 8, 2026 16:09
codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 94.87179% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.64%. Comparing base (91b30b4) to head (62b5e08).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| catalog/app/components/Markdown/Markdown.tsx | 0.00% | 1 Missing ⚠️ |
| catalog/app/containers/Bucket/Download/Code.tsx | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #4883      +/-   ##
==========================================
+ Coverage   46.55%   46.64%   +0.09%
==========================================
  Files         832      833       +1
  Lines       34131    34162      +31
  Branches     5833     5831       -2
==========================================
+ Hits        15890    15936      +46
+ Misses      16239    16227      -12
+ Partials     2002     1999       -3
```
| Flag | Coverage Δ |
|---|---|
| api-python | 93.14% <ø> (ø) |
| catalog | 21.74% <94.87%> (+0.17%) ⬆️ |
| lambda | 96.63% <ø> (ø) |
| py-shared | 98.18% <ø> (ø) |

Flags with carried forward coverage won't be shown.


Comment thread catalog/app/utils/hljs.spec.ts
…tered

Catches the failure mode where a developer extends the array (and thus the
`RegisteredLanguage` type) but forgets the matching `registerLanguage` call —
the type would claim support while `hljs.getLanguage` returns `undefined` at
runtime, silently degrading to plain text in Code.tsx. Suggested by Greptile
on #4883.
fiskus (Member, Author) commented May 11, 2026

Wait for #4884 and then maybe implement async parsing.

fiskus marked this pull request as draft May 11, 2026 15:58
