
Releases: cortexkit/opencode-magic-context

v0.14.0

22 Apr 19:33


✨ Features

Context-overflow recovery (fixes #32)
When a provider returns a 400 "context too long" error, the plugin now extracts the real reported context limit from the error body, persists it per-session, and triggers the emergency historian path used at 95% pressure. The next turn proceeds with the corrected pressure math and a rebuilt compressed history instead of a second failed turn. Covers Moonshot, OpenRouter, Anthropic, OpenAI, and Cerebras error shapes — so models with wrong or missing models.dev limits (like lemonade/GLM-4.7-Flash-GGUF) self-correct on first overflow instead of failing indefinitely.
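The extraction step can be sketched roughly like this — a minimal, hypothetical helper (not the plugin's actual internals) that assumes two common error-message shapes and falls back to `null` when nothing matches:

```typescript
// Hypothetical sketch: pull the provider-reported context limit out of a
// 400 error body. Real provider error shapes vary; these two regexes are
// illustrative assumptions, not an exhaustive list.
const LIMIT_PATTERNS = [
  // e.g. "...maximum context length is 131072 tokens..."
  /maximum context length is (\d+)/i,
  // e.g. "...context window of 200000 tokens..."
  /context window of (\d+)/i,
];

function extractContextLimit(errorBody: string): number | null {
  for (const pattern of LIMIT_PATTERNS) {
    const match = errorBody.match(pattern);
    if (match) {
      const limit = Number(match[1]);
      if (Number.isFinite(limit) && limit > 0) return limit;
    }
  }
  return null; // unknown error shape — leave the cached limit alone
}
```

A `null` result means the overflow error carried no usable number, in which case the previously known limit stays in effect.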

Dreamer graduation: user_memories and pin_key_files
Both options graduate from experimental.* to dreamer.* as first-class settings. Users with existing configs are migrated automatically both at plugin startup (in-memory, so the plugin works immediately without doctor) and on doctor (on-disk, preserving comments). Primitive forms like experimental.user_memories: true are coerced to the object shape {enabled: true} so no explicit opt-in or opt-out is silently lost. Users who never had these flags set see no behavioral change.
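The primitive-to-object coercion amounts to something like the following sketch (names are illustrative, not the plugin's real migration code):

```typescript
type FeatureFlag = { enabled: boolean };

// Illustrative coercion: a bare boolean becomes { enabled: boolean }, an
// existing object passes through, and an absent flag stays absent — so no
// explicit opt-in or opt-out is silently lost.
function coerceFlag(
  value: boolean | FeatureFlag | undefined,
): FeatureFlag | undefined {
  if (value === undefined) return undefined; // never set — no behavioral change
  if (typeof value === "boolean") return { enabled: value }; // primitive form
  return value; // already object-shaped
}
```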

Compaction markers promoted to stable
experimental.compaction_markers moves to top-level compaction_markers (default true). Writing OpenCode-native compaction boundaries after historian publication has been running reliably for long enough to drop the experimental flag.

Interleaved-reasoning models (Kimi / Moonshot)
Models that declare interleaved.field in models.dev capabilities (Kimi K2.6, Moonshot variants) require reasoning_content on every assistant message with tool calls. The plugin now detects this capability and skips all three reasoning-cleanup paths — clearOldReasoning, stripClearedReasoning, and the Anthropic merged-assistants workaround — so reasoning parts survive to the wire and OpenCode's provider transform can emit reasoning_content correctly. Model switches mid-session reset the reasoning watermark automatically, so you can switch in and out of Kimi without stuck stripped state.

🧪 Experimental: Automatic context-search hints
On eligible user turns (prompt above configurable length, not a reply to a recent hint), the plugin runs ctx_search in the background and appends a compact <ctx-search-hint> block (up to 3 compressed fragments) to the current user message. Cache-safe — the hint is appended to the fresh user message only, never message[0]. Fully off by default.

{
  "experimental": {
    "auto_search": {
      "enabled": true,
      "min_prompt_chars": 20,
      "score_threshold": 0.55
    }
  }
}

🧪 Experimental: Git commit indexing
HEAD non-merge commits are periodically indexed into an embedded git_commits table and become a fourth ctx_search source alongside memories, session facts, and raw history. Useful for recalling why a specific file changed when natural-language memories don't cover it. Abandoned branch experiments never surface — only HEAD history.

{
  "experimental": {
    "git_commit_indexing": {
      "enabled": true,
      "since_days": 365,
      "max_commits": 2000
    }
  }
}

🐛 Fixes

Eliminate cache-bust loop from NULL session_meta rows
Long-running sessions could hit a loop where every transform pass re-ran heuristic cleanup, re-applied pending drops, and burned Anthropic prompt cache. Root cause: isSessionMetaRow() rejected rows with last_transform_error = NULL (valid schema state), which caused getOrCreateSessionMeta() to fall back to defaults. The defaults reset lastResponseTime to 0, which made the scheduler think TTL had expired on every pass. Rows with NULL text-default columns are now accepted and coerced to "", and a one-time startup healer fixes existing rows in place.
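The shape of the fix — accept the NULL and coerce rather than reject the row — can be sketched as follows (a simplified, single-column illustration; the real validator covers the full session_meta schema):

```typescript
// Illustrative sketch: treat NULL in a text-default column as "", instead of
// rejecting the whole row and falling back to defaults (which reset
// lastResponseTime and triggered the cache-bust loop described above).
interface SessionMetaRow { last_transform_error: string }

function normalizeTextColumn(value: string | null | undefined): string {
  return value ?? ""; // NULL is a valid schema state — coerce, don't reject
}

function acceptRow(raw: { last_transform_error: string | null }): SessionMetaRow {
  return { last_transform_error: normalizeTextColumn(raw.last_transform_error) };
}
```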

Dashboard embedding endpoint probe matches doctor (fixes #33)
The dashboard's "Test" button for OpenAI-compatible embeddings now performs the same classification as doctor: {env:VAR} / {file:path} substitution before probing, body inspection on 2xx responses (catches OpenRouter's /embeddings returning chat-completion bodies), auth-vs-other HTTP error distinction, and 404/405 → "wrong URL or no embeddings API" guidance. An unresolved_token outcome specific to the GUI launch path surfaces the common macOS case where EMBED_API_KEY is in the shell rc file but the dashboard was launched from Finder/Dock (different env inheritance from launchd).

Dream-timer indexes every registered project
The singleton dream timer was only iterating one project (the first Desktop directory loaded), leaving other registered projects unreachable by scheduled commit indexing or Dreamer. The timer now sweeps all registered projects each tick and runs an immediate startup tick instead of waiting for the first 15-minute interval.

Assistant-message reasoning preserved under OpenCode's message rebuilds
Opus 4.7 session-recovery fix (from @tomolom's earlier PR) now also preserves reasoning parts on non-first assistants in consecutive assistant runs for interleaved-reasoning models. Without this, Kimi's reasoning_content contract breaks on any message with a tool call that follows another assistant message.

🧹 Internals

  • New packages/e2e-tests/ workspace with a mock Anthropic server and opencode serve subprocess runner. Covers context-overflow recovery (tagged to issue #32), cache-stable defer passes, historian emergency invocation at 95%+, successful compartment publication, custom-provider context-limit resolution, plugin self-disable under conflicts, and per-session lifecycle isolation.
  • Runtime config migration: experimental graduations are handled at both schema-parse time (in-memory) and doctor (on-disk). Users who had explicit settings in the old location keep their intent across the upgrade.
  • Startup healer for NULL text columns in session_meta runs once per DB open. Existing long-lived sessions repair themselves without manual intervention.

Upgrade

bunx --bun @cortexkit/opencode-magic-context@latest doctor --force

Restart OpenCode afterward. The desktop dashboard should be upgraded separately — see dashboard-v0.3.1.

Dashboard dashboard-v0.3.1

22 Apr 19:51


🐛 Bug Fix

Embedding endpoint test matches doctor classification (fixes #33)

The Test button for OpenAI-compatible embeddings in the Memory configuration panel previously only checked the HTTP status code, which produced misleading results:

  • 2xx with wrong body shape — OpenRouter's /embeddings returns 200 with a chat-completion body. Dashboard falsely reported "✓ Connected".
  • Unresolved {env:VAR} / {file:path} tokens — the dashboard sent them literally. On macOS GUI launches the dashboard's environment doesn't inherit shell rc files, so EMBED_API_KEY set in ~/.zshrc isn't visible to a Finder/Dock-launched dashboard — users got opaque 401 errors instead of actionable guidance.
  • 404 / 405 vs. generic HTTP errors were collapsed into identical output, losing the "wrong URL / no embeddings API" guidance.

The probe is now a port of the plugin's doctor embedding probe (shipped in v0.13.0). It performs:

  • Full substitution of {env:VAR} and {file:path} tokens before sending the request
  • Body inspection on 2xx responses to verify the response actually contains data[0].embedding as a float array, reporting dimensions on success
  • Classification into specific outcomes: ok, auth_failed, endpoint_unsupported, http_error, network_error, timeout, invalid_scheme, unresolved_token
  • Specific guidance for each outcome — e.g. the unresolved_token case tells you to launch OpenCode from a terminal or run doctor, rather than showing a useless 401

Users running doctor and the dashboard now see the same text for every failure mode.

🧪 Tests

17 new Rust unit tests cover substitution (env resolution, file resolution, residual detection, whitespace trimming), dimension extraction (valid embeddings, chat-style bodies, empty arrays, non-numeric values), preview truncation, and probe scheme validation.

Upgrade

Auto-update is enabled — existing installations will prompt within 24 hours.

Manual: download from the release assets below, or let the app's built-in Check for Updates... (app menu on macOS, or via the tray icon) pull the new build.

Full Changelog: dashboard-v0.3.0...dashboard-v0.3.1

v0.13.2

21 Apr 20:56


🐛 Bug Fixes

Compressor — prevent depth cascade on old compartments
The compressor's selection algorithm previously scanned oldest→newest and picked the first contiguous same-depth run it found. This caused the same raw-ordinal range to be re-selected pass after pass, rapidly deepening a small prefix (e.g. seq 0-3) all the way to depth 5 (caveman — title-only collapse with empty content) within hours, while the rest of the history remained at depth 0.

The new algorithm iterates distinct depth tiers ascending and picks the oldest contiguous run within each tier. Recent content stays fresh, old content deepens gradually — producing the smooth memory-decay gradient the design originally intended, like a human brain forgetting the middle years first. After upgrading, if you want to rebuild previously over-compressed compartments, run /ctx-recomp <start>-<end> on the affected range.
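A simplified sketch of the tier-ascending selection (the real compressor tracks more state — grace compartments, per-pass caps, merge ratios — but the core ordering change looks like this):

```typescript
// Depth tiers are visited ascending; within each tier, the oldest contiguous
// same-depth run wins. A run needs at least two compartments to be mergeable,
// so a small prefix can no longer be re-selected and deepened pass after pass.
interface Compartment { seq: number; depth: number }

function pickRun(compartments: Compartment[]): Compartment[] {
  const tiers = [...new Set(compartments.map((c) => c.depth))].sort((a, b) => a - b);
  for (const tier of tiers) {
    const run: Compartment[] = [];
    for (const c of compartments) {
      if (c.depth === tier) run.push(c);
      else if (run.length > 0) break; // first (oldest) run at this tier ended
    }
    if (run.length > 1) return run; // mergeable — take it
  }
  return []; // nothing to merge this pass
}
```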

Embedding sweep — drain each project fully instead of trickling
When you switch embedding providers (e.g. swap local MiniLM for LMStudio / OpenAI), the plugin wipes stale vectors and re-embeds. The old sweep ran one 10-item batch per 15-minute timer tick — on a 400-memory project that meant ~10 hours to fully re-embed.

The sweep now drains each project fully before moving to the next, ordered by MAX(updated_at) so your active project drains first. Bounded by a 10-minute wall-clock deadline and a 3-consecutive-empty fail-safe, and guarded against parallel sweeps via a singleton flag. After a provider swap, 400+ memories now re-embed in a single tick instead of trickling for hours.
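The drain loop under those bounds looks roughly like this (batch fetching is stubbed; the function and guard names are assumptions, not the plugin's real API):

```typescript
// Illustrative drain loop: run batches back-to-back until the backlog is
// empty, bounded by a wall-clock deadline and a consecutive-empty fail-safe,
// with a singleton flag guarding against parallel sweeps.
const DEADLINE_MS = 10 * 60 * 1000;
const MAX_EMPTY = 3;

let sweepRunning = false; // singleton guard

async function drainProject(
  fetchBatch: () => Promise<number>, // embeds one batch, returns items embedded
  deadlineMs = DEADLINE_MS,
): Promise<number> {
  if (sweepRunning) return 0;
  sweepRunning = true;
  try {
    const start = Date.now();
    let embedded = 0;
    let emptyStreak = 0;
    while (Date.now() - start < deadlineMs && emptyStreak < MAX_EMPTY) {
      const n = await fetchBatch();
      if (n === 0) emptyStreak++;
      else { emptyStreak = 0; embedded += n; }
    }
    return embedded;
  } finally {
    sweepRunning = false;
  }
}
```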

Upgrade

bunx --bun @cortexkit/opencode-magic-context@latest doctor --force

Restart OpenCode afterward.

v0.13.1

21 Apr 18:24


🐛 Bug fix

curl | bash installer: historian model picker frozen

The one-line installer (curl -fsSL ... | bash) could freeze on the historian model selection screen — arrow keys did nothing, ctrl+c didn't exit — forcing users to kill the terminal and fall back to running bunx --bun @cortexkit/opencode-magic-context@latest setup directly.

Root cause: a regression in scripts/install.sh (v0.12.x) added --bun to the bunx invocation, which forced the setup CLI to run under Bun even when a modern Node was on PATH. Bun's TTY handling under a shell </dev/tty redirect doesn't deliver raw-mode keypress events reliably, which broke @clack/prompts' select() component that the historian/dreamer/sidekick pickers depend on.

Fix: install.sh now prefers bunx without --bun when bun and a compatible Node (≥ 20.12) are both present, letting the CLI's #!/usr/bin/env node shebang run the setup under Node — which handles </dev/tty correctly. Falls back to npx with the same Node check, then to bunx --bun with an on-screen hint pointing users at the working direct invocation if neither is available.

If you already have magic-context installed, nothing to do — this only affects users running the installer fresh.

Full Changelog: v0.13.0...v0.13.1

v0.13.0

21 Apr 11:35


✨ Highlights

Historian V2 — sharper compartments, better compression

The historian prompt has been rewritten from scratch. Compartments are now narrative-first, preserve only minimal U: lines, ban question-form quotes, and drop agreement-prefix noise. Internal testing on a real 22K-message session showed U: line counts dropping from 76 → 24 per run while preserving high-signal user directives.

This pairs with the new two-pass mode, which adds an optional editor pass for models without native thinking: a second call cleans up U: lines, making the historian work well on cheaper/non-reasoning models and local open-weight setups.

{
  "historian": {
    "two_pass": true  // optional — for non-thinking models
  }
}

Compressor redesign — caveman-style at every depth

The background compressor that merges older compartments when the history block overflows its budget has been rebuilt with depth-aware tier merging — each depth level applies progressively more aggressive caveman-style compression (literally: by depth 5, compartments collapse to terse caveman-prose titles). A deterministic post-process enforces style consistency regardless of which model did the merging.

Oldest same-depth compartment bands are chosen first, newest compartments are never touched (configurable via compressor.grace_compartments), and merge ratios scale by depth — protecting recent work from over-aggressive compression.

{
  "compressor": {
    "grace_compartments": 10,         // default: never compress newest 10
    "max_compartments_per_pass": 15,  // default: hard cap per pass
    "max_merge_depth": 5              // default: max compression tier
  }
}

Partial /ctx-recomp — rebuild a message range

Rebuild a specific slice of history instead of the whole session:

/ctx-recomp 1-11322

Snaps to enclosing compartment boundaries, rebuilds only those compartments with current historian rules, and leaves prior/tail compartments and all session facts untouched. Resumable across restarts. Useful after upgrading historian prompt versions or model quality.

🧪 Experimental: Temporal Awareness

Gives the agent real-time perception. Each user message gets a small gap marker showing time elapsed since the previous message:

<!-- +5m -->
<!-- +2h 15m -->
<!-- +3d 4h -->

And every compartment in <session-history> carries start-date and end-date attributes. Lets the agent reason correctly about "how long did that build run", "when did we decide X", or "how stale is that session". Cache-safe — markers derive from immutable message timestamps.

{
  "experimental": { "temporal_awareness": true }
}
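The marker formatting shown above can be sketched like this (the exact rounding rules are assumptions; only the output shape is taken from the examples):

```typescript
// Illustrative gap-marker formatter: largest unit first, one sub-unit of
// precision, matching the <!-- +5m --> / <!-- +2h 15m --> / <!-- +3d 4h -->
// examples above. Input is elapsed milliseconds between two messages.
function gapMarker(elapsedMs: number): string {
  const minutes = Math.floor(elapsedMs / 60_000);
  const hours = Math.floor(minutes / 60);
  const days = Math.floor(hours / 24);
  if (days > 0) return `<!-- +${days}d ${hours % 24}h -->`;
  if (hours > 0) return `<!-- +${hours}h ${minutes % 60}m -->`;
  return `<!-- +${minutes}m -->`;
}
```

Because the input derives from immutable message timestamps, the marker for a given message never changes — which is what makes the feature cache-safe.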

Doctor: embedding endpoint probe

doctor now actually tests your OpenAI-compatible embedding endpoint. It shows the resolved URL, detects unresolved {env:VAR} substitutions, and posts a real /embeddings request to distinguish auth errors from network issues from unsupported models — putting an end to silent "embeddings just don't work" debugging.

Config: {env:VAR} and {file:path} substitution

Variables are now expanded in magic-context.jsonc before parsing — matches OpenCode's own config variable style:

{
  "embedding": {
    "provider": "openai-compatible",
    "endpoint": "{env:MC_EMBEDDING_ENDPOINT}",
    "api_key": "{file:~/.secrets/embeddings.key}"
  }
}
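A minimal sketch of the substitution pass (error handling, `~` expansion, and residual-token reporting omitted — the plugin's real implementation does more):

```typescript
import { readFileSync } from "node:fs";

// Illustrative substitution: {env:VAR} resolves from the process environment
// (left literal when unset, so doctor can flag it later), {file:path} reads
// and trims the file's contents. Runs over the raw config text before parsing.
function substitute(raw: string): string {
  return raw
    .replace(/\{env:([A-Z0-9_]+)\}/g, (_, name) => process.env[name] ?? `{env:${name}}`)
    .replace(/\{file:([^}]+)\}/g, (_, path) => readFileSync(path, "utf8").trim());
}
```

Leaving unresolved `{env:…}` tokens literal rather than blank is what lets the doctor probe detect and report them instead of sending an empty API key.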

🐛 Fixes

/ctx-status shows the correct live-model threshold after restart. Model lookup now falls back to the latest assistant providerID/modelID stored in OpenCode's DB when the in-memory live-model map is empty on fresh transforms.

Compressor cannot race against historian writes. Background compressor now registers its promise in the activeRuns map so historian cannot start on top of an in-flight replaceAllCompartmentState call.

Full /ctx-recomp resets compression depth. Rebuilt compartments now start at depth 0 — matching what partial recomp already did. Prevents the compressor from carrying ghost depth values from pre-recomp state.

Secrets stop leaking through config validation warnings. Values substituted from {env:VAR} no longer appear in Zod error messages when a field fails validation. Strings render as "string, N chars", objects as key lists, numbers and booleans unchanged.

Compressor band selection finds all bands. Fixed a singleton-run skip that could hop past the boundary element, missing valid same-depth bands immediately after it.

Historian heals tool-only gaps of any size. When a chunk gap lies fully inside tool-only message runs, validation now passes instead of failing with gap before message N. Non-tool gaps still require contiguous coverage.

Partial recomp no longer hits UNIQUE constraint collisions. Sequence assignment uses MAX(sequence) + 1 instead of row-count.

Silenced the repetitive "models-dev-cache: API layer loaded 297 model limits" log. It now fires only on first load and when the count actually changes.

🔧 Internal

  • Dashboard Biome config added; pre-existing lint errors cleared (24 files, reviewable baseline)
  • Documentation for every experimental feature consolidated in CONFIGURATION.md

Full Changelog: v0.12.0...v0.13.0

Dashboard dashboard-v0.3.0

21 Apr 11:53


✨ Highlights

Bulk memory management (closes #28)

The Memory Browser now handles batch operations properly.

  • Per-item checkboxes on every memory
  • Per-category tri-state checkboxes (all / some / none) to select a whole category at once
  • Select-all-visible sticky control at the top
  • Bulk archive — reversible, archives a batch in one SQL transaction
  • Bulk delete — destructive, requires a native confirm dialog
  • Collapsible category groups with right-side chevrons, so Architecture Decisions / Naming / Constraints / etc. can be folded for easier navigation
  • Persisted collapse state in localStorage — your layout sticks across restarts

Bulk operations run as atomic SQL transactions on the Rust backend (bulk_update_memory_status, bulk_delete_memory), not as frontend loops — partial failures leave nothing half-applied.

🔧 Internal

  • Biome lint coverage now mandatory for the dashboard codebase
  • Pre-existing lint violations cleared across 24 files as the reviewable baseline — future a11y regressions are now caught by CI

📦 Pairs with plugin v0.13.0

This dashboard release is tested against plugin v0.13.0. Upgrade both for the full experience:

bunx --bun @cortexkit/opencode-magic-context@latest doctor --force

Full Changelog: dashboard-v0.2.8...dashboard-v0.3.0

v0.12.0

19 Apr 10:34


🚀 New Feature

Set execute thresholds in absolute tokens, not just percentages. The new execute_threshold_tokens config lets you trigger magic-context's execute pass at a specific token count instead of a percentage of the model's context. This is especially useful when you want a hard cap matching your provider's limit.input — e.g. stop at exactly 100K tokens regardless of the model's total context window.

{
  // Trigger at absolute token counts (overrides percentage for matched models).
  // Falls back to execute_threshold_percentage for models not listed here.
  "execute_threshold_tokens": {
    "default": 150000,
    "github-copilot/gpt-5.3-codex": 100000,
    "openai/gpt-5.4-fast": 250000
  }
}

How it works:

  • When a model matches execute_threshold_tokens (exact, base-model, or bare key), it wins over execute_threshold_percentage for that model.
  • Values above 80% × context_limit are clamped with a warning (same cache-safety cap as percentage mode).
  • Progressive base-model lookup: openai/gpt-5.4-fast will match openai/gpt-5.4 if only the base key is set.
  • /ctx-status shows which mode is active: 100,000 tokens (10.0% of 1,000,000) [token-mode].
  • Dashboard gets a new per-model input under Thresholds.

Percentage config continues to work exactly as before for models not listed in tokens config.
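Putting the rules above together, the resolution order can be sketched like this (names and the mode-suffix strip are assumptions for illustration — the real resolver lives in resolveExecuteThresholdDetail()):

```typescript
// Illustrative resolution: exact key → base-model key → bare id → "default".
// A matched token value clamps at 80% of the context limit (the same
// cache-safety cap as percentage mode); no match falls back to percentage.
interface Thresholds {
  tokens?: Record<string, number>;
  percentage: number; // e.g. 65 (%)
}

function resolveThreshold(model: string, contextLimit: number, cfg: Thresholds): number {
  const [provider, id] = model.split("/");
  const base = id?.replace(/-(fast|high|low)$/, ""); // assumed mode-suffix strip
  const keys = [model, provider && base ? `${provider}/${base}` : "", id ?? "", base ?? "", "default"];
  const cap = Math.floor(contextLimit * 0.8);
  for (const key of keys) {
    const tokens = key ? cfg.tokens?.[key] : undefined;
    if (tokens !== undefined) return Math.min(tokens, cap); // token mode, clamped
  }
  return Math.floor(contextLimit * (cfg.percentage / 100)); // percentage mode
}
```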

🐛 Fixes

/ctx-status mode display drift. Previously, /ctx-status reimplemented token-match detection locally and missed OpenCode's progressive base-model lookup. A session using gpt-5.4-fast with execute_threshold_tokens.openai/gpt-5.4 configured was resolved correctly by the scheduler but mislabeled as "percentage mode" in the status dialog. /ctx-status now consumes the resolver's authoritative mode output.

Clamp warning spam. An over-cap execute_threshold_tokens value would log a warning on every transform pass until the user fixed the config. Warnings now dedupe by (session, model, value, cap) tuple — one warning per unique clamp, not one per pass.
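The dedupe amounts to keying a seen-set on that tuple — a minimal sketch (names illustrative):

```typescript
// One warning per unique (session, model, value, cap) clamp, not one per
// transform pass. A new value or cap produces a fresh warning.
const warned = new Set<string>();

function warnOnce(
  session: string,
  model: string,
  value: number,
  cap: number,
  log: (msg: string) => void,
): void {
  const key = `${session}|${model}|${value}|${cap}`;
  if (warned.has(key)) return;
  warned.add(key);
  log(`execute_threshold_tokens ${value} exceeds cap ${cap} for ${model} — clamping`);
}
```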

Dashboard config drift (caught during v0.12.0 audit):

  • drop_tool_structure toggle showed the wrong default (was false, correct is true).
  • execute_threshold_percentage slider min was stale at 35 — now correctly 20.
  • memory.retrieval_count_promotion_threshold existed in schema but had no UI field — now exposed.

🛡️ Hardening

Runtime guards against malformed inputs. The threshold resolver now explicitly blocks NaN, zero, and negative contextLimit and token values from poisoning the math. The schema normally prevents these, but runtime derivations like inputTokens / (percentage/100) can produce NaN when percentage is 0. Bad inputs now fall through to percentage config safely instead of silently returning garbage.

🔧 Internal

  • New resolveExecuteThresholdDetail() in event-resolvers.ts is the single source of truth for execute-threshold resolution. Returns { percentage, mode, absoluteTokens?, matchedKey? }. The old resolveExecuteThreshold() is now a thin backward-compat wrapper.
  • RPC StatusDetail now exposes executeThresholdMode: "percentage" | "tokens" and optional executeThresholdTokens (absolute clamped value) for TUI and dashboard consumers.
  • JSON Schema (assets/magic-context.schema.json) regenerated with previously missing fields: drop_tool_structure, compaction_markers, experimental.*.
  • Dead code removed: resolveExecuteThresholdTokens() helper and unused private resolveTokensMatch().

🧪 Tests

  • +10 tests covering token-mode detail resolution, progressive base-model matching for tokens config, clamp behavior, NaN/zero/negative runtime guards, and dedupe stability across repeated calls.

571 tests passing (up from 561), typecheck clean, build clean.


Full changelog: v0.11.1...v0.12.0

v0.11.1

19 Apr 06:57


🐛 Fixes

Per-model execute_threshold_percentage works immediately across restarts. The scheduler's per-model threshold lookup relies on liveModelBySession, which was only populated when a message.updated event fired. After a plugin restart (or on the first transform of a pre-existing session), the map stayed empty and every transform in that window silently fell back to config.default. The transform now populates the map from the last assistant message in history on first access, so per-model thresholds apply immediately.

Per-model config matches both derived and base model keys. OpenCode's experimental.modes generates derived model IDs like gpt-5.4-fast from base gpt-5.4. Users may now write either form in their execute_threshold_percentage config — both resolve correctly for a gpt-5.4-fast session, with most-specific match winning. Also matches bare model IDs (without provider prefix) through the same cascade.

// Either of these now works for a gpt-5.4-fast session:
{ "default": 65, "openai/gpt-5.4-fast": 25 }   // exact derived key
{ "default": 65, "openai/gpt-5.4": 25 }         // base key — matches all modes

Correct context-limit denominator for GitHub Copilot and proxy providers. models.dev distinguishes three fields: limit.context (total input+output window), limit.input (max prompt the provider will accept), and limit.output. For 182+ models — including every GitHub Copilot entry and most gpt-5.x derivations through proxies — limit.input < limit.context. Previously magic-context used the larger limit.context number as the pressure denominator, so thresholds fired later than they should have, and prompts larger than limit.input could still be sent and rejected by the provider even though the plugin thought it was "still under budget". All three resolution paths (models.json file layer, opencode.json custom provider overlay, SDK API refresh) now prefer limit.input over limit.context, matching OpenCode's own session/overflow.ts behavior.

Concrete example — github-copilot/gpt-5.3-codex:

  • Before: plugin saw 400K limit, threshold 25% fired at 100K tokens
  • After: plugin sees 272K limit, threshold 25% fires at 68K tokens — the real capacity

No action needed: the denominator is computed fresh each transform, no DB migration.
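The preference itself is a one-liner — a sketch of the resolveLimit() helper mentioned below, with field names following the models.dev shape described above:

```typescript
// Prefer the max prompt the provider actually accepts (limit.input) over the
// total input+output window (limit.context) as the pressure denominator.
interface ModelLimits { context?: number; input?: number; output?: number }

function resolveLimit(limits: ModelLimits): number | undefined {
  return limits.input ?? limits.context;
}
```

For the github-copilot/gpt-5.3-codex example above, `resolveLimit({ context: 400_000, input: 272_000 })` yields the real 272K capacity.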

🔧 Internal

  • New resolveLimit() helper in models-dev-cache.ts centralizes the input ?? context precedence.
  • New modelKeyLookupOrder() generator in event-resolvers.ts yields lookup keys in specificity order (exact → base-stripped → bare).
  • Test isolation fix: models-dev-cache tests now set OPENCODE_CONFIG_DIR to an empty directory so user's real opencode.jsonc (which may have custom provider limit overrides) does not leak into test expectations.

🧪 Tests

  • +10 tests covering resolveExecuteThreshold progressive lookup (exact, base, bare, bare-base, most-specific-wins, fallback, undefined-modelKey).
  • +4 tests covering models-dev-cache input/context preference (file layer, derived modes inheritance, custom opencode.json overlay, API refresh path).

554 tests passing (up from 540), typecheck clean, build clean.


Full changelog: v0.11.0...v0.11.1

Dashboard dashboard-v0.2.8

19 Apr 10:44


✨ Tracks Plugin v0.12.0

This release aligns the dashboard config editor with the new config surfaces and schema changes introduced in plugin v0.12.0.

🚀 New

Absolute-token execute thresholds in the Thresholds section. A new per-model field lets you configure execute_threshold_tokens from the dashboard UI — set a default and optional per-model overrides, all numeric inputs. Mirrors the plugin's new token-based threshold mode.

Execute Threshold (Tokens)
  Default:           [ 150000 ]
  + Add model override

🐛 Fixes

  • drop_tool_structure toggle showed the wrong default. Was rendered as "disabled" by default; actual plugin default is true (enabled). Now matches the runtime.
  • execute_threshold_percentage slider min corrected. Was stale at 35; plugin schema has allowed 20-80 since v0.8.5. Slider now goes down to 20.
  • memory.retrieval_count_promotion_threshold now editable. The field existed in the plugin schema but was not exposed in the dashboard UI. Now appears under Memory.

🔧 Internal

  • PerModelField gains two new props:
    • alwaysObject — for fields whose schema forbids bare-scalar form (like execute_threshold_tokens, which must be a map even when only default is set).
    • numericText — coerces text input values to numbers on save, so tokens configs serialize as valid integers instead of strings.

Note: Auto-update should pick this up automatically on next app launch. If you prefer manual install, use the download link in the README or https://cortexkit.github.io/opencode-magic-context/latest.json.

Full changelog: dashboard-v0.2.7...dashboard-v0.2.8

v0.11.0

18 Apr 19:38


✨ Highlights

Accurate token counts in the sidebar. The TUI sidebar's "Conversation", "System", "Memories", "Compartments", "Tool Calls", and residual "Tool Defs + Overhead" numbers now match what Anthropic actually sees on the wire — validated within 0.1% against a live 340K-input request capture. Sessions will self-heal on the next transform pass without any user action.

New "Tool Calls" sidebar slice. Tool-invocation tokens (tool_use, tool_result) are now tracked and displayed separately from the fixed tool-schema overhead. This tells you at a glance how much context is reducible via ctx_reduce vs how much is structural.

🐛 Fixes

Tokenizer fallback was silently active. The ai-tokenizer integration shipped in an earlier release used an eval("require") pattern that silently threw inside Bun's ESM runtime on every call — so every token count in the sidebar and status dialog was running through a chars/3.5 heuristic. On long sessions this inflated the "Tool Defs + Overhead" residual from ~22K (real) to ~90K+ (fake) and misattributed ~70K tokens across segments. A static ESM import now loads the real Claude tokenizer. Sessions will self-heal on the next transform pass (watermark-based re-count when drift >50 tokens).

Opus 4.7 thinking-block integrity. The stripReasoningFromMergedAssistants workaround for consecutive-assistant merges now handles wire-format thinking and redacted_thinking types in addition to OpenCode's internal reasoning. Previously, two consecutive assistants each carrying a thinking block could slip through unchanged and trigger Anthropic's "thinking blocks ... cannot be modified" 400 error — exactly the failure mode this function exists to prevent.

Cache-bust cascade on upgraded DBs. session_meta validator now tolerates NULL for INTEGER columns added later via ensureColumn (system_prompt_tokens, conversation_tokens, tool_call_tokens, times_execute_threshold_reached, compartment_in_progress, cleared_reasoning_through_tag). Before this fix, very old DBs could fail the validator, fall back to defaults, reset lastResponseTime=0, and cause the scheduler to return "execute" on every pass — endless cache busts. Added a companion healNullIntegerColumns heal to normalize any such rows on startup.

Persistent transform errors are now visible. The top-level transform error handler previously swallowed every error as if it were transient, which meant persistent schema or programming bugs silently disabled magic-context for the entire session with no user-facing signal. It now distinguishes SQLITE_BUSY / SQLITE_LOCKED (log + skip, normal behavior) from persistent non-transient errors (log with full detail + persist a summary into session_meta.last_transform_error, which the sidebar already surfaces).

Token cache invalidation coverage. The per-message token cache is now invalidated on message.updated (per-message, with session-wide fallback when the event lacks a message id) and on session.compacted (session-wide, since native compaction restructures messages). Previously only message.removed and session.deleted were covered, so retries or compaction could leave stale counts in memory.

Memory injection budget consistency. Memory trim-to-budget switched from chars/4 heuristic to the real estimateTokens() call, matching the rest of the plugin's token math. Prevents under-packing code/JSON-heavy memories or over-packing prose-heavy ones.

Robustness micro-fixes.

  • readUint32BE now coerces to unsigned with >>> 0 so malformed PNG headers with MSB-set bytes don't bypass the < 1 fallback and produce wrong image-token counts.
  • Removed dead || 0 in the WebP lossy parser (the & 0x3fff mask already produces a non-negative result).
  • Write-if-changed guard on lastTransformError prevents WAL write amplification during persistent-error states.
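The readUint32BE fix is small but easy to get wrong — a sketch of why the unsigned coercion matters:

```typescript
// Without `>>> 0`, a big-endian read whose most significant bit is set
// overflows JavaScript's signed 32-bit bitwise math and comes back negative —
// which would sail past a `< 1` sanity fallback and corrupt image-token counts.
function readUint32BE(buf: Uint8Array, offset: number): number {
  return (
    ((buf[offset] << 24) |
      (buf[offset + 1] << 16) |
      (buf[offset + 2] << 8) |
      buf[offset + 3]) >>> 0 // coerce to unsigned
  );
}
```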

🔧 Internal

  • New persisted columns: conversation_tokens, tool_call_tokens in session_meta. Added via ensureColumn — no migration required.
  • MessageUpdatedAssistantInfo gains optional messageID from info.id so per-message cache invalidation can target precisely.
  • Test schemas updated across command-handler, note-nudger, heuristic-cleanup, ctx-reduce, storage-tags, and storage tests to include the two new token columns.
  • New focused test file clear-message-tokens-cache.test.ts covers per-message and session-wide cache invalidation paths plus cross-session isolation (5 tests).
  • stripReasoningFromMergedAssistants test coverage expanded with 3 cases exercising wire-format thinking / redacted_thinking sequences.
  • All plugin token math unified on estimateTokens() from ai-tokenizer — only remaining chars/3.5 estimate is intentional (pre-filter bucket for dreamer key-file selection where file content isn't loaded).

🔬 Process

This release was validated with two Athena solo-mode council audits:

  1. Post-implementation audit surfaced 12 findings. 9 were real bugs and were fixed (including the thinking/redacted_thinking merge-strip gap and the NULL INTEGER column cache-bust cascade). 3 were deliberately skipped as bounded display edge cases.
  2. Verification audit of the fixes confirmed ship-readiness with strong agreement across 7 members. Three optional low-priority suggestions were applied on top.

540 tests passing (up from 532 at release start), typecheck clean, build clean.


Full changelog: v0.10.1...v0.11.0