docs: onboarding compatibility analysis for blackboard migration#1942
solaris007 wants to merge 2 commits into main
…igration

Documents how LLMO and ASO onboarding commands interact with the legacy audit pipeline (jobs-dispatcher, audit-worker) and identifies compatibility gaps as individual audits migrate to the mysticat blackboard architecture.

Key findings:
- Per-audit migration model (not per-site)
- Control API scan-now endpoint solves the timing gap for initial audits
- Service-to-service auth needed on Control API before migration can start
- Audit type to goal key mapping needed
- Phased migration path with prerequisites
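The following is a minimal sketch of how onboarding might build the scan-now call once service-to-service auth exists. Only the POST /sites/{site_id}/scan path and the goal_overrides field come from this PR. The helper name buildScanRequest, the bearer-token header, the base URL, and the shape of the overrides object are hypothetical assumptions and do not describe the Control API contract.

```javascript
// Hypothetical helper: builds (but does not send) a scan-now request for the
// Control API. The path and goal_overrides come from the PR; the auth scheme
// and body shape are assumptions.
function buildScanRequest(baseUrl, siteId, goalOverrides, serviceToken) {
  if (!siteId) {
    throw new Error("siteId is required");
  }
  if (!serviceToken) {
    // S2S auth is a prerequisite: refuse to build an unauthenticated call.
    throw new Error("service-to-service credential missing");
  }
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${root}/sites/${encodeURIComponent(siteId)}/scan`,
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${serviceToken}`, // assumed auth scheme
    },
    body: JSON.stringify({ goal_overrides: goalOverrides }),
  };
}
```

The returned object can be handed to fetch as fetch(req.url, req). Keeping request construction separate from sending makes the "no credential, no call" precondition easy to test in isolation.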
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Pull request overview
Adds an internal engineering note documenting how SpaceCat onboarding (LLMO/ASO) interacts with the ongoing migration of audits from the legacy spacecat-audit-worker pipeline to the Mysticat/Mystique “blackboard” control + facts architecture, with an emphasis on compatibility gaps and a phased migration approach.
Changes:
- Documented shared Aurora data model assumptions and blackboard-specific control/facts tables.
- Described the legacy dispatch pipeline and the “timing gap” for initial audits when an audit type migrates off the legacy worker.
- Outlined a phased migration plan (incl. need for Control API service-to-service auth and auditType→goal mapping).
# Onboarding Compatibility with Mysticat Blackboard

Research date: 2026-03-11
The doc uses both “Mysticat” (title/context) and “mystique” (e.g., “Blackboard (mystique-owned)”, and later “mystique (producer)”) without explaining whether these are the same system or different components. This is likely to confuse readers; please align the terminology (or add a short glossary/clarifying sentence describing how Mysticat vs Mystique relate to the blackboard/control service).
### What works today (no changes needed)

| Concern | Status | Why |
|---------|--------|-----|
| Org/Site creation | OK | Both stacks read from shared Aurora |
| Entitlement + SiteEnrollment | OK | Product code + tier used by both stacks |
| DRS (llmo-data-retrieval) | OK | Reads site config via SpaceCat API, no audit dependency |
| Site config fields | OK | Stored on SpaceCat site document |
| Brand profile agent | OK | Runs via Step Functions, independent |
| Legacy audit enablement | OK | Harmless for migrated types (worker 404s, DLQ absorbs) |
| Jobs-dispatcher scheduling | OK | Harmless for migrated types (same 404/DLQ behavior) |
The “What works / no changes needed” table content is repeated earlier in the doc ("## What Works (No Changes Needed)") and again here under “Compatibility Analysis”. Consider de-duplicating or explicitly calling out why the second table differs, to avoid the doc drifting out of sync over time.
Suggested replacement:
The items listed under **“What Works (No Changes Needed)”** above apply directly here; they represent areas where onboarding flows are already compatible with Mysticat Blackboard and require no additional changes.
solaris007 left a comment
Hey @solaris007,
Strengths
- Accurate shared data layer analysis - correctly identifies that both stacks share Aurora and site.id is the universal FK, making per-audit migration viable (docs/onboarding-blackboard-compatibility.md:17-39)
- Per-audit migration model is clearly articulated with the goal_configs cascade hierarchy (docs/onboarding-blackboard-compatibility.md:42-60)
- Legacy dispatch pipeline trace is verified against source code - Configuration.findLatest(), isHandlerEnabledForSite(), and SQS dispatch patterns all match
- Auth blocker correctly identified as critical path - prevents wasted implementation effort
- Pseudocode migration path makes the doc immediately actionable (docs/onboarding-blackboard-compatibility.md:330-360)
Issues
Important (Should Fix)
1. ASO Step Functions workflow not documented
docs/onboarding-blackboard-compatibility.md:284-293
The ASO flow omits the Step Functions workflow launched after onboarding (src/support/utils.js:1471-1487). This runs opportunity-status-processor, disable-import-audit-processor, demo-url-processor, and cwv-demo-suggestions-processor. The opportunity-status-processor waits for audit completion and reads from LatestAudit - if migrated audits complete on a different timeline or write to facts instead, this processor silently times out. Unlike the "noisy but functional" DLQ path, failures inside a state machine are functionally invisible.
Fix: Add Step Functions as step 10 in the ASO flow. Note that these processors need compatibility assessment, especially opportunity-status-processor which depends on audit result timing.
2. Missing LatestAudit storage compatibility analysis
The legacy pipeline writes results to LatestAudit. Multiple consumers depend on this: Slack "run audit" command, opportunity/suggestion pipeline, site-info command, ASO post-processing workflows, and LLMO dashboards. If blackboard producers write to the facts table instead of (or in addition to) LatestAudit, that is a compatibility gap affecting more than just onboarding.
Fix: Add an "Audit result storage compatibility" section covering whether blackboard producers write to LatestAudit, facts, or both, and which consumers depend on each.
3. trigger:llmo-onboarding-publish SQS message not documented
src/controllers/llmo/llmo-onboarding.js:1011-1014
The LLMO flow calls enqueueLlmoOnboardingPublish() to send this message to the audits queue before the audit triggers. Unlike regular audit types, it publishes LLMO UI data. If this handler migrates to the blackboard, silent failure means customers see a broken LLMO dashboard after onboarding - not just DLQ noise.
Fix: List trigger:llmo-onboarding-publish explicitly as a type to track during migration.
4. "Direct DB write" option insufficiently flagged as dangerous
docs/onboarding-blackboard-compatibility.md:193
Listed as a peer option with only "(tight coupling)" as downside. This bypasses all Control API validation, invariant enforcement, and creates a dual-writer concurrency risk.
Fix: Mark as "NOT RECOMMENDED" with explicit rationale, or remove.
5. Missing credential lifecycle discussion for s2s auth
docs/onboarding-blackboard-compatibility.md:189-194
API key or JWT is proposed but nothing about storage (Secrets Manager vs env vars), rotation cadence, scope (scan-only vs full admin), or ownership.
Fix: Add a subsection under prerequisites or open questions covering credential storage, scoping, and rotation.
6. BLACKBOARD_MAPPING in pseudocode is undefined
docs/onboarding-blackboard-compatibility.md:311-337
The pseudocode references BLACKBOARD_MAPPING[auditType] but the mapping is an open question. A developer copy-pasting this would silently route all types to legacy (empty mapping = all fallthrough).
Fix: Add a prominent note that the mapping must be resolved before this code is used. Consider populating with the partial known mappings as a starting point.
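One way to remove the silent-fallthrough risk is to make routing fail closed. In the sketch below, routeAudit, LEGACY_TYPES, and the routing result shape are hypothetical. BLACKBOARD_MAPPING is deliberately left empty because the real audit type to goal key mapping is still an open question in the doc. An audit type that is neither mapped nor explicitly listed as legacy raises an error instead of quietly going to the legacy worker.

```javascript
// Audit type -> blackboard goal key. Intentionally empty: the real mapping is
// an open question and must be resolved before this is used.
const BLACKBOARD_MAPPING = Object.freeze({});

// Audit types explicitly known to still run on the legacy worker (hypothetical).
const LEGACY_TYPES = new Set();

// Returns a routing decision, or throws for types nobody has classified.
function routeAudit(auditType, mapping = BLACKBOARD_MAPPING, legacyTypes = LEGACY_TYPES) {
  if (Object.prototype.hasOwnProperty.call(mapping, auditType)) {
    return { target: "blackboard", goalKey: mapping[auditType] };
  }
  if (legacyTypes.has(auditType)) {
    return { target: "legacy" };
  }
  // Fail closed: an unclassified type is a configuration error, not a default.
  throw new Error(`No routing decision for audit type "${auditType}"`);
}
```

With this shape, an empty mapping stops onboarding at the first unclassified type rather than sending everything to the legacy queue, and non-audit message types such as trigger:llmo-onboarding-publish also have to be classified explicitly.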
Minor (Nice to Have)
- Duplicate "What Works" table at lines 64-73 and 174-182 - remove the first instance
- Missing cross-reference to existing docs/onboard-workflow.md, which documents ASO in detail
- llmo-customer-analysis is enabled but not immediately triggered (deferred until DRS completion) - this timing nuance matters for migration sequencing
- Brand-profile agent is triggered by controller/Slack wrapper, not performLlmoOnboarding itself - relevant for new entry points
Recommendations
- Add a "Consumers of LatestAudit" section - this is the biggest gap for a complete compatibility picture
- Add a rollback section - what happens if a blackboard audit regression is discovered for a migrated type? Can the legacy handler be re-enabled?
- Consider EventBridge as a fifth s2s option - an onboarding.scan-requested event avoids HTTP auth entirely and aligns with existing infrastructure
- Add a "when to migrate" decision checklist per audit type (producer registered, goal mapping published, LatestAudit backfill confirmed, onboarding updated)
- Consider mTLS/SigV4 for s2s auth instead of static API keys - both services run in the same AWS account/VPC
- Plan to clean up migrated types from the dispatcher proactively - a permanently noisy DLQ masks real failures
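The per-audit-type "when to migrate" checklist suggested above could be encoded as a small gate. This is a sketch only: the four items come from the review, while the status field names and the helpers missingMigrationSteps and isReadyToMigrate are hypothetical.

```javascript
// Checklist items from the review; the status field names are hypothetical.
const MIGRATION_CHECKLIST = [
  ["producerRegistered", "Blackboard producer registered"],
  ["goalMappingPublished", "Goal mapping published"],
  ["latestAuditBackfillConfirmed", "LatestAudit backfill confirmed"],
  ["onboardingUpdated", "Onboarding updated"],
];

// Lists the checklist items that are not yet explicitly confirmed (=== true).
function missingMigrationSteps(status) {
  return MIGRATION_CHECKLIST
    .filter(([key]) => status[key] !== true)
    .map(([, label]) => label);
}

function isReadyToMigrate(status) {
  return missingMigrationSteps(status).length === 0;
}
```

Treating anything other than an explicit true as "not done" means a missing field blocks migration instead of passing silently.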
Assessment
Ready to merge? Yes, with follow-up
This is a research document, not an implementation spec. The analysis is substantively correct and fills a real knowledge gap. The identified gaps (LatestAudit storage, Step Functions workflow, credential lifecycle) should be addressed before this doc is used as a migration guide, but they don't block merging the research artifact. Recommend a follow-up pass to address the Important issues before migration planning begins.
- Add Mysticat/Mystique terminology clarification
- Remove duplicate "What Works" table
- Add ASO Step Functions workflow (4 processors) with compatibility notes
- Add Audit Result Storage section: LatestAudit replaced by CQRS pattern (facts -> projector -> projection_opportunity/suggestion tables)
- Document trigger:llmo-onboarding-publish in LLMO flow
- Mark "Direct DB write" as NOT RECOMMENDED
- Add credential lifecycle section (storage, scope, rotation, ownership)
- Add caveat to BLACKBOARD_MAPPING pseudocode
- Add cross-reference to onboard-workflow.md
- Note llmo-customer-analysis deferred timing and brand-profile trigger location
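The CQRS flow listed above (facts -> projector -> projection_opportunity/suggestion tables) can be illustrated as an in-memory event replay. This is a sketch under stated assumptions: the fact fields (siteId, kind, id, data, recordedAt), the last-write-wins rule, and the project function are hypothetical and not taken from the facts table schema. Maps stand in for the two projection tables.

```javascript
// Hypothetical fact shape:
// { siteId, kind: "opportunity" | "suggestion", id, data, recordedAt (number) }
function project(facts) {
  const projections = new Map([
    ["opportunity", new Map()], // stands in for projection_opportunity
    ["suggestion", new Map()],  // stands in for projection_suggestion
  ]);
  // Replay in recordedAt order so the latest fact per (siteId, id) wins.
  const ordered = [...facts].sort((a, b) => a.recordedAt - b.recordedAt);
  for (const fact of ordered) {
    const table = projections.get(fact.kind);
    if (!table) continue; // other fact kinds are not projected here
    table.set(`${fact.siteId}:${fact.id}`, { ...fact.data, siteId: fact.siteId, id: fact.id });
  }
  return projections;
}
```

A consumer that still reads LatestAudit would not see results written this way, which is why the consumer inventory matters before an audit type migrates.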
This PR will trigger no release when merged.
Summary
- docs/onboarding-blackboard-compatibility.md: analysis of how LLMO and ASO onboarding commands interact with the legacy audit pipeline and the mysticat blackboard architecture

Key Findings
- POST /sites/{site_id}/scan endpoint (with goal_overrides) solves this

Scope
Research/documentation only - no code changes.
Test plan