docs: onboarding compatibility analysis for blackboard migration#1942

Open
solaris007 wants to merge 2 commits into main from
docs/onboarding-blackboard-compatibility
@solaris007
Member

Summary

  • Adds docs/onboarding-blackboard-compatibility.md analyzing how LLMO and ASO onboarding commands interact with the legacy audit pipeline and the mysticat blackboard architecture
  • Documents the per-audit migration model, compatibility gaps, and a phased migration path

Key Findings

  • Per-audit migration: Individual audits migrate to blackboard producers independently. A site can have some audits on legacy and others on blackboard simultaneously
  • Timing gap: Onboarding expects immediate audit execution via SQS. Blackboard runs on its own schedule. The Control API's POST /sites/{site_id}/scan endpoint (with goal_overrides) solves this
  • Auth blocker: Control API only supports Okta OIDC - service-to-service auth needed before onboarding can trigger blackboard scans programmatically
  • Legacy pipeline resilience: Migrated audit types get 404'd by the audit worker (noisy but not broken). Jobs-dispatcher and Slack commands should be updated to handle migrated types gracefully
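
As a rough illustration of the timing-gap fix, the sketch below builds the scan request that onboarding might send. Only the `POST /sites/{site_id}/scan` path and the `goal_overrides` field come from the findings above. The helper name, the base URL, the shape of `goal_overrides` (an array of goal keys here), and the bearer-token header are assumptions, and the auth part cannot work until the Control API supports service-to-service credentials.

```javascript
// Hypothetical helper: builds (but does not send) a scan-now request.
// The payload shape and auth scheme are assumptions, not the Control API contract.
function buildScanRequest(baseUrl, siteId, goalKeys, token) {
  if (!siteId) {
    throw new Error('siteId is required');
  }
  // Strip trailing slashes so the path is joined exactly once.
  const root = baseUrl.replace(/\/+$/, '');
  return {
    url: `${root}/sites/${encodeURIComponent(siteId)}/scan`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`, // assumed auth scheme
      },
      body: JSON.stringify({ goal_overrides: goalKeys }), // assumed payload shape
    },
  };
}
```

The result could be passed to `fetch(req.url, req.options)`. Keeping construction separate from sending makes the request easy to inspect in tests before any real credentials exist.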

Scope

Research/documentation only - no code changes.

Test plan

  • N/A - docs only

…igration

Documents how LLMO and ASO onboarding commands interact with the legacy
audit pipeline (jobs-dispatcher, audit-worker) and identifies compatibility
gaps as individual audits migrate to the mysticat blackboard architecture.

Key findings:
- Per-audit migration model (not per-site)
- Control API scan-now endpoint solves the timing gap for initial audits
- Service-to-service auth needed on Control API before migration can start
- Audit type to goal key mapping needed
- Phased migration path with prerequisites
Copilot AI review requested due to automatic review settings March 11, 2026 10:52
@solaris007 solaris007 self-assigned this Mar 11, 2026
@solaris007 solaris007 added the documentation Improvements or additions to documentation label Mar 11, 2026
@codecov
codecov bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Contributor

Copilot AI left a comment

Pull request overview

Adds an internal engineering note documenting how SpaceCat onboarding (LLMO/ASO) interacts with the ongoing migration of audits from the legacy spacecat-audit-worker pipeline to the Mysticat/Mystique “blackboard” control + facts architecture, with an emphasis on compatibility gaps and a phased migration approach.

Changes:

  • Documented shared Aurora data model assumptions and blackboard-specific control/facts tables.
  • Described the legacy dispatch pipeline and the “timing gap” for initial audits when an audit type migrates off the legacy worker.
  • Outlined a phased migration plan (incl. need for Control API service-to-service auth and auditType→goal mapping).

Comment on lines +1 to +3
# Onboarding Compatibility with Mysticat Blackboard

Research date: 2026-03-11

Copilot AI Mar 11, 2026

The doc uses both “Mysticat” (title/context) and “mystique” (e.g., “Blackboard (mystique-owned)”, and later “mystique (producer)”) without explaining whether these are the same system or different components. This is likely to confuse readers; please align the terminology (or add a short glossary/clarifying sentence describing how Mysticat vs Mystique relate to the blackboard/control service).

Comment on lines +200 to +211
### What works today (no changes needed)

| Concern | Status | Why |
|---------|--------|-----|
| Org/Site creation | OK | Both stacks read from shared Aurora |
| Entitlement + SiteEnrollment | OK | Product code + tier used by both stacks |
| DRS (llmo-data-retrieval) | OK | Reads site config via SpaceCat API, no audit dependency |
| Site config fields | OK | Stored on SpaceCat site document |
| Brand profile agent | OK | Runs via Step Functions, independent |
| Legacy audit enablement | OK | Harmless for migrated types (worker 404s, DLQ absorbs) |
| Jobs-dispatcher scheduling | OK | Harmless for migrated types (same 404/DLQ behavior) |


Copilot AI Mar 11, 2026

The “What works / no changes needed” table content is repeated earlier in the doc ("## What Works (No Changes Needed)") and again here under “Compatibility Analysis”. Consider de-duplicating or explicitly calling out why the second table differs, to avoid the doc drifting out of sync over time.

Suggested change (replaces the table quoted above):

The items listed under **“What Works (No Changes Needed)”** above apply directly here; they
represent areas where onboarding flows are already compatible with Mysticat Blackboard and
require no additional changes.

Member Author

@solaris007 solaris007 left a comment

Hey @solaris007,

Strengths

  • Accurate shared data layer analysis - correctly identifies that both stacks share Aurora and site.id is the universal FK, making per-audit migration viable (docs/onboarding-blackboard-compatibility.md:17-39)
  • Per-audit migration model is clearly articulated with the goal_configs cascade hierarchy (docs/onboarding-blackboard-compatibility.md:42-60)
  • Legacy dispatch pipeline trace is verified against source code - Configuration.findLatest(), isHandlerEnabledForSite(), and SQS dispatch patterns all match
  • Auth blocker correctly identified as critical path - prevents wasted implementation effort
  • Pseudocode migration path makes the doc immediately actionable (docs/onboarding-blackboard-compatibility.md:330-360)

Issues

Important (Should Fix)

1. ASO Step Functions workflow not documented
docs/onboarding-blackboard-compatibility.md:284-293

The ASO flow omits the Step Functions workflow launched after onboarding (src/support/utils.js:1471-1487). This runs opportunity-status-processor, disable-import-audit-processor, demo-url-processor, and cwv-demo-suggestions-processor. The opportunity-status-processor waits for audit completion and reads from LatestAudit - if migrated audits complete on a different timeline or write to facts instead, this processor silently times out. Unlike the "noisy but functional" DLQ path, failures inside a state machine are functionally invisible.

Fix: Add Step Functions as step 10 in the ASO flow. Note that these processors need compatibility assessment, especially opportunity-status-processor which depends on audit result timing.

2. Missing LatestAudit storage compatibility analysis

The legacy pipeline writes results to LatestAudit. Multiple consumers depend on this: Slack "run audit" command, opportunity/suggestion pipeline, site-info command, ASO post-processing workflows, and LLMO dashboards. If blackboard producers write to the facts table instead of (or in addition to) LatestAudit, that is a compatibility gap affecting more than just onboarding.

Fix: Add an "Audit result storage compatibility" section covering whether blackboard producers write to LatestAudit, facts, or both, and which consumers depend on each.

3. trigger:llmo-onboarding-publish SQS message not documented
src/controllers/llmo/llmo-onboarding.js:1011-1014

The LLMO flow sends enqueueLlmoOnboardingPublish() to the audits queue before audit triggers. Unlike regular audit types, this publishes LLMO UI data. If this handler migrates to the blackboard, silent failure means customers see a broken LLMO dashboard after onboarding - not just DLQ noise.

Fix: List trigger:llmo-onboarding-publish explicitly as a type to track during migration.

4. "Direct DB write" option insufficiently flagged as dangerous
docs/onboarding-blackboard-compatibility.md:193

Listed as a peer option with only "(tight coupling)" as the downside. This bypasses all Control API validation and invariant enforcement, and it creates a dual-writer concurrency risk.

Fix: Mark as "NOT RECOMMENDED" with explicit rationale, or remove.

5. Missing credential lifecycle discussion for s2s auth
docs/onboarding-blackboard-compatibility.md:189-194

An API key or JWT is proposed, but the doc says nothing about storage (Secrets Manager vs env vars), rotation cadence, scope (scan-only vs full admin), or ownership.

Fix: Add a subsection under prerequisites or open questions covering credential storage, scoping, and rotation.

6. BLACKBOARD_MAPPING in pseudocode is undefined
docs/onboarding-blackboard-compatibility.md:311-337

The pseudocode references BLACKBOARD_MAPPING[auditType] but the mapping is an open question. A developer copy-pasting this would silently route all types to legacy (empty mapping = all fallthrough).

Fix: Add a prominent note that the mapping must be resolved before this code is used. Consider populating with the partial known mappings as a starting point.
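
One way to make the empty-mapping case fail loudly instead of falling through is sketched below. This is not the doc's pseudocode: `resolveAuditRoute` and the return shape are hypothetical, and no real audit-type-to-goal-key entries are assumed, since that mapping is still an open question.

```javascript
// Hypothetical router: decides whether an audit type goes to the blackboard
// or stays on the legacy pipeline, given an auditType -> goal key mapping.
function resolveAuditRoute(auditType, mapping) {
  // Guard: an empty or missing mapping would silently send every type to
  // legacy, so treat it as a configuration error instead.
  if (!mapping || Object.keys(mapping).length === 0) {
    throw new Error('BLACKBOARD_MAPPING is empty or undefined; resolve the mapping before routing');
  }
  const hasEntry = Object.prototype.hasOwnProperty.call(mapping, auditType);
  return hasEntry
    ? { target: 'blackboard', goalKey: mapping[auditType] }
    : { target: 'legacy', auditType };
}
```

With a placeholder mapping such as `{ 'example-audit': 'example-goal' }`, `resolveAuditRoute('example-audit', mapping)` routes to the blackboard and any other type stays on legacy. An empty mapping throws rather than routing.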

Minor (Nice to Have)

  • Duplicate "What Works" table at lines 64-73 and 174-182 - remove the first instance
  • Missing cross-reference to existing docs/onboard-workflow.md which documents ASO in detail
  • llmo-customer-analysis is enabled but not immediately triggered (deferred until DRS completion) - this timing nuance matters for migration sequencing
  • Brand-profile agent is triggered by controller/Slack wrapper, not performLlmoOnboarding itself - relevant for new entry points

Recommendations

  1. Add a "Consumers of LatestAudit" section - this is the biggest gap for a complete compatibility picture
  2. Add a rollback section - what happens if a blackboard audit regression is discovered for a migrated type? Can the legacy handler be re-enabled?
  3. Consider EventBridge as a fifth s2s option - an onboarding.scan-requested event avoids HTTP auth entirely and aligns with existing infrastructure
  4. Add a "when to migrate" decision checklist per audit type (producer registered, goal mapping published, LatestAudit backfill confirmed, onboarding updated)
  5. Consider mTLS/SigV4 for s2s auth instead of static API keys - both services run in the same AWS account/VPC
  6. Plan to clean up migrated types from the dispatcher proactively - a permanently noisy DLQ masks real failures
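
The "when to migrate" checklist in recommendation 4 could be expressed as a small gate, sketched here. The four checks are the ones listed in that recommendation. The field names and the `missingPrerequisites` helper are hypothetical, and how each flag would actually be determined is left open.

```javascript
// Checklist items from recommendation 4; the status field names are hypothetical.
const MIGRATION_CHECKS = [
  ['producerRegistered', 'producer registered'],
  ['goalMappingPublished', 'goal mapping published'],
  ['latestAuditBackfillConfirmed', 'LatestAudit backfill confirmed'],
  ['onboardingUpdated', 'onboarding updated'],
];

// Returns the labels of every check that is not explicitly true.
// An empty array means the audit type is ready to migrate.
function missingPrerequisites(status) {
  return MIGRATION_CHECKS
    .filter(([key]) => status[key] !== true)
    .map(([, label]) => label);
}
```

For example, `missingPrerequisites({ producerRegistered: true, goalMappingPublished: true })` returns `['LatestAudit backfill confirmed', 'onboarding updated']`.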

Assessment

Ready to merge? Yes, with follow-up

This is a research document, not an implementation spec. The analysis is substantively correct and fills a real knowledge gap. The identified gaps (LatestAudit storage, Step Functions workflow, credential lifecycle) should be addressed before this doc is used as a migration guide, but they don't block merging the research artifact. Recommend a follow-up pass to address the Important issues before migration planning begins.

- Add Mysticat/Mystique terminology clarification
- Remove duplicate "What Works" table
- Add ASO Step Functions workflow (4 processors) with compatibility notes
- Add Audit Result Storage section: LatestAudit replaced by CQRS pattern
  (facts -> projector -> projection_opportunity/suggestion tables)
- Document trigger:llmo-onboarding-publish in LLMO flow
- Mark "Direct DB write" as NOT RECOMMENDED
- Add credential lifecycle section (storage, scope, rotation, ownership)
- Add caveat to BLACKBOARD_MAPPING pseudocode
- Add cross-reference to onboard-workflow.md
- Note llmo-customer-analysis deferred timing and brand-profile trigger location
@github-actions

This PR will trigger no release when merged.
