feat: Schema Versioning and Migration for Rolling Upgrades #9
Merged
wongfei2009 merged 5 commits into master on Jan 20, 2026
Implements schema versioning and migration handling system according to design document to address scenarios where database instances have different schema versions during rolling upgrades.

Phase 1: Foundation (Schema Tracking with Atlas)
- Add ariga.io/atlas dependency for SQLite introspection
- Create SchemaManager using Atlas for deterministic schema hashing
- Create SchemaCache for O(1) schema hash lookups
- Add __harmonylite__schema_version table to track schema state
- Initialize schema cache on startup and store hash in database
- Add UpdateSchemaState() for recomputing schema hash

Phase 2: Event Enhancement
- Add SchemaHash field to ChangeLogEvent with CBOR omitempty tag
- Populate SchemaHash during event creation (backward compatible)

Phase 3: Validation and Pause-on-Mismatch
- Add schema mismatch tracking fields to Replicator
- Implement handleSchemaMismatch() with 5-minute periodic recompute
- Create ListenWithDB() for schema-aware replication
- Add O(1) hash comparison in replication hot path
- NAK with 30s delay when schema mismatches (pauses replication)
- Implement checkStreamGap() to detect message truncation
- Exit process when stream gap detected (triggers snapshot restore)
- Auto-resume when schema matches after recompute (no restart needed)
- Add harmonylite_schema_mismatch_paused gauge metric

Key Features:
- Deterministic SHA-256 schema hashing using Atlas introspection
- Self-healing: auto-detects schema changes and resumes replication
- Stream gap detection prevents nodes from getting stuck
- Backward compatible with events lacking SchemaHash
- Observable via Prometheus metrics

Phase 4 (cluster visibility via NATS KV) deferred for future work.

Ref: docs/docs/design/schema-versioning.md
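To illustrate what "deterministic schema hashing" and the backward-compatible hash check could mean in practice, here is a minimal sketch using only the Go standard library. It does not use Atlas and is not the PR's SchemaManager: the `Column` type, `HashSchema`, and `SchemaMatches` are hypothetical names, and sorting tables and columns by name before hashing is an assumption about how determinism might be achieved.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Column is a hypothetical, simplified view of a table column.
// The real implementation introspects the schema via Atlas.
type Column struct {
	Name string
	Type string
}

// HashSchema returns a SHA-256 hex digest over a canonical rendering of
// the schema. Tables and columns are sorted by name (an assumption), so
// map iteration order and column order do not affect the result.
func HashSchema(tables map[string][]Column) string {
	names := make([]string, 0, len(tables))
	for name := range tables {
		names = append(names, name)
	}
	sort.Strings(names)

	h := sha256.New()
	for _, name := range names {
		// Copy before sorting so the caller's slice is left untouched.
		cols := append([]Column(nil), tables[name]...)
		sort.Slice(cols, func(i, j int) bool { return cols[i].Name < cols[j].Name })
		fmt.Fprintf(h, "table:%s\n", name)
		for _, c := range cols {
			fmt.Fprintf(h, "col:%s:%s\n", c.Name, c.Type)
		}
	}
	return hex.EncodeToString(h.Sum(nil))
}

// SchemaMatches reports whether an event may be applied locally.
// An empty event hash is treated as a match, mirroring the
// "backward compatible with events lacking SchemaHash" note.
func SchemaMatches(localHash, eventHash string) bool {
	return eventHash == "" || eventHash == localHash
}

func main() {
	before := map[string][]Column{
		"books": {{Name: "id", Type: "INTEGER"}, {Name: "title", Type: "TEXT"}},
	}
	after := map[string][]Column{
		"books": {{Name: "id", Type: "INTEGER"}, {Name: "title", Type: "TEXT"}, {Name: "rating", Type: "INTEGER"}},
	}
	local := HashSchema(before)
	fmt.Println("old event applies:", SchemaMatches(local, ""))
	fmt.Println("new-schema event applies:", SchemaMatches(local, HashSchema(after)))
}
```

Because the check is a plain string comparison against a cached hash, it stays cheap enough for the replication hot path.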
- Add SchemaRegistry with NATS KeyValue integration for cluster-wide schema state
- Implement PublishSchemaState() to broadcast node schema hash to registry
- Implement GetClusterSchemaState() to retrieve state from all nodes
- Implement CheckClusterSchemaConsistency() to validate schema across cluster
- Add CLI flags: -schema-status and -schema-status-cluster
- Add printLocalSchemaStatus() to display local schema information
- Add printClusterSchemaStatus() to display cluster-wide schema status with hash groups
- Integrate schema publishing into harmonylite.go startup after CDC installation
- Add comprehensive unit tests for schema registry functionality
- All tests pass: unit tests (db/logstream) and E2E tests (10/10 specs)

Phase 4 provides operators visibility into schema state across the cluster, making it easy to diagnose schema mismatches during rolling upgrades.
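The "hash groups" idea behind the cluster status output can be sketched without NATS: given each node's published schema hash, group nodes by hash and call the cluster consistent when there is at most one group. This is an illustrative sketch only. `GroupByHash` and `IsConsistent` are hypothetical names, and the node IDs and hashes are made up; the PR's CheckClusterSchemaConsistency() may work differently.

```go
package main

import (
	"fmt"
	"sort"
)

// GroupByHash inverts a nodeID -> schemaHash map into
// schemaHash -> sorted list of node IDs.
func GroupByHash(nodeHashes map[string]string) map[string][]string {
	groups := make(map[string][]string)
	for node, hash := range nodeHashes {
		groups[hash] = append(groups[hash], node)
	}
	for hash := range groups {
		sort.Strings(groups[hash])
	}
	return groups
}

// IsConsistent reports whether all nodes share a single schema hash.
// An empty cluster is treated as consistent (an assumption).
func IsConsistent(nodeHashes map[string]string) bool {
	return len(GroupByHash(nodeHashes)) <= 1
}

func main() {
	// Hypothetical state in the middle of a rolling upgrade.
	state := map[string]string{
		"node-1": "abc123",
		"node-2": "abc123",
		"node-3": "def456",
	}
	groups := GroupByHash(state)

	// Sort hashes so the printed report is stable across runs.
	hashes := make([]string, 0, len(groups))
	for h := range groups {
		hashes = append(hashes, h)
	}
	sort.Strings(hashes)

	fmt.Println("consistent:", IsConsistent(state))
	for _, h := range hashes {
		fmt.Printf("%s: %v\n", h, groups[h])
	}
}
```

More than one group during an upgrade is expected; the groups should converge to one once every node has been migrated.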
- Add run-schema-migration-test.sh for automated testing of schema versioning
- Test verifies: hash computation, registry publishing, mismatch detection, rolling upgrade workflow, and schema convergence
- Update README.md with comprehensive schema versioning documentation
- Include troubleshooting guide for schema-related issues
…ades

- Add Schema Mismatch Pause and Resume test context in e2e_test.go
- Add schema migration helpers: alterTableAddColumn, hasColumn, insertBookWithRating, waitForCDCReady
- Test validates rolling upgrade workflow: pause on mismatch, resume after upgrade
- Covers key discovery: change_log table must be dropped and recreated after schema changes
- Add schema versioning to README.md features list
- Update introduction.md with schema versioning as key feature
- Update architecture.md with Schema Versioning section and flow diagram
- Update production-deployment.md with rolling upgrade workflow
- Add schema mismatch to replication.md failure modes and troubleshooting
- Mark design/schema-versioning.md status as Implemented
Summary
This PR implements Schema Versioning and Migration support for HarmonyLite, enabling safe rolling upgrades in production clusters where nodes may temporarily have different database schemas.
Problem Solved
During rolling upgrades, nodes temporarily have schema mismatches. Without this feature, replication could:
Solution
Phase 1-3: Schema Hashing and Validation
Phase 4: Cluster-wide Visibility
Pause/Resume Behavior
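When a schema mismatch is detected:
- The event is NAKed with a 30s delay, which pauses replication on that node
- The local schema hash is recomputed periodically (every 5 minutes)
- Replication resumes automatically once the hashes match, with no restart needed
- The harmonylite_schema_mismatch_paused gauge metric reports the paused state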
Testing
Rolling Upgrade Workflow