feat(control-plane): UserIdDirectory + ServerIdDirectory, e2e stability fixes #44
[P1] LocalMatchQueue: Use UUID for ticketId (align with ApiMatchQueue)
- Prevents collision with old Redis assignments after restart
- Fixes provisioning e2e failures

[P1] RedisMatchmakingStore: Add assignment TTL (300s)
- Switch from hash to per-key storage with EXPIRE
- Prevents stale assignment data accumulation

[P1] Defer env reads to runtime
- BullMQModule: forRootAsync useFactory for Redis config
- MatchmakingModule: useFactory for MatchmakingConfig

[P2] Split-role e2e: Skip in-process test (cannot achieve role isolation)
- jest.resetModules breaks NestJS BullMQ ModuleRef
- matchmaking-split-roles-external.e2e-spec.ts provides definitive verification

Co-authored-by: Cursor <cursoragent@cursor.com>
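To illustrate the "Defer env reads to runtime" item: a value computed at module load is frozen before a test (or a spawned process) can set its environment, whereas a factory reads it when the module is actually wired up. The sketch below is framework-free and uses a hypothetical `env` object in place of `process.env`; the names `eagerConfig` and `redisConfigFactory` are illustrative and not taken from this PR.

```typescript
// Hypothetical environment bag standing in for process.env.
type Env = Record<string, string | undefined>;
const env: Env = { REDIS_HOST: "localhost", REDIS_DB: "0" };

interface RedisConnectionConfig {
  host: string;
  db: number;
}

// Eager: evaluated once, at the moment this module is loaded.
const eagerConfig: RedisConnectionConfig = {
  host: env["REDIS_HOST"] ?? "localhost",
  db: Number(env["REDIS_DB"] ?? "0"),
};

// Deferred: evaluated each time the factory is called, which is the
// role a useFactory callback plays when a module is configured asynchronously.
function redisConfigFactory(source: Env = env): RedisConnectionConfig {
  return {
    host: source["REDIS_HOST"] ?? "localhost",
    db: Number(source["REDIS_DB"] ?? "0"),
  };
}

// An e2e script sets REDIS_DB after the module above was already loaded.
env["REDIS_DB"] = "1";

const deferredConfig = redisConfigFactory();
console.log(eagerConfig.db, deferredConfig.db);
```

The eager object still points at DB 0, while the factory result reflects the later override.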
- Delete matchmaking-split-roles.e2e-spec.ts (could not achieve role isolation)
- test:e2e:split now runs matchmaking-split-roles-external.e2e-spec.ts
- Use npm run test:e2e:split or npm run test:e2e:split:sh for split-role verification

Co-authored-by: Cursor <cursoragent@cursor.com>
…ollution
- test:e2e:split: add REDIS_DB=1 (align with test:e2e)
- e2e-split.sh: add REDIS_DB=1 for spawned API/worker processes
- Fixes 'No server available' / timeout when Redis DB 0 has stale data

Co-authored-by: Cursor <cursoragent@cursor.com>
[P1] admin.e2e Connection is closed:
- Use beforeAll/afterAll instead of beforeEach/afterEach to reduce app lifecycle
- Add 400ms drain delay in closeApp after app.close() for BullMQ teardown

[P2] Assignment TTL vs token expiry mismatch:
- Change ASSIGNMENT_TTL_SECONDS from 300 to 3600 (align with JWT exp 1h)
- Avoids 'token valid but status not found' when client polls after 5min

Co-authored-by: Cursor <cursoragent@cursor.com>
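The TTL mismatch can be reproduced with a small in-memory stand-in for per-key storage with an expiry. This is only a sketch: the `assignment:` key prefix, the `Assignment` fields, and the injectable clock are assumptions made for illustration, not the store's real layout.

```typescript
interface Assignment {
  ticketId: string;
  serverId: string;
}

// In-memory stand-in for "one key per assignment, each with its own expiry".
class TtlAssignmentStore {
  private readonly entries = new Map<string, { value: Assignment; expiresAtMs: number }>();
  private readonly ttlSeconds: number;
  private readonly now: () => number;

  constructor(ttlSeconds: number, now: () => number) {
    this.ttlSeconds = ttlSeconds;
    this.now = now;
  }

  save(assignment: Assignment): void {
    // Hypothetical key layout.
    const key = "assignment:" + assignment.ticketId;
    this.entries.set(key, {
      value: assignment,
      expiresAtMs: this.now() + this.ttlSeconds * 1000,
    });
  }

  find(ticketId: string): Assignment | undefined {
    const key = "assignment:" + ticketId;
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAtMs) {
      this.entries.delete(key); // expired: behave as if the key is gone
      return undefined;
    }
    return entry.value;
  }
}

// Controllable clock so the example does not need to sleep.
let clockMs = 0;
const clock = () => clockMs;

const shortTtlStore = new TtlAssignmentStore(300, clock);  // old value
const longTtlStore = new TtlAssignmentStore(3600, clock);  // aligned with the 1h token

const assignment: Assignment = { ticketId: "t-1", serverId: "srv-1" };
shortTtlStore.save(assignment);
longTtlStore.save(assignment);

// The client polls 10 minutes later while its token is still valid.
clockMs = 10 * 60 * 1000;
const foundWithShortTtl = shortTtlStore.find("t-1");
const foundWithLongTtl = longTtlStore.find("t-1");
console.log(foundWithShortTtl, foundWithLongTtl);
```

With the 300s TTL the lookup returns nothing even though the token has not expired; with 3600s the assignment is still found.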
- Add e2e-global-setup.ts: flush Redis matchmaking/BullMQ keys before tests
- Reduces 'No server available' from stale jobs between runs
- Restore admin beforeAll/afterAll to avoid Connection is closed (trade-off: shared state)
- Keep 800ms drain delay in closeApp

Co-authored-by: Cursor <cursoragent@cursor.com>
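A pattern-based flush of the kind described for e2e-global-setup.ts could look roughly like the following. The `KeyClient` interface and `FakeKeyClient` are hypothetical stand-ins so the sketch runs without Redis; a real client exposes SCAN and DEL with different signatures. The `cd:server:*` pattern comes from this PR's description, while `bull:*` assumes BullMQ's default key prefix is in use.

```typescript
// Hypothetical minimal client surface used only for this sketch.
interface KeyClient {
  scan(pattern: string): Promise<string[]>;
  del(keys: string[]): Promise<number>;
}

class FakeKeyClient implements KeyClient {
  private readonly store = new Map<string, string>();

  constructor(initial: Record<string, string>) {
    for (const key of Object.keys(initial)) {
      this.store.set(key, initial[key] as string);
    }
  }

  async scan(pattern: string): Promise<string[]> {
    // Supports only a trailing "*" wildcard, which is all this sketch needs.
    const isPrefix = pattern.endsWith("*");
    const prefix = isPrefix ? pattern.slice(0, -1) : pattern;
    return Array.from(this.store.keys()).filter((key) =>
      isPrefix ? key.startsWith(prefix) : key === prefix,
    );
  }

  async del(keys: string[]): Promise<number> {
    let removed = 0;
    for (const key of keys) {
      if (this.store.delete(key)) removed++;
    }
    return removed;
  }

  remainingKeys(): string[] {
    return Array.from(this.store.keys()).sort();
  }
}

// Deletes every key matching any of the patterns; returns how many were removed.
async function flushByPatterns(client: KeyClient, patterns: string[]): Promise<number> {
  let removed = 0;
  for (const pattern of patterns) {
    const keys = await client.scan(pattern);
    if (keys.length > 0) {
      removed += await client.del(keys);
    }
  }
  return removed;
}

const client = new FakeKeyClient({
  "cd:server:srv-1": "{}",
  "bull:matchmaking:1": "{}",
  "unrelated:key": "keep",
});

const flushed = flushByPatterns(client, ["cd:server:*", "bull:*"]);
flushed.then((count) => console.log(count, client.remainingKeys()));
```

Scoping the flush to known prefixes, instead of clearing the whole database, leaves unrelated keys untouched.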
…me userIdDirectory
450f71e to 17c6d34
💡 Codex Review
The gateway is always registered, so WebSocket messages are only JSON-parsed and then cast to …
… class-validator (P2)
Both comments addressed in commit 585610e:

P1: Gate RealtimeGateway to API-enabled roles
Added an early return:

```ts
// Reject WebSocket connections on nodes whose role does not serve the API.
if (!isApiEnabled(getMatchmakingRole())) {
  client.close(4403, 'This node does not serve WebSocket connections (queue-worker role)');
  return;
}
```

P2: Validate WebSocket enqueue payload via class-validator
Added validation of the incoming message against WsEnqueueMessageDto:

```ts
const dto = plainToInstance(WsEnqueueMessageDto, m);
const errors = await validate(dto);
if (errors.length > 0) {
  // `detail` is built from `errors`; its construction is not shown in this excerpt.
  this.sendError(client, `Invalid enqueue payload: ${detail}`);
  return;
}
```

Malformed payloads (missing …

All tests pass: build ✅ / unit 49/49 ✅ / e2e 18/19 ✅ (1 pre-existing skip)
Summary
Part 1: Matchmaking stability fixes (pre-existing commits)
- ticketId now UUID; assignment TTL set correctly

Part 2: ClusterDirectory → UserIdDirectory + ServerIdDirectory refactor
Refactors the in-memory ServerRegistryService into a Redis-backed ServerIdDirectory, making game server registrations visible across all control-plane nodes.

Architecture change:
- UserIdDirectory (USER_ID_DIRECTORY): userId→nodeId with TTL lease; used by Realtime and Matchmaking (rename of ClusterDirectory)
- ServerIdDirectory (SERVER_ID_DIRECTORY): serverId→ServerEntry; Redis-backed, cross-node; used by Provisioning and Admin
- ClusterDirectory token and ServerRegistryService removed

Files changed:
- infra/cluster-directory/: new UserIdDirectory, ServerIdDirectory interfaces + Redis impls
- infra/contracts/server-entry.dto.ts: ServerEntry extracted to shared location
- ProvisioningController, InMemoryProvisioningClient: inject SERVER_ID_DIRECTORY
- AdminController: inject SERVER_ID_DIRECTORY, getServers() now async
- UserSessionRegistryService, MatchmakingService: inject USER_ID_DIRECTORY

E2E Redis isolation:
- e2e-global-setup.ts and e2e-helpers.flushServerKeys() flush cd:server:* keys between suites to prevent cross-test pollution

Test plan
- npm run build: PASS
- npm test: 11 suites / 49 tests PASS
- npm run test:e2e -- --runInBand: 6 suites / 18 passed + 1 skipped PASS

🤖 Generated with Claude Code
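For readers skimming the refactor, here is a rough, self-contained sketch of the two directories. It is not the PR's code: the method names (`register`, `list`, `claim`, `lookup`), the `ServerEntry` fields, and the shared `Map` standing in for Redis are all assumptions, and the methods are synchronous for brevity whereas the Redis-backed versions would be async (as the "getServers() now async" note suggests). Only the `cd:server:` prefix is taken from the description above.

```typescript
// Hypothetical shape; the real ServerEntry lives in infra/contracts/server-entry.dto.ts.
interface ServerEntry {
  serverId: string;
  address: string;
}

const SERVER_KEY_PREFIX = "cd:server:";

// serverId -> ServerEntry, stored in a backing map shared by all nodes.
class SharedServerIdDirectory {
  private readonly backing: Map<string, string>;

  constructor(backing: Map<string, string>) {
    this.backing = backing;
  }

  register(entry: ServerEntry): void {
    this.backing.set(SERVER_KEY_PREFIX + entry.serverId, JSON.stringify(entry));
  }

  list(): ServerEntry[] {
    const result: ServerEntry[] = [];
    this.backing.forEach((raw, key) => {
      if (key.startsWith(SERVER_KEY_PREFIX)) result.push(JSON.parse(raw) as ServerEntry);
    });
    return result;
  }
}

// userId -> nodeId with a TTL lease that lapses unless renewed.
class LeasedUserIdDirectory {
  private readonly leases = new Map<string, { nodeId: string; expiresAtMs: number }>();
  private readonly now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  claim(userId: string, nodeId: string, ttlSeconds: number): void {
    this.leases.set(userId, { nodeId, expiresAtMs: this.now() + ttlSeconds * 1000 });
  }

  lookup(userId: string): string | undefined {
    const lease = this.leases.get(userId);
    if (lease === undefined || this.now() >= lease.expiresAtMs) return undefined;
    return lease.nodeId;
  }
}

// Two control-plane nodes sharing one backing store (the role Redis plays).
const sharedStore = new Map<string, string>();
const nodeA = new SharedServerIdDirectory(sharedStore);
const nodeB = new SharedServerIdDirectory(sharedStore);
nodeA.register({ serverId: "srv-1", address: "10.0.0.5:7777" });
const seenFromNodeB = nodeB.list();

// A user lease that expires after 30 seconds without renewal.
let nowMs = 0;
const users = new LeasedUserIdDirectory(() => nowMs);
users.claim("user-1", "node-a", 30);
nowMs = 29_000;
const ownerBeforeExpiry = users.lookup("user-1");
nowMs = 30_000;
const ownerAfterExpiry = users.lookup("user-1");
console.log(seenFromNodeB.length, ownerBeforeExpiry, ownerAfterExpiry);
```

The point of the shared backing store is that a registration made through one node is visible from another, which a per-process in-memory registry cannot provide.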