branch-4.1: [fix](streaming-job) Fix PG replication slot leak when streaming task is cancelled during pause/resume #62010 #62033
Open
github-actions[bot] wants to merge 1 commit into branch-4.1
… is cancelled during pause/resume (#62010)

### What problem does this PR solve?

Problem Summary: When a PostgreSQL CDC streaming job is paused and resumed, the PG replication slot can be permanently leaked, causing all subsequent tasks to fail with:

`replication slot "doris_cdc_xxx" is active for PID xxx`

**Root cause:** The CDC client reuses a single `SourceReader` instance per jobId (`Env.getOrCreateReader`). When FE cancels a task (PAUSE), the BE HTTP connection is closed, but the CDC client's `buildStreamRecords` thread may still be blocked in `pollRecords` (up to 15s timeout). Before the old task finishes, the new task (after RESUME) arrives at the same CDC client and calls `prepareStreamSplit`, which overwrites `this.streamReader` with a new Fetcher without closing the old one. The old Debezium reader (holding the PG replication connection) is leaked: its reference is lost, so `finishSplitRecords` in the old task's finally block closes the new Fetcher instead, and the PG slot is never released.

From the logs, the slot remained occupied for 25+ minutes until the test timed out:

`Failed to start replication stream at LSN{0/318EBC8}; when setting up multiple connectors for the same database host, please make sure to use a distinct replication slot name for each.`

**Fix:** Close the previous stream/binlog reader before creating a new one in `prepareStreamSplit` (PG) and `prepareBinlogSplit` (MySQL). This ensures the old Debezium connection is properly released when a new task reuses the same SourceReader.
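For reviewers who want the race in miniature, the following is a rough, self-contained sketch of the overwrite-without-close pattern and the close-before-replace fix described above. It is not the actual Doris code: `FakeFetcher`, `ReusedSourceReader`, and `prepareStreamSplitLeaky` are hypothetical stand-ins, and only the method names `prepareStreamSplit` and `finishSplitRecords` are borrowed from the description. The real fetcher holds a Debezium/PG replication connection rather than a boolean flag.

```java
/** Demo entry point; all types in this file are illustrative assumptions. */
class SlotLeakDemo {
    public static void main(String[] args) {
        // Leaky path: the new task overwrites the field, the old task's
        // finally block then closes the NEW fetcher, and the old one leaks.
        ReusedSourceReader leaky = new ReusedSourceReader();
        FakeFetcher oldLeaky = leaky.prepareStreamSplitLeaky("task-1");
        FakeFetcher newLeaky = leaky.prepareStreamSplitLeaky("task-2");
        leaky.finishSplitRecords(); // stands in for the old task's finally block
        System.out.println("leaky: old closed=" + oldLeaky.closed
                + ", new closed=" + newLeaky.closed);

        // Fixed path: the previous fetcher is closed before being replaced.
        ReusedSourceReader fixed = new ReusedSourceReader();
        FakeFetcher oldFixed = fixed.prepareStreamSplit("task-1");
        FakeFetcher newFixed = fixed.prepareStreamSplit("task-2");
        System.out.println("fixed: old closed=" + oldFixed.closed
                + ", new closed=" + newFixed.closed);
    }
}

/** Hypothetical stand-in for a fetcher that owns a replication connection. */
class FakeFetcher implements AutoCloseable {
    final String taskId;
    boolean closed = false;

    FakeFetcher(String taskId) {
        this.taskId = taskId;
    }

    @Override
    public void close() {
        closed = true; // the real reader would release the PG connection here
    }
}

/** Hypothetical stand-in for the per-job reader shared by successive tasks. */
class ReusedSourceReader {
    private FakeFetcher streamReader;

    /** Pre-fix behaviour: the old reference is dropped without being closed. */
    synchronized FakeFetcher prepareStreamSplitLeaky(String taskId) {
        streamReader = new FakeFetcher(taskId);
        return streamReader;
    }

    /** Post-fix behaviour: close whatever reader is still held, then replace it. */
    synchronized FakeFetcher prepareStreamSplit(String taskId) {
        if (streamReader != null) {
            streamReader.close();
        }
        streamReader = new FakeFetcher(taskId);
        return streamReader;
    }

    /** Closes whichever fetcher the field currently points at. */
    synchronized void finishSplitRecords() {
        if (streamReader != null) {
            streamReader.close();
            streamReader = null;
        }
    }
}
```

The sketch only demonstrates that the previous fetcher is released when a new task reuses the same reader; how the real code synchronizes with a thread still blocked in `pollRecords` is outside what this example models.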
Contributor
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Contributor
run buildall
Member
run external
Cherry-picked from #62010