
branch-4.1: [fix](job) fix StreamingJob loadStatistic reset to zero after FE checkpoint restart #61997 #62049

Open
github-actions[bot] wants to merge 1 commit into branch-4.1 from auto-pick-61997-branch-4.1


@github-actions github-actions bot commented Apr 2, 2026

Cherry-picked from #61997

…kpoint restart (#61997)

### What problem does this PR solve?

Problem Summary:
1. When FE restarts from a checkpoint image, `replayOnCommitted()` is not called for transactions committed before the checkpoint. Since `jobStatistic` in `StreamingInsertJob` lacked `@SerializedName`, it was not written into the image, causing `scannedRows` and `loadBytes` to reset to zero after every checkpoint-based restart.
2. Add a restart case for the CDC TVF.
#### Root Cause
`jobStatistic` accumulates statistics via `+=` in `replayOnCommitted()`. Unlike `offset` (which is an assignment and idempotent), accumulated values cannot be recovered from post-checkpoint txn replay alone. The field must be persisted in the image to survive checkpoint restarts.
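
The root cause can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the Doris code: `CheckpointReplaySketch`, `Txn`, `JobState`, `writeImage`, and `restart` are hypothetical names, and the `persistStatistic` flag merely stands in for whether `jobStatistic` is serialized into the image. The sketch also leaves `offset` out of the image on purpose, to show that replaying the last post-checkpoint txn restores it anyway.

```java
import java.util.List;

// Hypothetical, simplified model of the behaviour described above; not the Doris classes.
final class CheckpointReplaySketch {

    /** Stand-in for the data carried by one committed transaction. */
    static final class Txn {
        final long offset;
        final long scannedRows;
        final long loadBytes;

        Txn(long offset, long scannedRows, long loadBytes) {
            this.offset = offset;
            this.scannedRows = scannedRows;
            this.loadBytes = loadBytes;
        }
    }

    /** Stand-in for the job state; field names follow the PR text, the class is invented. */
    static final class JobState {
        long offset;
        long scannedRows;
        long loadBytes;

        // Mirrors the described replay: offset is assigned, statistics are accumulated.
        void replayOnCommitted(Txn txn) {
            offset = txn.offset;
            scannedRows += txn.scannedRows;
            loadBytes += txn.loadBytes;
        }
    }

    /** Builds the state restored from an image; statistics survive only if persisted. */
    static JobState writeImage(JobState live, boolean persistStatistic) {
        JobState image = new JobState();
        if (persistStatistic) {
            image.scannedRows = live.scannedRows;
            image.loadBytes = live.loadBytes;
        }
        return image;
    }

    /** Restart: load the image, then replay only txns committed after the checkpoint. */
    static JobState restart(List<Txn> before, List<Txn> after, boolean persistStatistic) {
        JobState live = new JobState();
        before.forEach(live::replayOnCommitted);
        JobState restored = writeImage(live, persistStatistic);
        after.forEach(restored::replayOnCommitted);
        return restored;
    }

    public static void main(String[] args) {
        List<Txn> before = List.of(new Txn(100, 10, 1000), new Txn(200, 20, 2000));
        List<Txn> after = List.of(new Txn(300, 5, 500));
        JobState lost = restart(before, after, false);
        JobState kept = restart(before, after, true);
        System.out.println("not persisted: rows=" + lost.scannedRows + " offset=" + lost.offset);
        System.out.println("persisted:     rows=" + kept.scannedRows + " offset=" + kept.offset);
    }
}
```

In both runs `offset` ends at the value of the last replayed txn, while the accumulated counters include the pre-checkpoint totals only when the statistics are carried in the image.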
@github-actions github-actions bot requested a review from yiguolei as a code owner April 2, 2026 09:30

Thearas commented Apr 2, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Apr 2, 2026
@dataroaring dataroaring reopened this Apr 2, 2026

Thearas commented Apr 2, 2026

run buildall

3 participants