Skip to content

Backport: fix auto deletion of wal files#1454

Open
purusang wants to merge 6 commits intomainfrom
backport/wal-pruning-fix
Open

Backport: fix auto deletion of wal files#1454
purusang wants to merge 6 commits intomainfrom
backport/wal-pruning-fix

Conversation

@purusang
Copy link
Contributor

@purusang purusang commented Mar 3, 2026

Description

This is a back ported fix which has already been tested in staging.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature/Enhancement (non-breaking change which adds functionality or enhances an existing one)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactor
  • New or updated tests
  • Dependency Update

Notes to Reviewers

Is this PR addressing any specification, design doc or external reference document?

  • Yes
  • No

If yes, please add relevant links:

Checklist

  • I have performed a self-review of my code.
  • I have commented my code where necessary.
  • I have updated the documentation if needed.
  • My changes do not introduce new warnings.
  • I have added (where necessary) tests that prove my changes are effective or that my feature works.
  • New and existing tests pass with my changes.
  • I have disclosed my use of AI in the body of this PR.

Related Issues

@purusang purusang self-assigned this Mar 3, 2026
@codecov
Copy link

codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.34%. Comparing base (e160667) to head (977333a).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
crates/reth/exex/src/prover_exex.rs 0.00% 3 Missing ⚠️
crates/eectl/src/worker.rs 0.00% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (e160667) and HEAD (977333a). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (e160667) HEAD (977333a)
functional 1 0
@@             Coverage Diff             @@
##             main    #1454       +/-   ##
===========================================
- Coverage   75.58%   65.34%   -10.24%     
===========================================
  Files         802      800        -2     
  Lines       75676    76102      +426     
===========================================
- Hits        57200    49730     -7470     
- Misses      18476    26372     +7896     
Flag Coverage Δ
functional ?
unit 65.34% <0.00%> (-0.31%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
crates/eectl/src/worker.rs 13.72% <0.00%> (ø)
crates/reth/exex/src/prover_exex.rs 0.00% <0.00%> (ø)

... and 256 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

Commit: ada79b1

SP1 Execution Results

program cycles success
EVM EE STF 1,321,280
Checkpoint 5,226
Checkpoint New 883,623

@storopoli storopoli self-assigned this Mar 3, 2026
@storopoli
Copy link
Member

note for myself: we need to implement wait_for_genesis here which has been done in releases/0.2.0

@storopoli storopoli force-pushed the backport/wal-pruning-fix branch 3 times, most recently from fcc620b to 489fd98 Compare March 6, 2026 16:18
@storopoli storopoli marked this pull request as ready for review March 6, 2026 16:32
@storopoli storopoli requested review from a team as code owners March 6, 2026 16:32
storopoli
storopoli previously approved these changes Mar 6, 2026
Copy link
Member

@storopoli storopoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 489fd98

delbonis
delbonis previously approved these changes Mar 6, 2026
@storopoli storopoli added this pull request to the merge queue Mar 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 9, 2026
@alexhui01 alexhui01 added this pull request to the merge queue Mar 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 9, 2026
@storopoli storopoli added this pull request to the merge queue Mar 9, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 9, 2026
@storopoli storopoli dismissed stale reviews from delbonis and themself via f97ebd5 March 9, 2026 12:47
@storopoli storopoli force-pushed the backport/wal-pruning-fix branch from f97ebd5 to 489fd98 Compare March 9, 2026 13:48
@barakshani
Copy link
Contributor

new test_exex_wal_pruning constantly fails. Does it pass locally?

@storopoli
Copy link
Member

new test_exex_wal_pruning constantly fails. Does it pass locally?

Yes it does. That's why I'm stalled in this PR :/

@barakshani
Copy link
Contributor

something is a bit weird in the tests.... (the fact that checkpoints get final). Closing and reopening to make sure CI uses the correct code.

@barakshani barakshani closed this Mar 9, 2026
@barakshani barakshani reopened this Mar 9, 2026
Copy link
Contributor

@barakshani barakshani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this PR mixes 2 notions of finalisation (and actually admit it in one of the comments):

  • reth has some notion of block finlisation, which I believe it's 100 blocks of confirmation.
  • OL has a different notion of (bitcoin) finlisation.

I believe that ExEX kicks in when the block is final (ExecCommand::NewFinalizedTip) according to reth. This PR checks checkpoint (OL) finalisation as some triggering point in time. I may wrong, but this need further investigation. Anyway the test constantly fails right now.

Also left some comment about the correctness of our method to see if files were pruned.


# Mine L1 blocks while polling Strata status until epoch 1 is confirmed.
#
# In functional-tests-new `el_ol`, epoch finalization may not always be available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, ideally, would be fixed in https://alpenlabs.atlassian.net/browse/STR-2424.

#
# In functional-tests-new `el_ol`, epoch finalization may not always be available
# (for example when no proving/finalization pipeline is wired in the environment),
# but confirmation still advances with L1 progress and is sufficient to exercise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if so, we should not check for finalization, because the test may fail unjustifiably.

@barakshani
Copy link
Contributor

PR #1479 adds seal config to el_ol, which is used in the test in the current PR. How come in here we are able to have checkpoints final, in particular sealed?

Comment on lines +100 to +107
status_after_finalization = wait_until_with_value(
lambda: _mine_and_get_sync_status(strata_seq, btc_rpc, mine_address),
lambda status: status["finalized"] is not None
and status["finalized"]["epoch"] >= 1,
error_with="Epoch 1 was not finalized in time",
timeout=120,
step=2,
)
Copy link
Contributor Author

@purusang purusang Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@storopoli To prune wal files, the ideal requirement should be to have the proofs to be generated and stored in db (or might only be witness stored in db 🤔 ). In releases/0.2.0 I waited for OL checkpoint finalization because that was the safest signal to cleanup wal files. And in fast mode epoch could be finalized in seconds there. So we just waited for a finalized checkpoint. And we checked if reth cleaned up wal files.

In main branch, I found out that OL checkpoints never get finalized (which @bewakes confirmed as well) due to which this condition never passes. So I think this whole functional test can be simplified by just having two waits:

  • first we wait until we have some n number of wal files
  • after that we wait until some of those preexisting wal files gets deleted, don't need to wait for all of them to be cleaned up.

In this test we do not care if OL checkpoint finalizes or not. We just want to make sure that the wal files are cleared by reth after sometime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See c92fffe

@storopoli
Copy link
Member

@sapinb do you think this fix is relevant for the Reth issues we experienced in testnet I?

purusang and others added 5 commits March 18, 2026 11:53
- Fix ProverWitnessGenerator emitting FinishedHeight with block number 0
  (from outcome.first_block()) instead of the real block number, which
  prevented WAL files from ever being pruned
- Add functional test verifying WAL files are pruned after epoch finalization
@storopoli storopoli force-pushed the backport/wal-pruning-fix branch from 489fd98 to c92fffe Compare March 18, 2026 17:33
Copy link
Contributor

@delbonis delbonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wait calls about WAL files can stay as-is, those are weird and specific so don't make sense in a general-purpose wait helper.

The el_ol environment does not reliably expose checkpoint finalization, so the
functional test now only checks that preexisting WAL files eventually
disappear. This keeps the pruning assertion stable even while new WAL files are
created during block production.
@storopoli storopoli force-pushed the backport/wal-pruning-fix branch from c92fffe to 977333a Compare March 20, 2026 16:39
@storopoli storopoli requested a review from delbonis March 20, 2026 16:39
Copy link
Member

@storopoli storopoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 977333a

@storopoli
Copy link
Member

@purusang whenever you have time please take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants