STR 1935 Do startup checks in the new strata binary #1515

Open
krsnapaudel wants to merge 4 commits into main from STR-1935

Conversation


krsnapaudel commented Mar 18, 2026

Description

  • Fail startup if bitcoind is unreachable (get_blockchain_info()).
  • Fail startup on bitcoind network mismatch vs config.bitcoind.network.
  • For persisted state, verify OLParams.last_l1_block hash from bitcoind.
  • For persisted state, verify OL DB invariants:
    • genesis block/state/epoch summary exist
    • tip block/state exist
    • non-genesis tip parent exists (genesis parent must be null)
    • previous epoch summary exists for post-genesis tip

This PR was created with help from Claude Code and Codex.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature/Enhancement (non-breaking change which adds functionality or enhances an existing one)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactor
  • New or updated tests
  • Dependency Update

Notes to Reviewers

Not sure if we need all checks. Please suggest how we can test these better.

Is this PR addressing any specification, design doc or external reference document?

  • Yes
  • No

If yes, please add relevant links:

Checklist

  • I have performed a self-review of my code.
  • I have commented my code where necessary.
  • I have updated the documentation if needed.
  • My changes do not introduce new warnings.
  • I have added (where necessary) tests that prove my changes are effective or that my feature works.
  • New and existing tests pass with my changes.
  • I have disclosed my use of AI in the body of this PR.

Related Issues

STR-1935

@krsnapaudel krsnapaudel self-assigned this Mar 18, 2026
@krsnapaudel krsnapaudel requested review from a team as code owners March 18, 2026 23:13
@krsnapaudel krsnapaudel marked this pull request as draft March 18, 2026 23:13
@krsnapaudel krsnapaudel requested a review from bewakes March 19, 2026 14:04
delbonis previously approved these changes Mar 19, 2026
krsnapaudel (author) commented:

@delbonis had to rebase.


github-actions bot commented Mar 19, 2026

Commit: 3461523

SP1 Execution Results

| program | cycles | success |
| --- | --- | --- |
| EVM EE STF | 1,321,255 | |
| Checkpoint | 5,226 | |
| Checkpoint New | 883,623 | |


codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 90.04329% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.61%. Comparing base (3356f55) to head (d8a75d1).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| bin/strata/src/startup_checks.rs | 90.23% | 45 Missing ⚠️ |
| bin/strata/src/main.rs | 0.00% | 1 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (3356f55) and HEAD (d8a75d1). Click for more details.

HEAD has 3 uploads less than BASE
| Flag | BASE (3356f55) | HEAD (d8a75d1) |
| --- | --- | --- |
| unit | 2 | 1 |
| functional | 2 | 0 |
```
@@             Coverage Diff             @@
##             main    #1515       +/-   ##
===========================================
- Coverage   84.34%   65.61%   -18.73%     
===========================================
  Files         802      801        -1     
  Lines       75697    76559      +862     
===========================================
- Hits        63846    50234    -13612     
- Misses      11851    26325    +14474     
```
| Flag | Coverage Δ |
| --- | --- |
| functional | ? |
| unit | 65.61% <90.04%> (-0.03%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| bin/strata/src/main.rs | 0.00% <0.00%> (-95.66%) ⬇️ |
| bin/strata/src/startup_checks.rs | 90.23% <90.23%> (ø) |

... and 401 files with indirect coverage changes


bewakes left a comment


nit: the _on_restart suffix on almost every function feels unnecessary; it suggests the function is making assumptions about the context it is called from (during restart), which it should not need to.

```rust
fn resolve_tip_ol_block(storage: &NodeStorage) -> Result<OLBlockCommitment> {
    storage
        .ol_block()
        .get_canonical_tip_blocking()
```

I think this is validating the wrong thing. get_canonical_tip_blocking() is still just "the first block at the highest slot", so on a forked tip this can pick an Unchecked/Invalid block and fail startup even when a valid tip exists. The storage helper currently does exactly that here:

```rust
/// Gets the canonical tip block commitment.
pub fn get_canonical_tip_blocking(&self) -> DbResult<Option<OLBlockCommitment>> {
    let tip = self.get_tip_slot_blocking()?;
    self.get_canonical_block_at_blocking(tip)
}

/// Gets the canonical tip block commitment.
pub async fn get_canonical_tip_async(&self) -> DbResult<Option<OLBlockCommitment>> {
    let tip = self.get_tip_slot_async().await?;
    self.get_canonical_block_at_async(tip).await
}

/// Gets the canonical block commitment at given height.
pub fn get_canonical_block_at_blocking(
    &self,
    tip: Slot,
) -> DbResult<Option<OLBlockCommitment>> {
    let blocks = self.get_blocks_at_height_blocking(tip)?;
    // TODO(STR-2105): determine how to get the canonical block. for now it is
    // just the first one
    Ok(blocks
        .first()
        .cloned()
        .map(|id| OLBlockCommitment::new(tip, id)))
}
```

Can we either resolve a BlockStatus::Valid tip here, or avoid treating this helper as canonical until canonical selection is implemented?

```rust
    ctx.config().bitcoind.network,
))?;

let has_persisted_client_state = ctx
```

This persisted-state gate looks too narrow. init_ol_genesis() writes OL data before put_update_blocking() writes client state, so a crash between those two calls leaves OL state on disk but makes has_persisted_client_state false on restart. In that case these checks get skipped and the node goes back through genesis init, which seems likely to fail later with duplicate inserts instead of producing the intended startup-consistency error.

The write order is here:

```rust
match recent_state {
    None => {
        // Initialize OL genesis block and state
        init_ol_genesis(ol_params, storage)
            .map_err(|e| InitError::StorageCreation(e.to_string()))?;
        // Create and insert init client state into db.
        let init_state = ClientState::default();
        let l1blk = ol_params.last_l1_block;
        let update = ClientUpdateOutput::new_state(init_state.clone());
        csman.put_update_blocking(&l1blk, update.clone())?;
        Ok((l1blk, init_state))
```

Should this be keyed off the presence of persisted OL data as well, or should those initialization writes be made atomic?

