Skip to content

x/bor: add target producer id in producer planned downtime#567

Open
kamuikatsurgi wants to merge 5 commits intodevelopfrom
kamui/target-producer-id
Open

x/bor: add target producer id in producer planned downtime#567
kamuikatsurgi wants to merge 5 commits intodevelopfrom
kamui/target-producer-id

Conversation

@kamuikatsurgi
Copy link
Copy Markdown
Member

Description

  • Allows a producer declaring planned downtime to optionally nominate a specific replacement via target_producer_id on MsgSetProducerDowntime
  • If the target is invalid, inactive, or down for the range, selection falls through to round-robin - the feature is best-effort, not a guarantee

Copilot AI review requested due to automatic review settings March 24, 2026 11:12
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review.

Tip: disable this comment in your organization's Code Review settings.

@kamuikatsurgi
Copy link
Copy Markdown
Member Author

@claude review for the PR diff, and check for the broader consensus issues across the codebase.

@claude
Copy link
Copy Markdown

claude bot commented Mar 24, 2026

Claude finished @kamuikatsurgi's task in 1m 48s —— View job


Review: x/bor target producer id in producer planned downtime

  • Gather context: read PR diff and changed files
  • Review proto and message type changes
  • Review keeper and side message server logic
  • Review vote extension (veblop) changes
  • Review tests for coverage and correctness
  • Check broader consensus implications across the codebase
  • Post final review

Overall Assessment

Clean, well-structured change. The feature is properly height-gated via targetProducerOverrideHeight, the fallthrough-to-round-robin design is the right call for safety, and test coverage is thorough. A few items below ranging from consensus-safety observations to minor nits.


Consensus-Critical Observations

1. SelectNextSpanProducer does not reject self-targetingveblop.go:270

The side handler (side_msg_server.go:289) and msg_server (msg_server.go:330) both reject targetProducerId == currentProducer upstream, but SelectNextSpanProducer itself does not. The test at veblop_test.go:627 explicitly documents this: "the side handler and msg_server reject self-targeting upstream."

This is fine as defense-in-depth at the calling layers, but worth noting that if SelectNextSpanProducer is ever called from a new code path that skips those checks, a producer could nominate itself. Consider adding a guard in SelectNextSpanProducer as well — it's a one-line check and would make the function self-contained.

Fix this →

2. Side handler votes VOTE_NO on CalculateProducerSet failure for target validationside_msg_server.go:297-299

If CalculateProducerSet fails in the target validation block, the side handler returns VOTE_NO. This is a strict response — if the failure is transient (e.g., state iteration hiccup), it could cause honest validators to disagree on votes for an otherwise-valid downtime message, since the existing non-target validation path doesn't call CalculateProducerSet.

This is mitigated by the fact that this only fires when targetProducerId != 0, so old-style messages (target=0) are unaffected. But for target-bearing messages, consider whether a softer fallback (ignoring the target and voting YES anyway) would be safer for consensus agreement. Right now the design is: "if I can't verify the target, reject the whole message." This is reasonable but more aggressive than the execution path (which falls through to round-robin on any target failure).

3. Determinism is maintained — confirmed

The targetProducerID flows through deterministic paths only:

  • PostHandleSetProducerDowntimeAddNewVeBlopSpanSelectNextSpanProducer (all in PreBlocker, deterministic)
  • No map iteration issues — the target check uses a linear scan of activeCandidates (a slice)
  • The height gate uses helper.GetTargetProducerOverrideHeight() which is chain-specific and deterministic

All good here.


Height Gate Values

4. All targetProducerOverrideHeight values are 0 (TBD)helper/config.go:489,497,509

All chains (mainnet, mumbai, amoy) have targetProducerOverrideHeight = 0 // TBD. Since the height gate check is ctx.BlockHeight() >= 0, this means the feature is immediately active on all chains once deployed.

If the intent is to coordinate activation at a specific block height (as is typical for consensus-affecting changes), the TBD values should be set before merging. If immediate activation is intentional, the // TBD comments are misleading and should be removed.


Code Quality

5. Duplicated "is target in candidates" logic across three files

The same linear search pattern appears in:

  • msg_server.go:333-339
  • side_msg_server.go:304-310
  • veblop.go:272-278

Consider extracting a small helper like containsID(candidates []uint64, id uint64) bool to reduce duplication. Low priority — the logic is simple and correct in all three places.

6. Nit: blank line removal in config.go:97

The diff removes a blank line between the comment // Self-healing defaults and DefaultEnableSH. This is cosmetic but unrelated to the feature — could be a separate cleanup.


Test Coverage

Tests are comprehensive and well-structured:

  • msg_server_test.go: covers target=0 (round-robin), valid target, self-targeting rejection, and target-not-in-set rejection
  • veblop_test.go (TestSelectNextSpanProducerWithTarget): 8 sub-tests covering happy path, downtime overlap, inactive target, zero target, non-existent target, self-target passthrough, partial overlap, and fallthrough determinism
  • veblop_test.go (TestAddNewVeBlopSpanWithTarget, TestAddNewVeBlopSpanTargetFallthrough): integration tests for the full span creation flow

Missing test: SideHandleSetProducerDowntime with a non-zero targetProducerId. The existing side handler tests in side_msg_server_test.go all use newMsg(addr, start, end) which produces TargetProducerId: 0. Adding a test case with a valid target (and one with an invalid target) in the SideHandleSetProducerDowntime test suite would verify the new VOTE_NO paths at side_msg_server.go:281-315.

Fix this →


Summary

The change is well-designed with proper height-gating and graceful fallthrough. The main items to address before merge:

Priority Item
High Set actual targetProducerOverrideHeight values (or confirm 0 is intentional and remove TBD comments)
Medium Add side handler test coverage for non-zero targetProducerId
Low Consider self-target guard in SelectNextSpanProducer for defense-in-depth
Low Extract containsID helper to reduce duplication

|

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the Bor planned-downtime flow to allow a producer to optionally request a specific replacement producer via a new target_producer_id field on MsgSetProducerDowntime, and threads that preference through span creation/producer selection (with fallthrough to the existing round-robin behavior when not eligible).

Changes:

  • Add target_producer_id to MsgSetProducerDowntime protobuf/API types and propagate it through constructors and CLI wiring.
  • Update SelectNextSpanProducer / AddNewVeBlopSpan to optionally honor a requested target producer (height-gated).
  • Add/extend keeper and type tests to cover target selection and fallthrough paths.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
proto/heimdallv2/bor/tx.proto Adds target_producer_id to the downtime message schema.
x/bor/types/tx.pb.go Regenerated protobuf Go types/codec for the new field.
api/heimdallv2/bor/tx.pulsar.go Regenerated Pulsar fast-reflection code to include the new field.
x/bor/types/msg.go Extends NewMsgSetProducerDowntime to accept and set targetProducerID.
x/bor/types/msg_test.go Adds coverage for the new constructor parameter.
x/bor/client/cli/flags.go Introduces the --target-producer-id CLI flag name constant.
x/bor/client/cli/tx.go Reads the new CLI flag and populates MsgSetProducerDowntime.
x/bor/keeper/veblop.go Threads targetProducerID into producer selection and tries to honor it when eligible.
x/bor/keeper/veblop_test.go Adds selection tests for honoring/falling-through from a target producer.
x/bor/keeper/side_msg_server.go Side-handler validation for target producer (height-gated) + passes target into span creation.
x/bor/keeper/side_msg_server_test.go Adds tests around downtime validation boundaries and target-producer handling.
x/bor/keeper/msg_server.go Adds msg-server validation for TargetProducerId (height-gated).
x/bor/keeper/msg_server_test.go Adds msg-server tests for target producer accept/reject scenarios.
helper/config.go Adds targetProducerOverrideHeight config knob and getter.
app/abci.go Updates AddNewVeBlopSpan callsites for the new function signature (passes 0).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 57.89474% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.8%. Comparing base (2166896) to head (b9b8614).

Files with missing lines Patch % Lines
app/abci.go 15.3% 9 Missing and 2 partials ⚠️
x/bor/keeper/keeper.go 10.0% 8 Missing and 1 partial ⚠️
x/bor/keeper/side_msg_server.go 70.9% 5 Missing and 4 partials ⚠️
app/app.go 0.0% 8 Missing ⚠️
helper/config.go 50.0% 4 Missing ⚠️
x/bor/keeper/veblop.go 85.7% 2 Missing and 2 partials ⚠️
x/bor/client/cli/tx.go 0.0% 3 Missing ⚠️

❌ Your patch check has failed because the patch coverage (57.8%) is below the target coverage (90.0%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           develop    #567     +/-   ##
=========================================
+ Coverage     41.6%   41.8%   +0.2%     
=========================================
  Files          172     172             
  Lines        17499   17601    +102     
=========================================
+ Hits          7296    7374     +78     
- Misses        9124    9148     +24     
  Partials      1079    1079             
Files with missing lines Coverage Δ
x/bor/keeper/msg_server.go 81.8% <100.0%> (+1.1%) ⬆️
x/bor/types/msg.go 100.0% <100.0%> (ø)
x/bor/types/params.go 90.0% <ø> (ø)
x/bor/types/util.go 93.4% <ø> (ø)
x/bor/client/cli/tx.go 0.0% <0.0%> (ø)
helper/config.go 20.0% <50.0%> (+0.4%) ⬆️
x/bor/keeper/veblop.go 64.0% <85.7%> (+4.8%) ⬆️
app/app.go 56.0% <0.0%> (-0.7%) ⬇️
x/bor/keeper/keeper.go 65.2% <10.0%> (-1.6%) ⬇️
x/bor/keeper/side_msg_server.go 60.3% <70.9%> (+3.4%) ⬆️
... and 1 more

... and 2 files with indirect coverage changes

Files with missing lines Coverage Δ
x/bor/keeper/msg_server.go 81.8% <100.0%> (+1.1%) ⬆️
x/bor/types/msg.go 100.0% <100.0%> (ø)
x/bor/types/params.go 90.0% <ø> (ø)
x/bor/types/util.go 93.4% <ø> (ø)
x/bor/client/cli/tx.go 0.0% <0.0%> (ø)
helper/config.go 20.0% <50.0%> (+0.4%) ⬆️
x/bor/keeper/veblop.go 64.0% <85.7%> (+4.8%) ⬆️
app/app.go 56.0% <0.0%> (-0.7%) ⬇️
x/bor/keeper/keeper.go 65.2% <10.0%> (-1.6%) ⬇️
x/bor/keeper/side_msg_server.go 60.3% <70.9%> (+3.4%) ⬆️
... and 1 more

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@claude
Copy link
Copy Markdown

claude bot commented Mar 24, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

marcello33
marcello33 previously approved these changes Mar 25, 2026
@kamuikatsurgi
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown

claude bot commented Mar 25, 2026

Claude finished @kamuikatsurgi's task in 3m 36s —— View job


Review: x/bor target producer id in producer planned downtime

  • Read PR diff and changed files
  • Review core logic (keeper, side handler, veblop)
  • Review consensus safety and determinism
  • Check for remaining issues from prior reviews
  • Post final review

Overall Assessment

Well-structured change since the latest commit (17abc78). The height-gating, defense-in-depth self-target guard in SelectNextSpanProducer, RoundRobinDefault constant, side handler tests, and pre-fork rejection have all been properly addressed. A few items remain.


Issues

1. Redundant condition checks — msg_server.go:328-348, side_msg_server.go:281-327

Both files have two sequential if blocks that can be collapsed:

// Block 1: rejects if target != 0 AND height < fork
if msg.TargetProducerId != types.RoundRobinDefault && height < forkHeight {
    return error
}
// Block 2: validates if target != 0 AND height >= fork  
if msg.TargetProducerId != types.RoundRobinDefault && height >= forkHeight {
    // validation...
}

After block 1 returns on < forkHeight, block 2's >= forkHeight check is always true. This should be a single if with an early return:

if msg.TargetProducerId != types.RoundRobinDefault {
    if height < forkHeight {
        return error // or VOTE_NO
    }
    // validation...
}

Not a bug, but cleaner and avoids evaluating GetTargetProducerOverrideHeight() twice.

Fix this →

2. Validation strictness asymmetry between msg_server and side_msg_server

msg_server.go:334 validates the target against the full candidates list (from CalculateProducerSet), while side_msg_server.go:308 does the same. However, SelectNextSpanProducer at veblop.go:278 checks against activeCandidates — a subset filtered by active validators. This means a target could pass validation in both handlers but still fall through to round-robin at execution time because the target isn't in activeCandidates.

This is consistent with the "best effort" design, but it's worth noting that a user could submit a message that appears valid but whose target is silently ignored. Consider adding a comment near the candidates check in msg_server.go explaining this intentional gap:

// NOTE: we validate against the full candidate set here; at execution time
// SelectNextSpanProducer further filters by active validators and falls
// through to round-robin if the target is no longer in the active set.

3. Height gate values still TBDhelper/config.go:489,497,509

All targetProducerOverrideHeight values are 0 // TBD. With 0, the feature activates immediately on deployment (since blockHeight >= 0 is always true). This was flagged in the prior review and acknowledged — just confirming it's still outstanding and should be set before merge to avoid mixed-version network issues during rollout.


Looks Good

  • Determinism: Target selection flows through deterministic paths only — slice iteration (no maps), height-gated checks, and state reads in PreBlocker context.
  • Defense-in-depth: Self-target guard in SelectNextSpanProducer (veblop.go:272) prevents future callers from bypassing upstream checks.
  • Graceful fallthrough: Every failure mode in SelectNextSpanProducer falls through to round-robin rather than blocking span creation.
  • RoundRobinDefault constant: Clean self-documenting replacement for magic 0 at all callsites including abci.go.
  • Side handler test coverage: TestSideHandleSetProducerDowntimeTargetProducer covers zero target, valid target, self-targeting, not-in-set, pre-fork rejection, and pre-fork round-robin acceptance. TestSideHandleSetProducerDowntimeGetValIdError covers the error path.
  • Pre-fork rejection: Both msg_server and side_msg_server properly reject TargetProducerId != 0 before the fork height.
  • expected_keepers.go: GetValIdFromAddress is properly declared in the bor module's expected keepers interface.

Summary

Priority Item Status
High Set actual targetProducerOverrideHeight values before merge Pending (acknowledged)
Low Collapse redundant condition checks in msg_server and side_msg_server Suggested
Nit Add comment about validation gap between handlers and execution Suggested

View job

@cffls
Copy link
Copy Markdown
Contributor

cffls commented Mar 25, 2026

Looks good overall. Could you test a case where a cluster is half upgraded before the HF, a heimdall node with new version send out a rotation message request with target candidate, and make sure the consensus won't halt? Last time when the propose down time message was introduced, we had to add a check here.

Copilot AI review requested due to automatic review settings March 27, 2026 07:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@sonarqubecloud
Copy link
Copy Markdown

@kamuikatsurgi
Copy link
Copy Markdown
Member Author

Looks good overall. Could you test a case where a cluster is half upgraded before the HF, a heimdall node with new version send out a rotation message request with target candidate, and make sure the consensus won't halt? Last time when the propose down time message was introduced, we had to add a check here.

Good catch, I added checks there as well similar to what we had for MsgSetProducerDowntime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants