-
Notifications
You must be signed in to change notification settings - Fork 203
[BFT] Remove VoteCollectorStatusInvalid from state machine
#8219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[BFT] Remove VoteCollectorStatusInvalid from state machine
#8219
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
| case hotstuff.VoteCollectorStatusVerifying: | ||
| verifyingProc, ok := proc.(hotstuff.VerifyingVoteProcessor) | ||
| if !ok { | ||
| return fmt.Errorf("while processing block %v, found that VoteProcessor reports status %s but has an incompatible implementation type %T", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the following lines still needed?
flow-go/consensus/hotstuff/votecollector/statemachine.go
Lines 333 to 337 in 25a198a
| verifyingProc, ok := proc.(hotstuff.VerifyingVoteProcessor) | |
| if !ok { | |
| return fmt.Errorf("while processing block %v, found that VoteProcessor reports status %s but has an incompatible implementation type %T", | |
| proposal.Block.BlockID, proc.Status(), verifyingProc) | |
| } |
Previously, we did the cast, because we wanted to call
terminateVoteProcessing. Since that's no longer happening, the cast here is essentially only a sanity check.
For simplicity, I would be inclined to remove the code, because this sanity check just covers a very narrow range of bugs, for which we typically don't include similar sanity checks. Furthermore, in the specific case here, there is nothing that would go catastrophically wrong immediately if the didn't detect the wrong type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the following no longer holds:
flow-go/consensus/hotstuff/votecollector/statemachine.go
Lines 50 to 55 in ef4b114
| // * In the current implementation, the `votesCache` does not catch leader attacks 1. and 2., which exploit the fact that stand-alone | |
| // votes and votes embedded in proposals are processed concurrently through different code paths. We emphasize that attack vectors | |
| // 1. and 2. are only available to a byzantine leader as long as the leader has not (yet) equivocated for the current view. Once | |
| // the VoteCollector notices that the _leader_ equivocates, it immediately stops accepting proposals for the view, thereby closing | |
| // attack vectors 1. and 2. Hence, it is sufficient for the [VerifyingVoteProcessor] to catch _all_ equivocation attacks, | |
| // including attacks 1. and 2. |
Specifically:
it immediately stops accepting proposals for the view
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to address this with my commit ... turned into bit of a broader breakdown. Please take a look when you have time 🎄
… including detection and reporting of attacks
WalkthroughRemoved the Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
consensus/hotstuff/vote_collector.go (1)
39-48: Stale status name in array and off-by-one error in bounds check.Two issues in the
String()method:
collectorStatusNamesstill contains"VoteCollectorStatusInvalid"on line 41, but this status was removed from the enum. This orphaned element should be removed.The bounds check on line 44 uses
>but should use>=. With the current array of 3 elements,len(collectorStatusNames)is 3, so a status value of 3 would pass the check but cause an out-of-bounds access.🔎 Proposed fix
var collectorStatusNames = [...]string{"VoteCollectorStatusCaching", - "VoteCollectorStatusVerifying", - "VoteCollectorStatusInvalid"} + "VoteCollectorStatusVerifying"} func (ps VoteCollectorStatus) String() string { - if ps < 0 || int(ps) > len(collectorStatusNames) { + if ps < 0 || int(ps) >= len(collectorStatusNames) { return "UNKNOWN" } return collectorStatusNames[ps] }
♻️ Duplicate comments (1)
consensus/hotstuff/votecollector/statemachine.go (1)
395-404: Consider removing the now-unnecessary type assertion.As noted in a prior review comment, the type assertion to
hotstuff.VerifyingVoteProcessor(lines 396-400) was previously needed to callterminateVoteProcessing. With that logic removed, this cast only serves as a sanity check that doesn't prevent any immediate issue if it fails (the code just returns anyway).For simplicity, consider removing this check or, if you want to keep it as a defensive measure, at least use the blank identifier since
verifyingProcis unused:🔎 Option 1: Remove the type assertion entirely
// We already received a valid block for this view; continue collecting votes for the valid proposal we received // (even if proposer is equivocating - for reasoning, see BFT NOTES above). case hotstuff.VoteCollectorStatusVerifying: - verifyingProc, ok := proc.(hotstuff.VerifyingVoteProcessor) - if !ok { - return fmt.Errorf("while processing block %v, found that VoteProcessor reports status %s but has an incompatible implementation type %T", - proposal.Block.BlockID, proc.Status(), verifyingProc) - } + // Already processing a valid proposal for this view; ignore subsequent proposals. default:🔎 Option 2: Keep sanity check but use blank identifier
- verifyingProc, ok := proc.(hotstuff.VerifyingVoteProcessor) + _, ok := proc.(hotstuff.VerifyingVoteProcessor) if !ok { return fmt.Errorf("while processing block %v, found that VoteProcessor reports status %s but has an incompatible implementation type %T", - proposal.Block.BlockID, proc.Status(), verifyingProc) + proposal.Block.BlockID, proc.Status(), proc) }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
consensus/hotstuff/vote_collector.goconsensus/hotstuff/votecollector/statemachine.goconsensus/hotstuff/votecollector/statemachine_test.go
💤 Files with no reviewable changes (1)
- consensus/hotstuff/votecollector/statemachine_test.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go
📄 CodeRabbit inference engine (.cursor/rules/coding_conventions.mdc)
Follow Go coding conventions as documented in @docs/agents/CodingConventions.md
Follow Go coding standards and conventions as documented in @docs/agents/GoDocs.md
**/*.go: Follow the existing module structure in/module/,/engine/,/model/and use dependency injection patterns for component composition
Implement proper interfaces before concrete types
Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Use mock generators: runmake generate-mocksafter interface changes
All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented
Use comprehensive error wrapping for debugging; avoidfmt.Errorf, useirrecoverablepackage for exceptions
NEVER log and continue on best effort basis; ALWAYS explicitly handle errors
Uses golangci-lint with custom configurations (.golangci.yml) and custom linters for Flow-specific conventions (struct write checking)
Files:
consensus/hotstuff/votecollector/statemachine.goconsensus/hotstuff/vote_collector.go
{network,engine,consensus}/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Network messages must be authenticated and validated
Files:
consensus/hotstuff/votecollector/statemachine.goconsensus/hotstuff/vote_collector.go
🧠 Learnings (1)
📚 Learning: 2025-12-23T00:28:40.994Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:40.994Z
Learning: Applies to {storage,ledger,execution,fvm}/**/*.go : State consistency is paramount; use proper synchronization primitives
Applied to files:
consensus/hotstuff/votecollector/statemachine.go
🧬 Code graph analysis (1)
consensus/hotstuff/votecollector/statemachine.go (1)
consensus/hotstuff/vote_collector.go (1)
VoteCollector(75-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
- GitHub Check: Lint (./integration/)
- GitHub Check: Lint (./insecure/)
- GitHub Check: Lint (./)
- GitHub Check: Integration Tests Others (integration)
- GitHub Check: Unit Tests (network/p2p/connection)
- GitHub Check: Unit Tests (others)
- GitHub Check: Unit Tests (engine/verification)
- GitHub Check: Unit Tests (engine/execution/ingestion)
- GitHub Check: Unit Tests (engine/execution)
- GitHub Check: Unit Tests (engine/collection)
- GitHub Check: Unit Tests (module)
- GitHub Check: Unit Tests (network/p2p/node)
- GitHub Check: Unit Tests (network/p2p)
- GitHub Check: Unit Tests (network/alsp)
- GitHub Check: Unit Tests (network/p2p/scoring)
- GitHub Check: Unit Tests (network)
- GitHub Check: Unit Tests (network/test/cohort2)
- GitHub Check: Unit Tests (storage)
- GitHub Check: Unit Tests (engine)
- GitHub Check: Unit Tests (engine/consensus)
- GitHub Check: Unit Tests (network/test/cohort1)
- GitHub Check: Unit Tests (engine/execution/computation)
- GitHub Check: Unit Tests (module/dkg)
- GitHub Check: Unit Tests (engine/common)
- GitHub Check: Unit Tests (cmd)
- GitHub Check: Unit Tests (admin)
- GitHub Check: Unit Tests (ledger)
- GitHub Check: Unit Tests (fvm)
- GitHub Check: Unit Tests (state)
- GitHub Check: Unit Tests (engine/access)
- GitHub Check: Unit Tests (consensus)
- GitHub Check: Unit Tests (utils)
- GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/scoring)
- GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/rpc_inspector)
- GitHub Check: Unit Tests Insecure (insecure)
- GitHub Check: Docker Build
- GitHub Check: analyze-code (go)
🔇 Additional comments (4)
consensus/hotstuff/vote_collector.go (1)
107-116: LGTM!The BFT documentation clearly delineates responsibilities between
VoteCollector(equivocation detection, liveness) andVoteProcessor(QC construction, safety). The explicit note about theVoteProcessornot being designed to reliably detect all equivocation attempts is valuable for maintainers.consensus/hotstuff/votecollector/statemachine.go (3)
70-107: LGTM!The BFT notes comprehensively document the attack vectors and mitigations. The explanation of how
VoteCollectorandVoteProcessorwork together to provide both liveness (via fast equivocation checks) and safety (via resilient QC generation) is clear and valuable.
271-311: LGTM!The liveness proof is well-structured and correctly argues that all votes will reach the
VerifyingVoteProcessoron the happy path. The proof utilizing the Go Memory Model is thorough and the key insight aboutvotesCacheusing locks (not just atomics) being critical for correctness is valuable.
419-457: LGTM!The CAS loop implementation correctly handles the atomic state transition from
CachingVotestoVerifyingVotes. The detailed comments explaining why a full CAS loop is not necessary (since the state can never return toCachingVotesonce it leaves) demonstrate good understanding of the concurrency model.
AlexHentschel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you ✨
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
consensus/hotstuff/vote_collector.go (1)
39-41: Remove stale entry fromcollectorStatusNamesarray.The array still contains
"VoteCollectorStatusInvalid", but the corresponding enum value was removed. This creates an inconsistency between the enum definition and its string representation.🔎 Proposed fix
-var collectorStatusNames = [...]string{"VoteCollectorStatusCaching", - "VoteCollectorStatusVerifying", - "VoteCollectorStatusInvalid"} +var collectorStatusNames = [...]string{ + "VoteCollectorStatusCaching", + "VoteCollectorStatusVerifying"}consensus/hotstuff/votecollector/statemachine.go (1)
324-327: Documentation inconsistency about double proposal handling.Lines 325-327 state "In case we have received double proposal, we will stop attempting to build a QC for this view" but this contradicts both the implementation (lines 393-401 continue processing) and the detailed BFT notes (lines 343-349 explicitly state "We just continue processing votes for the first proposal received").
🔎 Proposed fix
-// ProcessBlock validates the block signature, and adds it as the proposer's vote. -// In case we have received double proposal, we will stop attempting to build a QC for this view, -// because we don't want to build on any proposal from an equivocating primary. Note: slashing challenges -// for proposal equivocation are triggered by hotstuff.Forks, so we don't have to do anything else here. +// ProcessBlock validates the block signature, and adds it as the proposer's vote. +// In case we have received a double proposal, we continue processing votes for the first valid proposal. +// Note: slashing challenges for proposal equivocation are triggered by hotstuff.Forks, so we don't have +// to do anything else here.
🧹 Nitpick comments (1)
consensus/hotstuff/votecollector/statemachine.go (1)
395-401: Consider removing unused type assertion per prior review feedback.Based on a previous review comment by AlexHentschel, the type assertion at lines 396-400 is now only serving as a narrow sanity check since
terminateVoteProcessingwas removed. The variableverifyingProcis not used after the check. While this doesn't cause any harm, removing it would simplify the code.🔎 Proposed refactor
// We already received a valid block for this view; continue collecting votes for the valid proposal we received // (even if proposer is equivocating - for reasoning, see BFT NOTES above). case hotstuff.VoteCollectorStatusVerifying: - verifyingProc, ok := proc.(hotstuff.VerifyingVoteProcessor) - if !ok { - return fmt.Errorf("while processing block %v, found that VoteProcessor reports status %s but has an incompatible implementation type %T", - proposal.Block.BlockID, proc.Status(), verifyingProc) - } + // Already in verifying state; continue processing votes for the first valid proposal default: return fmt.Errorf("while processing block %v, found that VoteProcessor reported unknown status %s", proposal.Block.BlockID, proc.Status())Based on learnings, this refactor aligns with feedback from previous reviews.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
consensus/hotstuff/vote_collector.goconsensus/hotstuff/votecollector/statemachine.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go
📄 CodeRabbit inference engine (.cursor/rules/coding_conventions.mdc)
Follow Go coding conventions as documented in @docs/agents/CodingConventions.md
Follow Go coding standards and conventions as documented in @docs/agents/GoDocs.md
**/*.go: Follow the existing module structure in/module/,/engine/,/model/and use dependency injection patterns for component composition
Implement proper interfaces before concrete types
Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Use mock generators: runmake generate-mocksafter interface changes
All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented
Use comprehensive error wrapping for debugging; avoidfmt.Errorf, useirrecoverablepackage for exceptions
NEVER log and continue on best effort basis; ALWAYS explicitly handle errors
Uses golangci-lint with custom configurations (.golangci.yml) and custom linters for Flow-specific conventions (struct write checking)
Files:
consensus/hotstuff/votecollector/statemachine.goconsensus/hotstuff/vote_collector.go
{network,engine,consensus}/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Network messages must be authenticated and validated
Files:
consensus/hotstuff/votecollector/statemachine.goconsensus/hotstuff/vote_collector.go
🧬 Code graph analysis (1)
consensus/hotstuff/votecollector/statemachine.go (1)
consensus/hotstuff/vote_collector.go (1)
VoteCollector(75-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
- GitHub Check: Lint (./insecure/)
- GitHub Check: Lint (./integration/)
- GitHub Check: Lint (./)
- GitHub Check: Unit Tests (module)
- GitHub Check: Unit Tests (others)
- GitHub Check: Unit Tests (network/alsp)
- GitHub Check: Unit Tests (network)
- GitHub Check: Unit Tests (network/test/cohort1)
- GitHub Check: Unit Tests (network/test/cohort2)
- GitHub Check: Unit Tests (network/p2p)
- GitHub Check: Unit Tests (engine/execution/ingestion)
- GitHub Check: Unit Tests (network/p2p/node)
- GitHub Check: Unit Tests (engine)
- GitHub Check: Unit Tests (network/p2p/scoring)
- GitHub Check: Unit Tests (engine/consensus)
- GitHub Check: Unit Tests (module/dkg)
- GitHub Check: Unit Tests (engine/execution)
- GitHub Check: Unit Tests (network/p2p/connection)
- GitHub Check: Unit Tests (state)
- GitHub Check: Unit Tests (engine/common)
- GitHub Check: Unit Tests (engine/access)
- GitHub Check: Unit Tests (engine/verification)
- GitHub Check: Unit Tests (storage)
- GitHub Check: Unit Tests (cmd)
- GitHub Check: Unit Tests (ledger)
- GitHub Check: Unit Tests (engine/execution/computation)
- GitHub Check: Unit Tests (utils)
- GitHub Check: Unit Tests (engine/collection)
- GitHub Check: Unit Tests (admin)
- GitHub Check: Unit Tests (consensus)
- GitHub Check: Unit Tests (fvm)
- GitHub Check: Integration Tests Others (integration)
- GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/scoring)
- GitHub Check: Unit Tests Insecure (insecure)
- GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/rpc_inspector)
- GitHub Check: Docker Build
- GitHub Check: analyze-code (go)
🔇 Additional comments (5)
consensus/hotstuff/vote_collector.go (2)
50-74: Well-structured BFT documentation.The expanded documentation provides clear guidance on external requirements and BFT resilience properties. The layered defense explanation (VoteCollector for liveness, VoteProcessor for safety, Forks for proposal equivocation) is particularly helpful.
107-116: Clear documentation of VoteProcessor responsibilities and limitations.The ATTENTION section appropriately highlights that while the VoteProcessor ensures safety by constructing valid QCs, it relies on the VoteCollector for comprehensive equivocation detection. This division of responsibilities is well-explained.
consensus/hotstuff/votecollector/statemachine.go (3)
24-64: Comprehensive implementation documentation aligns with PR objectives.The documentation clearly articulates the decision to continue processing votes for the first valid proposal even when the proposer equivocates (line 58), which directly implements the PR's stated goal. The external requirements and BFT resilience properties are well-documented.
70-108: Excellent detailed BFT attack mitigation documentation.The expanded comments provide clear enumeration of attack vectors and corresponding mitigations. The explanation of how VoteCollector and VoteProcessor work together to provide both liveness and safety guarantees is particularly valuable for maintainability.
271-297: Well-structured liveness proof with clear case analysis.The formal liveness proof using the Go Memory Model's 'happens before' relation is rigorous and well-documented. The three cases (a), (b), and (c) cover all possible race conditions, and the cross-references between comments and code are accurate.
Context
In scope of previous issue we have agreed that it's ok to make progress even if byzantine leader has equivocated. Since
VoteCollectordeals only with valid blocks there is no need to have aVoteCollectorStatusInvalidas it was intended back then. Additionally, we don't terminate the vote processing and still trying to collect votes for the first proposal that we have received.Summary by CodeRabbit
Improvements
API Changes
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.