Skip to content

Conversation

@durkmurder
Copy link
Member

@durkmurder durkmurder commented Dec 3, 2025

Context

In scope of previous issue we have agreed that it's ok to make progress even if byzantine leader has equivocated. Since VoteCollector deals only with valid blocks there is no need to have a VoteCollectorStatusInvalid as it was intended back then. Additionally, we don't terminate the vote processing and still trying to collect votes for the first proposal that we have received.

Summary by CodeRabbit

  • Improvements

    • Vote processing now continues when block references differ, improving resilience and liveness.
  • API Changes

    • Removed an unused status value from the vote collector’s public surface.
  • Documentation

    • Expanded guidance on Byzantine fault-tolerance, attack mitigation, and operational safety for vote handling.
  • Tests

    • Simplified state-transition test to validate the initial verification transition only.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 3, 2025

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@codecov-commenter
Copy link

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

case hotstuff.VoteCollectorStatusVerifying:
verifyingProc, ok := proc.(hotstuff.VerifyingVoteProcessor)
if !ok {
return fmt.Errorf("while processing block %v, found that VoteProcessor reports status %s but has an incompatible implementation type %T",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the following lines still needed?

verifyingProc, ok := proc.(hotstuff.VerifyingVoteProcessor)
if !ok {
return fmt.Errorf("while processing block %v, found that VoteProcessor reports status %s but has an incompatible implementation type %T",
proposal.Block.BlockID, proc.Status(), verifyingProc)
}

Previously, we did the cast, because we wanted to call terminateVoteProcessing. Since that's no longer happening, the cast here is essentially only a sanity check.

For simplicity, I would be inclined to remove the code, because this sanity check just covers a very narrow range of bugs, for which we typically don't include similar sanity checks. Furthermore, in the specific case here, there is nothing that would go catastrophically wrong immediately if the didn't detect the wrong type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the following no longer holds:

// * In the current implementation, the `votesCache` does not catch leader attacks 1. and 2., which exploit the fact that stand-alone
// votes and votes embedded in proposals are processed concurrently through different code paths. We emphasize that attack vectors
// 1. and 2. are only available to a byzantine leader as long as the leader has not (yet) equivocated for the current view. Once
// the VoteCollector notices that the _leader_ equivocates, it immediately stops accepting proposals for the view, thereby closing
// attack vectors 1. and 2. Hence, it is sufficient for the [VerifyingVoteProcessor] to catch _all_ equivocation attacks,
// including attacks 1. and 2.

Specifically:

it immediately stops accepting proposals for the view

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to address this with my commit ... turned into bit of a broader breakdown. Please take a look when you have time 🎄

… including detection and reporting of attacks
@coderabbitai
Copy link

coderabbitai bot commented Dec 23, 2025

Walkthrough

Removed the VoteCollectorStatusInvalid state and the terminate-vote-processing path; expanded and relocated BFT-focused documentation for VoteCollector and VoteProcessor, and simplified state-machine tests to no longer assert an Invalid transition.

Changes

Cohort / File(s) Summary
Enum and API docs
consensus/hotstuff/vote_collector.go
Removed the exported constant VoteCollectorStatusInvalid. Expanded and relocated detailed BFT-focused documentation for VoteCollector and VoteProcessor (comments only; no signature changes).
State machine logic & comments
consensus/hotstuff/votecollector/statemachine.go
Removed terminateVoteProcessing and the CAS-based termination path; eliminated block-divergence termination check; constrained documented valid transition to CachingVotes → VerifyingVotes; extensive comment updates describing concurrent ingestion, equivocation handling, and responsibilities between the collector and verifying processor.
Tests
consensus/hotstuff/votecollector/statemachine_test.go
Simplified TestStatus_StateTransitions: removed the second transition/assertion that expected an Invalid state and now asserts only the first transition to Verifying.

Sequence Diagram(s)

mermaid
sequenceDiagram
autonumber
participant Proposer
participant VoteCollector
participant VerifyingProcessor
participant Notifier

Note over VoteCollector,VerifyingProcessor: New flow: collector continues accepting votes\nand forwards proposals to verifier; no terminate path

Proposer->>VoteCollector: submit proposal (in-proposal votes / stand-alone votes)
VoteCollector->>VoteCollector: cache votes (CachingVotes)
VoteCollector->>VerifyingProcessor: forward proposal for verification (transition → VerifyingVotes)
VerifyingProcessor->>VoteCollector: request additional data / verification result
VerifyingProcessor->>Notifier: report equivocation or verification outcome
Note right of Notifier: Notifier receives reports; slashing/cancellation handled externally

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly references the main change: removal of VoteCollectorStatusInvalid from the state machine, which is a concrete and specific action clearly reflected in the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch yurii/6127-remove-termination-of-vote-processing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
consensus/hotstuff/vote_collector.go (1)

39-48: Stale status name in array and off-by-one error in bounds check.

Two issues in the String() method:

  1. collectorStatusNames still contains "VoteCollectorStatusInvalid" on line 41, but this status was removed from the enum. This orphaned element should be removed.

  2. The bounds check on line 44 uses > but should use >=. With the current array of 3 elements, len(collectorStatusNames) is 3, so a status value of 3 would pass the check but cause an out-of-bounds access.

🔎 Proposed fix
 var collectorStatusNames = [...]string{"VoteCollectorStatusCaching",
-	"VoteCollectorStatusVerifying",
-	"VoteCollectorStatusInvalid"}
+	"VoteCollectorStatusVerifying"}

 func (ps VoteCollectorStatus) String() string {
-	if ps < 0 || int(ps) > len(collectorStatusNames) {
+	if ps < 0 || int(ps) >= len(collectorStatusNames) {
 		return "UNKNOWN"
 	}
 	return collectorStatusNames[ps]
 }
♻️ Duplicate comments (1)
consensus/hotstuff/votecollector/statemachine.go (1)

395-404: Consider removing the now-unnecessary type assertion.

As noted in a prior review comment, the type assertion to hotstuff.VerifyingVoteProcessor (lines 396-400) was previously needed to call terminateVoteProcessing. With that logic removed, this cast only serves as a sanity check that doesn't prevent any immediate issue if it fails (the code just returns anyway).

For simplicity, consider removing this check or, if you want to keep it as a defensive measure, at least use the blank identifier since verifyingProc is unused:

🔎 Option 1: Remove the type assertion entirely
 		// We already received a valid block for this view; continue collecting votes for the valid proposal we received
 		// (even if proposer is equivocating - for reasoning, see BFT NOTES above).
 		case hotstuff.VoteCollectorStatusVerifying:
-			verifyingProc, ok := proc.(hotstuff.VerifyingVoteProcessor)
-			if !ok {
-				return fmt.Errorf("while processing block %v, found that VoteProcessor reports status %s but has an incompatible implementation type %T",
-					proposal.Block.BlockID, proc.Status(), verifyingProc)
-			}
+			// Already processing a valid proposal for this view; ignore subsequent proposals.
 
 		default:
🔎 Option 2: Keep sanity check but use blank identifier
-			verifyingProc, ok := proc.(hotstuff.VerifyingVoteProcessor)
+			_, ok := proc.(hotstuff.VerifyingVoteProcessor)
 			if !ok {
 				return fmt.Errorf("while processing block %v, found that VoteProcessor reports status %s but has an incompatible implementation type %T",
-					proposal.Block.BlockID, proc.Status(), verifyingProc)
+					proposal.Block.BlockID, proc.Status(), proc)
 			}
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae3e371 and 41d1169.

📒 Files selected for processing (3)
  • consensus/hotstuff/vote_collector.go
  • consensus/hotstuff/votecollector/statemachine.go
  • consensus/hotstuff/votecollector/statemachine_test.go
💤 Files with no reviewable changes (1)
  • consensus/hotstuff/votecollector/statemachine_test.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (.cursor/rules/coding_conventions.mdc)

Follow Go coding conventions as documented in @docs/agents/CodingConventions.md

Follow Go coding standards and conventions as documented in @docs/agents/GoDocs.md

**/*.go: Follow the existing module structure in /module/, /engine/, /model/ and use dependency injection patterns for component composition
Implement proper interfaces before concrete types
Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Use mock generators: run make generate-mocks after interface changes
All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented
Use comprehensive error wrapping for debugging; avoid fmt.Errorf, use irrecoverable package for exceptions
NEVER log and continue on best effort basis; ALWAYS explicitly handle errors
Uses golangci-lint with custom configurations (.golangci.yml) and custom linters for Flow-specific conventions (struct write checking)

Files:

  • consensus/hotstuff/votecollector/statemachine.go
  • consensus/hotstuff/vote_collector.go
{network,engine,consensus}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Network messages must be authenticated and validated

Files:

  • consensus/hotstuff/votecollector/statemachine.go
  • consensus/hotstuff/vote_collector.go
🧠 Learnings (1)
📚 Learning: 2025-12-23T00:28:40.994Z
Learnt from: CR
Repo: onflow/flow-go PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T00:28:40.994Z
Learning: Applies to {storage,ledger,execution,fvm}/**/*.go : State consistency is paramount; use proper synchronization primitives

Applied to files:

  • consensus/hotstuff/votecollector/statemachine.go
🧬 Code graph analysis (1)
consensus/hotstuff/votecollector/statemachine.go (1)
consensus/hotstuff/vote_collector.go (1)
  • VoteCollector (75-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
  • GitHub Check: Lint (./integration/)
  • GitHub Check: Lint (./insecure/)
  • GitHub Check: Lint (./)
  • GitHub Check: Integration Tests Others (integration)
  • GitHub Check: Unit Tests (network/p2p/connection)
  • GitHub Check: Unit Tests (others)
  • GitHub Check: Unit Tests (engine/verification)
  • GitHub Check: Unit Tests (engine/execution/ingestion)
  • GitHub Check: Unit Tests (engine/execution)
  • GitHub Check: Unit Tests (engine/collection)
  • GitHub Check: Unit Tests (module)
  • GitHub Check: Unit Tests (network/p2p/node)
  • GitHub Check: Unit Tests (network/p2p)
  • GitHub Check: Unit Tests (network/alsp)
  • GitHub Check: Unit Tests (network/p2p/scoring)
  • GitHub Check: Unit Tests (network)
  • GitHub Check: Unit Tests (network/test/cohort2)
  • GitHub Check: Unit Tests (storage)
  • GitHub Check: Unit Tests (engine)
  • GitHub Check: Unit Tests (engine/consensus)
  • GitHub Check: Unit Tests (network/test/cohort1)
  • GitHub Check: Unit Tests (engine/execution/computation)
  • GitHub Check: Unit Tests (module/dkg)
  • GitHub Check: Unit Tests (engine/common)
  • GitHub Check: Unit Tests (cmd)
  • GitHub Check: Unit Tests (admin)
  • GitHub Check: Unit Tests (ledger)
  • GitHub Check: Unit Tests (fvm)
  • GitHub Check: Unit Tests (state)
  • GitHub Check: Unit Tests (engine/access)
  • GitHub Check: Unit Tests (consensus)
  • GitHub Check: Unit Tests (utils)
  • GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/scoring)
  • GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/rpc_inspector)
  • GitHub Check: Unit Tests Insecure (insecure)
  • GitHub Check: Docker Build
  • GitHub Check: analyze-code (go)
🔇 Additional comments (4)
consensus/hotstuff/vote_collector.go (1)

107-116: LGTM!

The BFT documentation clearly delineates responsibilities between VoteCollector (equivocation detection, liveness) and VoteProcessor (QC construction, safety). The explicit note about the VoteProcessor not being designed to reliably detect all equivocation attempts is valuable for maintainers.

consensus/hotstuff/votecollector/statemachine.go (3)

70-107: LGTM!

The BFT notes comprehensively document the attack vectors and mitigations. The explanation of how VoteCollector and VoteProcessor work together to provide both liveness (via fast equivocation checks) and safety (via resilient QC generation) is clear and valuable.


271-311: LGTM!

The liveness proof is well-structured and correctly argues that all votes will reach the VerifyingVoteProcessor on the happy path. The proof utilizing the Go Memory Model is thorough and the key insight about votesCache using locks (not just atomics) being critical for correctness is valuable.


419-457: LGTM!

The CAS loop implementation correctly handles the atomic state transition from CachingVotes to VerifyingVotes. The detailed comments explaining why a full CAS loop is not necessary (since the state can never return to CachingVotes once it leaves) demonstrate good understanding of the concurrency model.

Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you ✨

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
consensus/hotstuff/vote_collector.go (1)

39-41: Remove stale entry from collectorStatusNames array.

The array still contains "VoteCollectorStatusInvalid", but the corresponding enum value was removed. This creates an inconsistency between the enum definition and its string representation.

🔎 Proposed fix
-var collectorStatusNames = [...]string{"VoteCollectorStatusCaching",
-	"VoteCollectorStatusVerifying",
-	"VoteCollectorStatusInvalid"}
+var collectorStatusNames = [...]string{
+	"VoteCollectorStatusCaching",
+	"VoteCollectorStatusVerifying"}
consensus/hotstuff/votecollector/statemachine.go (1)

324-327: Documentation inconsistency about double proposal handling.

Lines 325-327 state "In case we have received double proposal, we will stop attempting to build a QC for this view" but this contradicts both the implementation (lines 393-401 continue processing) and the detailed BFT notes (lines 343-349 explicitly state "We just continue processing votes for the first proposal received").

🔎 Proposed fix
-// ProcessBlock validates the block signature, and adds it as the proposer's vote.
-// In case we have received double proposal, we will stop attempting to build a QC for this view,
-// because we don't want to build on any proposal from an equivocating primary. Note: slashing challenges
-// for proposal equivocation are triggered by hotstuff.Forks, so we don't have to do anything else here.
+// ProcessBlock validates the block signature, and adds it as the proposer's vote.
+// In case we have received a double proposal, we continue processing votes for the first valid proposal.
+// Note: slashing challenges for proposal equivocation are triggered by hotstuff.Forks, so we don't have
+// to do anything else here.
🧹 Nitpick comments (1)
consensus/hotstuff/votecollector/statemachine.go (1)

395-401: Consider removing unused type assertion per prior review feedback.

Based on a previous review comment by AlexHentschel, the type assertion at lines 396-400 is now only serving as a narrow sanity check since terminateVoteProcessing was removed. The variable verifyingProc is not used after the check. While this doesn't cause any harm, removing it would simplify the code.

🔎 Proposed refactor
 		// We already received a valid block for this view; continue collecting votes for the valid proposal we received
 		// (even if proposer is equivocating - for reasoning, see BFT NOTES above).
 		case hotstuff.VoteCollectorStatusVerifying:
-			verifyingProc, ok := proc.(hotstuff.VerifyingVoteProcessor)
-			if !ok {
-				return fmt.Errorf("while processing block %v, found that VoteProcessor reports status %s but has an incompatible implementation type %T",
-					proposal.Block.BlockID, proc.Status(), verifyingProc)
-			}
+			// Already in verifying state; continue processing votes for the first valid proposal
 
 		default:
 			return fmt.Errorf("while processing block %v, found that VoteProcessor reported unknown status %s", proposal.Block.BlockID, proc.Status())

Based on learnings, this refactor aligns with feedback from previous reviews.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41d1169 and ab14a1d.

📒 Files selected for processing (2)
  • consensus/hotstuff/vote_collector.go
  • consensus/hotstuff/votecollector/statemachine.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (.cursor/rules/coding_conventions.mdc)

Follow Go coding conventions as documented in @docs/agents/CodingConventions.md

Follow Go coding standards and conventions as documented in @docs/agents/GoDocs.md

**/*.go: Follow the existing module structure in /module/, /engine/, /model/ and use dependency injection patterns for component composition
Implement proper interfaces before concrete types
Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Use mock generators: run make generate-mocks after interface changes
All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented
Use comprehensive error wrapping for debugging; avoid fmt.Errorf, use irrecoverable package for exceptions
NEVER log and continue on best effort basis; ALWAYS explicitly handle errors
Uses golangci-lint with custom configurations (.golangci.yml) and custom linters for Flow-specific conventions (struct write checking)

Files:

  • consensus/hotstuff/votecollector/statemachine.go
  • consensus/hotstuff/vote_collector.go
{network,engine,consensus}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Network messages must be authenticated and validated

Files:

  • consensus/hotstuff/votecollector/statemachine.go
  • consensus/hotstuff/vote_collector.go
🧬 Code graph analysis (1)
consensus/hotstuff/votecollector/statemachine.go (1)
consensus/hotstuff/vote_collector.go (1)
  • VoteCollector (75-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
  • GitHub Check: Lint (./insecure/)
  • GitHub Check: Lint (./integration/)
  • GitHub Check: Lint (./)
  • GitHub Check: Unit Tests (module)
  • GitHub Check: Unit Tests (others)
  • GitHub Check: Unit Tests (network/alsp)
  • GitHub Check: Unit Tests (network)
  • GitHub Check: Unit Tests (network/test/cohort1)
  • GitHub Check: Unit Tests (network/test/cohort2)
  • GitHub Check: Unit Tests (network/p2p)
  • GitHub Check: Unit Tests (engine/execution/ingestion)
  • GitHub Check: Unit Tests (network/p2p/node)
  • GitHub Check: Unit Tests (engine)
  • GitHub Check: Unit Tests (network/p2p/scoring)
  • GitHub Check: Unit Tests (engine/consensus)
  • GitHub Check: Unit Tests (module/dkg)
  • GitHub Check: Unit Tests (engine/execution)
  • GitHub Check: Unit Tests (network/p2p/connection)
  • GitHub Check: Unit Tests (state)
  • GitHub Check: Unit Tests (engine/common)
  • GitHub Check: Unit Tests (engine/access)
  • GitHub Check: Unit Tests (engine/verification)
  • GitHub Check: Unit Tests (storage)
  • GitHub Check: Unit Tests (cmd)
  • GitHub Check: Unit Tests (ledger)
  • GitHub Check: Unit Tests (engine/execution/computation)
  • GitHub Check: Unit Tests (utils)
  • GitHub Check: Unit Tests (engine/collection)
  • GitHub Check: Unit Tests (admin)
  • GitHub Check: Unit Tests (consensus)
  • GitHub Check: Unit Tests (fvm)
  • GitHub Check: Integration Tests Others (integration)
  • GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/scoring)
  • GitHub Check: Unit Tests Insecure (insecure)
  • GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/rpc_inspector)
  • GitHub Check: Docker Build
  • GitHub Check: analyze-code (go)
🔇 Additional comments (5)
consensus/hotstuff/vote_collector.go (2)

50-74: Well-structured BFT documentation.

The expanded documentation provides clear guidance on external requirements and BFT resilience properties. The layered defense explanation (VoteCollector for liveness, VoteProcessor for safety, Forks for proposal equivocation) is particularly helpful.


107-116: Clear documentation of VoteProcessor responsibilities and limitations.

The ATTENTION section appropriately highlights that while the VoteProcessor ensures safety by constructing valid QCs, it relies on the VoteCollector for comprehensive equivocation detection. This division of responsibilities is well-explained.

consensus/hotstuff/votecollector/statemachine.go (3)

24-64: Comprehensive implementation documentation aligns with PR objectives.

The documentation clearly articulates the decision to continue processing votes for the first valid proposal even when the proposer equivocates (line 58), which directly implements the PR's stated goal. The external requirements and BFT resilience properties are well-documented.


70-108: Excellent detailed BFT attack mitigation documentation.

The expanded comments provide clear enumeration of attack vectors and corresponding mitigations. The explanation of how VoteCollector and VoteProcessor work together to provide both liveness and safety guarantees is particularly valuable for maintainability.


271-297: Well-structured liveness proof with clear case analysis.

The formal liveness proof using the Go Memory Model's 'happens before' relation is rigorous and well-documented. The three cases (a), (b), and (c) cover all possible race conditions, and the cross-references between comments and code are accurate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants