-
Notifications
You must be signed in to change notification settings - Fork 247
feat: add manual validation to peer db as command-line option #7356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: development
Are you sure you want to change the base?
feat: add manual validation to peer db as command-line option #7356
Conversation
Added manual validation to the peer db upon base node or wallet startup. This allows a node to cleanse its peer db from invalid entries.
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis change introduces a new CLI flag Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant App (Wallet/Node)
participant DHT
participant PeerManager
User->>CLI: Run with --validate-peer-db
CLI->>App (Wallet/Node): Parse CLI args
App (Wallet/Node)->>DHT: validate_all_peers()
DHT->>PeerManager: get_all_peers()
loop for each peer
DHT->>DHT: Validate peer
alt Invalid peer
DHT->>PeerManager: hard_delete_peers([peer])
end
end
DHT->>App (Wallet/Node): Return result
App (Wallet/Node)->>User: Exit or continue startup
Estimated code review effort3 (30–60 minutes) Possibly related PRs
Suggested reviewers
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
applications/minotari_console_wallet/src/cli.rs
(1 hunks)applications/minotari_console_wallet/src/lib.rs
(2 hunks)applications/minotari_node/src/cli.rs
(1 hunks)applications/minotari_node/src/lib.rs
(2 hunks)common/src/exit_codes.rs
(1 hunks)comms/core/src/peer_manager/identity_signature.rs
(3 hunks)comms/core/src/peer_manager/manager.rs
(1 hunks)comms/core/src/peer_manager/peer_identity_claim.rs
(2 hunks)comms/core/src/peer_manager/peer_storage_sql.rs
(1 hunks)comms/core/src/peer_manager/storage/database.rs
(2 hunks)comms/dht/src/dht.rs
(4 hunks)comms/dht/src/peer_validator.rs
(2 hunks)comms/dht/src/rpc/peer_info.rs
(2 hunks)integration_tests/src/wallet_process.rs
(1 hunks)
🧰 Additional context used
🧠 Learnings (15)
📓 Common learnings
Learnt from: hansieodendaal
PR: tari-project/tari#7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the `hard_delete_all_stale_peers` method in `comms/core/src/peer_manager/storage/database.rs`, the SQL query intentionally uses exact equality (`peers.features = ?`) rather than bitwise operations (`peers.features & ? != 0`) when matching `COMMUNICATION_NODE` features. This is the intended behavior to match only peers with exactly the `COMMUNICATION_NODE` feature, excluding those with additional feature flags.
Learnt from: hansieodendaal
PR: tari-project/tari#7123
File: comms/core/src/peer_manager/storage/database.rs:1655-1658
Timestamp: 2025-05-29T09:40:09.356Z
Learning: In the Tari codebase, node_id hex strings in the database are guaranteed to be valid because they can only be added via `update_peer_sql(peer: Peer)` which converts from valid NodeId objects, ensuring data integrity at the insertion layer.
Learnt from: hansieodendaal
PR: tari-project/tari#6963
File: comms/core/src/peer_manager/manager.rs:60-68
Timestamp: 2025-05-26T02:40:23.812Z
Learning: PeerDatabaseSql in the Tari codebase has been specifically refactored to handle concurrent access and mitigate blocking I/O concerns on async executor threads. The implementation has been tested under high load at both system level and through unit tests like test_concurrent_add_or_update_and_get_closest_peers which validates concurrent read/write operations.
integration_tests/src/wallet_process.rs (1)
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers
method in comms/core/src/peer_manager/storage/database.rs
, the SQL query intentionally uses exact equality (peers.features = ?
) rather than bitwise operations (peers.features & ? != 0
) when matching COMMUNICATION_NODE
features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE
feature, excluding those with additional feature flags.
common/src/exit_codes.rs (2)
Learnt from: hansieodendaal
PR: #6963
File: common_sqlite/src/error.rs:88-92
Timestamp: 2025-05-23T07:49:57.349Z
Learning: In the StorageError enum in common_sqlite/src/error.rs, the HexError variant should keep the manual From implementation rather than using #[from] attribute, as it stores a String representation of the error rather than the HexError type itself.
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers
method in comms/core/src/peer_manager/storage/database.rs
, the SQL query intentionally uses exact equality (peers.features = ?
) rather than bitwise operations (peers.features & ? != 0
) when matching COMMUNICATION_NODE
features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE
feature, excluding those with additional feature flags.
applications/minotari_node/src/cli.rs (1)
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers
method in comms/core/src/peer_manager/storage/database.rs
, the SQL query intentionally uses exact equality (peers.features = ?
) rather than bitwise operations (peers.features & ? != 0
) when matching COMMUNICATION_NODE
features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE
feature, excluding those with additional feature flags.
comms/core/src/peer_manager/manager.rs (3)
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers
method in comms/core/src/peer_manager/storage/database.rs
, the SQL query intentionally uses exact equality (peers.features = ?
) rather than bitwise operations (peers.features & ? != 0
) when matching COMMUNICATION_NODE
features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE
feature, excluding those with additional feature flags.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the fetch_peers_from_connection and collect_peer_stream functions rely on RPC streaming, and when the main connection is closed by another process, collect_peer_stream times out after STREAM_ITEM_TIMEOUT because it cannot detect that the peer can no longer respond, returning an empty vector of peers. This is why the connection state check is important for the retry logic.
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1655-1658
Timestamp: 2025-05-29T09:40:09.356Z
Learning: In the Tari codebase, node_id hex strings in the database are guaranteed to be valid because they can only be added via update_peer_sql(peer: Peer)
which converts from valid NodeId objects, ensuring data integrity at the insertion layer.
comms/core/src/peer_manager/identity_signature.rs (2)
Learnt from: hansieodendaal
PR: #7216
File: base_layer/core/src/base_node/comms_interface/comms_request.rs:146-148
Timestamp: 2025-06-14T04:56:43.247Z
Learning: In the Tari codebase, HashOutput
is a type alias for FixedHash
, and FixedHash
already implements Display
trait with hex formatting by calling self.0.to_hex()
internally. This means when formatting HashOutput
values, there's no need to explicitly call .to_hex()
as the Display implementation handles hex conversion automatically.
Learnt from: hansieodendaal
PR: #7284
File: applications/minotari_console_wallet/src/automation/commands.rs:0-0
Timestamp: 2025-07-15T12:23:14.650Z
Learning: In applications/minotari_console_wallet/src/automation/commands.rs, the consistent error handling pattern for command execution is to use match statements that: 1) On success: log with debug!, print user feedback, and push tx_id to tx_ids vector for monitoring, 2) On error: print error message with eprintln! using the format "{CommandName} error! {}", rather than using .unwrap() which would panic.
comms/core/src/peer_manager/peer_identity_claim.rs (1)
Learnt from: hansieodendaal
PR: #7216
File: base_layer/core/src/base_node/comms_interface/comms_request.rs:146-148
Timestamp: 2025-06-14T04:56:43.247Z
Learning: In the Tari codebase, HashOutput
is a type alias for FixedHash
, and FixedHash
already implements Display
trait with hex formatting by calling self.0.to_hex()
internally. This means when formatting HashOutput
values, there's no need to explicitly call .to_hex()
as the Display implementation handles hex conversion automatically.
applications/minotari_console_wallet/src/cli.rs (2)
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers
method in comms/core/src/peer_manager/storage/database.rs
, the SQL query intentionally uses exact equality (peers.features = ?
) rather than bitwise operations (peers.features & ? != 0
) when matching COMMUNICATION_NODE
features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE
feature, excluding those with additional feature flags.
Learnt from: hansieodendaal
PR: #7284
File: applications/minotari_console_wallet/src/automation/commands.rs:0-0
Timestamp: 2025-07-15T12:23:14.650Z
Learning: In applications/minotari_console_wallet/src/automation/commands.rs, the consistent error handling pattern for command execution is to use match statements that: 1) On success: log with debug!, print user feedback, and push tx_id to tx_ids vector for monitoring, 2) On error: print error message with eprintln! using the format "{CommandName} error! {}", rather than using .unwrap() which would panic.
comms/dht/src/rpc/peer_info.rs (4)
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the NUM_RETRIES logic in get_peers is specifically designed to handle peer connections that are closed while trying to RPC stream peer info, not general connection failures. The retry logic only applies when peers.is_empty() && !conn.is_connected() && attempt < NUM_RETRIES, which indicates a mid-stream disconnection.
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers
method in comms/core/src/peer_manager/storage/database.rs
, the SQL query intentionally uses exact equality (peers.features = ?
) rather than bitwise operations (peers.features & ? != 0
) when matching COMMUNICATION_NODE
features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE
feature, excluding those with additional feature flags.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the fetch_peers_from_connection and collect_peer_stream functions rely on RPC streaming, and when the main connection is closed by another process, collect_peer_stream times out after STREAM_ITEM_TIMEOUT because it cannot detect that the peer can no longer respond, returning an empty vector of peers. This is why the connection state check is important for the retry logic.
Learnt from: hansieodendaal
PR: #6963
File: comms/dht/src/proto/mod.rs:141-142
Timestamp: 2025-05-02T07:12:23.985Z
Learning: The PeerFeatures::from_bits_u32_truncate
method truncates a u32 to u8 bits but can still return None
if the resulting bits don't match any valid flags, making the error handling with .ok_or_else()
necessary even after truncation.
comms/dht/src/peer_validator.rs (5)
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the context.connectivity.dial_peer method should fail fast and return an error if a peer cannot be dialed, rather than requiring retry logic for general connection failures.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the fetch_peers_from_connection and collect_peer_stream functions rely on RPC streaming, and when the main connection is closed by another process, collect_peer_stream times out after STREAM_ITEM_TIMEOUT because it cannot detect that the peer can no longer respond, returning an empty vector of peers. This is why the connection state check is important for the retry logic.
Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.
Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the NUM_RETRIES logic in get_peers is specifically designed to handle peer connections that are closed while trying to RPC stream peer info, not general connection failures. The retry logic only applies when peers.is_empty() && !conn.is_connected() && attempt < NUM_RETRIES, which indicates a mid-stream disconnection.
applications/minotari_console_wallet/src/lib.rs (6)
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers
method in comms/core/src/peer_manager/storage/database.rs
, the SQL query intentionally uses exact equality (peers.features = ?
) rather than bitwise operations (peers.features & ? != 0
) when matching COMMUNICATION_NODE
features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE
feature, excluding those with additional feature flags.
Learnt from: hansieodendaal
PR: #7284
File: applications/minotari_console_wallet/src/automation/commands.rs:0-0
Timestamp: 2025-07-15T12:23:14.650Z
Learning: In applications/minotari_console_wallet/src/automation/commands.rs, the consistent error handling pattern for command execution is to use match statements that: 1) On success: log with debug!, print user feedback, and push tx_id to tx_ids vector for monitoring, 2) On error: print error message with eprintln! using the format "{CommandName} error! {}", rather than using .unwrap() which would panic.
Learnt from: SolfataraEmit
PR: #6994
File: docs/src/API_GRPC_Explanation.md:236-244
Timestamp: 2025-04-28T11:41:15.722Z
Learning: The minotari_console_wallet reports a fourth field timelocked_balance
in the terminal output for the getBalance
method, which should be reflected in the documentation for accuracy. The wallet.proto
file confirms that timelocked_balance
is defined as the fourth field in the GetBalanceResponse
message.
Learnt from: SolfataraEmit
PR: #6994
File: docs/src/API_GRPC_Explanation.md:236-244
Timestamp: 2025-04-28T11:41:15.722Z
Learning: The minotari_console_wallet reports a fourth field timelocked_balance
in the terminal output for the getBalance
method, which should be reflected in the documentation for accuracy.
Learnt from: SWvheerden
PR: #7301
File: clients/rust/base_node_wallet_client/src/client/http.rs:70-106
Timestamp: 2025-07-11T06:07:36.367Z
Learning: In clients/rust/base_node_wallet_client/src/client/http.rs, the HTTP client server selection logic is intentionally designed to try the local API once and then fallback to the seed server permanently without retry mechanisms. This is the preferred behavior - they want to use local API if available, otherwise use seed server, without continuously retrying the local API.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the context.connectivity.dial_peer method should fail fast and return an error if a peer cannot be dialed, rather than requiring retry logic for general connection failures.
applications/minotari_node/src/lib.rs (3)
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers
method in comms/core/src/peer_manager/storage/database.rs
, the SQL query intentionally uses exact equality (peers.features = ?
) rather than bitwise operations (peers.features & ? != 0
) when matching COMMUNICATION_NODE
features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE
feature, excluding those with additional feature flags.
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1655-1658
Timestamp: 2025-05-29T09:40:09.356Z
Learning: In the Tari codebase, node_id hex strings in the database are guaranteed to be valid because they can only be added via update_peer_sql(peer: Peer)
which converts from valid NodeId objects, ensuring data integrity at the insertion layer.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the context.connectivity.dial_peer method should fail fast and return an error if a peer cannot be dialed, rather than requiring retry logic for general connection failures.
comms/dht/src/dht.rs (9)
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the context.connectivity.dial_peer method should fail fast and return an error if a peer cannot be dialed, rather than requiring retry logic for general connection failures.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the fetch_peers_from_connection and collect_peer_stream functions rely on RPC streaming, and when the main connection is closed by another process, collect_peer_stream times out after STREAM_ITEM_TIMEOUT because it cannot detect that the peer can no longer respond, returning an empty vector of peers. This is why the connection state check is important for the retry logic.
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers
method in comms/core/src/peer_manager/storage/database.rs
, the SQL query intentionally uses exact equality (peers.features = ?
) rather than bitwise operations (peers.features & ? != 0
) when matching COMMUNICATION_NODE
features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE
feature, excluding those with additional feature flags.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the NUM_RETRIES logic in get_peers is specifically designed to handle peer connections that are closed while trying to RPC stream peer info, not general connection failures. The retry logic only applies when peers.is_empty() && !conn.is_connected() && attempt < NUM_RETRIES, which indicates a mid-stream disconnection.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:721-735
Timestamp: 2025-07-09T08:13:37.206Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the 10-second STREAM_ITEM_TIMEOUT and retry logic are intentionally designed to handle service conflicts where other services kill seed peer connections during seedstrap operations. The underlying discovery_peer/dial_peer API timeouts are too lenient for seedstrap use cases, so the more aggressive timeout with retry logic is appropriate and necessary.
Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.
Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.
Learnt from: hansieodendaal
PR: #7307
File: comms/core/src/connection_manager/peer_connection.rs:356-357
Timestamp: 2025-07-12T03:43:22.545Z
Learning: In comms/core/src/connectivity/manager.rs, the disconnect_if_unused call uses parameters (0, 0) intentionally to be conservative and avoid terminating connections that might be in use. This approach is used while troubleshooting inbound connection drop issues to eliminate the disconnect logic as a potential cause.
Learnt from: hansieodendaal
PR: #7307
File: comms/core/src/connection_manager/peer_connection.rs:356-357
Timestamp: 2025-07-12T03:43:22.545Z
Learning: In base_layer/core/src/base_node/tari_pulse_service/mod.rs, the disconnect_if_unused call uses parameters (0, 2) because the tari_pulse_service creates exactly 2 substreams and 0 RPC sessions during health checks.
comms/core/src/peer_manager/storage/database.rs (10)
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers
method in comms/core/src/peer_manager/storage/database.rs
, the SQL query intentionally uses exact equality (peers.features = ?
) rather than bitwise operations (peers.features & ? != 0
) when matching COMMUNICATION_NODE
features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE
feature, excluding those with additional feature flags.
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1655-1658
Timestamp: 2025-05-29T09:40:09.356Z
Learning: In the Tari codebase, node_id hex strings in the database are guaranteed to be valid because they can only be added via update_peer_sql(peer: Peer)
which converts from valid NodeId objects, ensuring data integrity at the insertion layer.
Learnt from: hansieodendaal
PR: #6963
File: comms/core/src/peer_manager/manager.rs:60-68
Timestamp: 2025-05-26T02:40:23.812Z
Learning: PeerDatabaseSql in the Tari codebase has been specifically refactored to handle concurrent access and mitigate blocking I/O concerns on async executor threads. The implementation has been tested under high load at both system level and through unit tests like test_concurrent_add_or_update_and_get_closest_peers which validates concurrent read/write operations.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the NUM_RETRIES logic in get_peers is specifically designed to handle peer connections that are closed while trying to RPC stream peer info, not general connection failures. The retry logic only applies when peers.is_empty() && !conn.is_connected() && attempt < NUM_RETRIES, which indicates a mid-stream disconnection.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the fetch_peers_from_connection and collect_peer_stream functions rely on RPC streaming, and when the main connection is closed by another process, collect_peer_stream times out after STREAM_ITEM_TIMEOUT because it cannot detect that the peer can no longer respond, returning an empty vector of peers. This is why the connection state check is important for the retry logic.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the context.connectivity.dial_peer method should fail fast and return an error if a peer cannot be dialed, rather than requiring retry logic for general connection failures.
Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.
Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.
Learnt from: hansieodendaal
PR: #6963
File: comms/core/src/peer_manager/storage/migrations/2025-04-14-072200_initial/up.sql:24-41
Timestamp: 2025-05-02T14:07:10.892Z
Learning: The peer system design requires each network address to be uniquely associated with exactly one peer, and an address cannot be reused across multiple peers.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:721-735
Timestamp: 2025-07-09T08:13:37.206Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the 10-second STREAM_ITEM_TIMEOUT and retry logic are intentionally designed to handle service conflicts where other services kill seed peer connections during seedstrap operations. The underlying discovery_peer/dial_peer API timeouts are too lenient for seedstrap use cases, so the more aggressive timeout with retry logic is appropriate and necessary.
comms/core/src/peer_manager/peer_storage_sql.rs (2)
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers
method in comms/core/src/peer_manager/storage/database.rs
, the SQL query intentionally uses exact equality (peers.features = ?
) rather than bitwise operations (peers.features & ? != 0
) when matching COMMUNICATION_NODE
features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE
feature, excluding those with additional feature flags.
Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1655-1658
Timestamp: 2025-05-29T09:40:09.356Z
Learning: In the Tari codebase, node_id hex strings in the database are guaranteed to be valid because they can only be added via update_peer_sql(peer: Peer)
which converts from valid NodeId objects, ensuring data integrity at the insertion layer.
🧬 Code Graph Analysis (1)
comms/core/src/peer_manager/peer_storage_sql.rs (2)
comms/core/src/peer_manager/manager.rs (1)
hard_delete_peers
(96-99)comms/core/src/peer_manager/storage/database.rs (6)
hard_delete_peers
(763-787)node_ids
(769-769)node_ids
(1207-1207)node_ids
(1215-1215)node_ids
(1379-1379)node_ids
(1507-1507)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: test (mainnet, stagenet)
- GitHub Check: test (testnet, esmeralda)
- GitHub Check: test (nextnet, nextnet)
- GitHub Check: cargo check with stable
- GitHub Check: ledger build tests
- GitHub Check: ci
- GitHub Check: Cucumber tests / Base Layer
- GitHub Check: Cucumber tests / FFI
🔇 Additional comments (19)
integration_tests/src/wallet_process.rs (1)
238-238
: LGTM! Consistent CLI structure update.The addition of
validate_peer_db: false
maintains consistency with the updated CLI struct while appropriately defaulting to false for integration tests, ensuring existing test behavior remains unchanged.common/src/exit_codes.rs (1)
135-136
: LGTM! Well-structured exit code addition.The new
PeerDatabaseError
variant follows the established pattern with sequential numbering (126) and a clear, descriptive error message. This properly supports the peer database validation feature across both wallet and node applications.comms/core/src/peer_manager/peer_identity_claim.rs (2)
23-26
: LGTM! Clean import update.The import modification properly includes
std::fmt::Display
to support the new trait implementation.
90-98
: LGTM! Excellent debugging support addition.The Display implementation provides clear, structured output showing all relevant fields (
addresses
,features
,signature
) which will significantly improve logging and debugging capabilities for peer identity claims. The format is self-documenting with field names included.applications/minotari_console_wallet/src/cli.rs (1)
110-112
: LGTM! Well-designed CLI option addition.The
validate_peer_db
field is properly implemented with:
- Clear, descriptive documentation explaining its purpose
- Both kebab-case (
--validate-peer-db
) and snake_case (--validate_peer_db
) aliases for user convenience- Appropriate boolean type for the on/off functionality
The implementation follows established CLI patterns and clap conventions.
applications/minotari_node/src/cli.rs (1)
42-44
: LGTM! Consistent CLI implementation across applications.The
validate_peer_db
field implementation matches the wallet CLI perfectly:
- Identical documentation and functionality description
- Same clap attributes with both alias forms for user convenience
- Consistent naming and type across wallet and node applications
This consistency provides a uniform user experience and maintainable codebase.
comms/core/src/peer_manager/identity_signature.rs (1)
200-211
: LGTM! Well-structured Display implementation.The Display implementation provides clear formatting of all relevant fields including hex representation of cryptographic components, which will enhance debugging and logging for the peer validation workflow.
comms/core/src/peer_manager/manager.rs (1)
95-99
: LGTM! Clean delegation to storage layer.The method properly delegates to the underlying storage implementation and maintains consistent async interface with other PeerManager methods. The simple implementation is appropriate for this abstraction layer.
applications/minotari_console_wallet/src/lib.rs (1)
199-203
: LGTM! Proper integration of peer database validation.The validation is correctly placed after wallet initialization and before interactive operations, with appropriate error handling using the new PeerDatabaseError exit code. The synchronous execution on the runtime is suitable for this startup validation step.
comms/dht/src/rpc/peer_info.rs (1)
42-42
: LGTM! Corrected misleading parameter name.The parameter name change from
max_claims_per_claim
tomax_addresses_per_claim
correctly reflects its actual usage in limiting the number of addresses per claim, improving code readability.Also applies to: 54-54
applications/minotari_node/src/lib.rs (2)
98-98
: LGTM: Correct default value for new CLI fieldThe
validate_peer_db
field is properly initialized tofalse
, ensuring backward compatibility and opt-in behavior for the peer database validation feature.
158-163
: LGTM: Well-positioned peer validation with proper error handlingThe peer validation is strategically placed after node initialization but before starting the main services, which is ideal for database cleanup operations. The error mapping to
ExitCode::PeerDatabaseError
provides clear feedback on validation failures.comms/dht/src/dht.rs (3)
31-31
: LGTM: Appropriate imports for new peer validation functionalityThe new imports
PeerManagerError
,PeerValidator
, andUnvalidatedPeerInfo
are correctly added to support the peer database validation feature.Also applies to: 54-54, 56-56
77-78
: LGTM: Proper error type extension for peer validationAdding
PeerManagerError
toDhtInitializationError
enables proper error propagation from the peer manager layer during validation operations.
145-195
: LGTM: Comprehensive peer validation implementation with good loggingThe
validate_all_peers
method is well-implemented with several strong points:
- Comprehensive logging at trace level for individual peer details and debug level for summary information
- Smart filtering to only validate non-config sourced addresses (line 167), avoiding validation of manually configured peers
- Proper use of
UnvalidatedPeerInfo::from_peer_limited_claims
with configured limits- Batch deletion of invalid peers using
hard_delete_peers
for efficiency- Clear error handling and propagation
The implementation follows the established patterns in the Tari codebase and integrates well with the existing peer validation infrastructure.
comms/core/src/peer_manager/storage/database.rs (2)
762-787
: LGTM! Well-implemented hard deletion method.The implementation correctly handles:
- Input validation with early return for empty input
- Atomic operations using immediate transaction
- Proper foreign key constraint handling by deleting multi-addresses first
- Clear return semantics distinguishing between no input (None) and zero deletions (Some(0))
2484-2529
: LGTM! Comprehensive test coverage.The test effectively validates:
- Database setup and peer creation across different types (node/wallet, seed/regular)
- Mixed deletion scenario with peers of different types
- Correct return value (deletion count)
- Verification that deleted peers are no longer accessible via
get_all_peers
The test provides good coverage for the hard deletion functionality.
comms/dht/src/peer_validator.rs (2)
58-59
: LGTM: Documentation improvement enhances clarity.The updated docstring accurately describes the method's return behavior, improving code documentation quality.
109-137
: LGTM: Enhanced error handling with improved observability.The explicit match statement maintains the same error propagation behavior while adding valuable logging for both successful and failed claim validations. This enhanced observability will be particularly useful for the bulk peer validation feature being introduced.
The logging approach is appropriate:
- Uses trace level for detailed debugging information
- Includes relevant context (node_id, public_key, claim details)
- Distinguishes between success and failure cases clearly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is debugging code. You can keep the pr open if you want to use it for testing but I am not going to merge it
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Description
Added manual validation to the peer db upon base node or wallet startup using the
--validate-peer-db
command-line option.Motivation and Context
This allows a node to cleanse its peer db from invalid entries.
How Has This Been Tested?
Performed system-level testing.
What process can a PR reviewer use to test or verify this change?
Code review.
System-level testing.
Breaking Changes
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests