Skip to content

feat: add manual validation to peer db as command-line option #7356

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: development
Choose a base branch
from

Conversation

hansieodendaal
Copy link
Contributor

@hansieodendaal hansieodendaal commented Jul 21, 2025

Description

Added manual validation to the peer db upon base node or wallet startup using the --validate-peer-db command-line option.

Motivation and Context

This allows a node to cleanse its peer db from invalid entries.

How Has This Been Tested?

Performed system-level testing.

What process can a PR reviewer use to test or verify this change?

Code review.
System-level testing.

Breaking Changes

  • None
  • Requires data directory on base node to be deleted
  • Requires hard fork
  • Other - Please specify

Summary by CodeRabbit

  • New Features

    • Added a command-line option to validate and clean the peer database in both the console wallet and node applications.
    • Peer database validation removes invalid peer entries before the application starts.
  • Improvements

    • Enhanced error reporting with a new exit code for peer database errors.
    • Added user-friendly string representations for peer identity claims and signatures.
  • Bug Fixes

    • Corrected parameter naming in peer information handling for improved clarity.
  • Tests

    • Added tests to ensure permanent deletion of peers from the database works as expected.

Added manual validation to the peer db upon base node or wallet startup. This allows a node to
cleanse its peer db from invalid entries.
Copy link
Contributor

coderabbitai bot commented Jul 21, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This change introduces a new CLI flag --validate-peer-db to both the console wallet and base node applications, enabling validation and cleanup of invalid entries in the peer database. Supporting methods for hard deletion of peers are added at multiple layers of the peer management stack, along with new error codes and logging. Display trait implementations are also introduced for peer identity types.

Changes

File(s) Change Summary
applications/minotari_console_wallet/src/cli.rs,
applications/minotari_node/src/cli.rs
Added validate_peer_db boolean CLI flag to Cli struct, exposed as --validate-peer-db.
applications/minotari_console_wallet/src/lib.rs,
applications/minotari_node/src/lib.rs
Integrated validate_peer_db flag into CLI parsing and execution flow; triggers peer DB validation before startup.
comms/core/src/peer_manager/manager.rs,
comms/core/src/peer_manager/peer_storage_sql.rs,
comms/core/src/peer_manager/storage/database.rs
Added hard_delete_peers method at manager, storage, and database layers for permanent peer removal.
comms/dht/src/dht.rs Added validate_all_peers async method to DHT for validating and removing invalid peers.
comms/dht/src/peer_validator.rs Improved peer validation error handling and logging in PeerValidator.
comms/dht/src/rpc/peer_info.rs Corrected parameter naming for address limits in UnvalidatedPeerInfo.
common/src/exit_codes.rs Added PeerDatabaseError exit code (126) for peer DB validation failures.
comms/core/src/peer_manager/identity_signature.rs,
comms/core/src/peer_manager/peer_identity_claim.rs
Implemented Display trait for IdentitySignature and PeerIdentityClaim for better logging.
integration_tests/src/wallet_process.rs Added validate_peer_db field to test CLI defaults.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant App (Wallet/Node)
    participant DHT
    participant PeerManager

    User->>CLI: Run with --validate-peer-db
    CLI->>App (Wallet/Node): Parse CLI args
    App (Wallet/Node)->>DHT: validate_all_peers()
    DHT->>PeerManager: get_all_peers()
    loop for each peer
        DHT->>DHT: Validate peer
        alt Invalid peer
            DHT->>PeerManager: hard_delete_peers([peer])
        end
    end
    DHT->>App (Wallet/Node): Return result
    App (Wallet/Node)->>User: Exit or continue startup
Loading

Estimated code review effort

3 (30–60 minutes)

Possibly related PRs

  • feat: add sqlite peer_db #6963: Introduces the SQLite peer database with the validate_all_peers method, directly supporting the new validation flow added in this PR.

Suggested reviewers

  • SWvheerden

Poem

A hop and a skip through the peer DB,
With a flag on the CLI, as clear as can be.
Invalid bunnies are swept from the warren,
Logs now sparkle, no need for sorrowin'.
🐇✨
Code cleaned and neat, the network feels right,
Peer validation brings bugs to the light!


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@hansieodendaal hansieodendaal changed the title feat; add manual validation to peer db as command-line option feat: add manual validation to peer db as command-line option Jul 21, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a16964 and 7f637a7.

📒 Files selected for processing (14)
  • applications/minotari_console_wallet/src/cli.rs (1 hunks)
  • applications/minotari_console_wallet/src/lib.rs (2 hunks)
  • applications/minotari_node/src/cli.rs (1 hunks)
  • applications/minotari_node/src/lib.rs (2 hunks)
  • common/src/exit_codes.rs (1 hunks)
  • comms/core/src/peer_manager/identity_signature.rs (3 hunks)
  • comms/core/src/peer_manager/manager.rs (1 hunks)
  • comms/core/src/peer_manager/peer_identity_claim.rs (2 hunks)
  • comms/core/src/peer_manager/peer_storage_sql.rs (1 hunks)
  • comms/core/src/peer_manager/storage/database.rs (2 hunks)
  • comms/dht/src/dht.rs (4 hunks)
  • comms/dht/src/peer_validator.rs (2 hunks)
  • comms/dht/src/rpc/peer_info.rs (2 hunks)
  • integration_tests/src/wallet_process.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (15)
📓 Common learnings
Learnt from: hansieodendaal
PR: tari-project/tari#7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the `hard_delete_all_stale_peers` method in `comms/core/src/peer_manager/storage/database.rs`, the SQL query intentionally uses exact equality (`peers.features = ?`) rather than bitwise operations (`peers.features & ? != 0`) when matching `COMMUNICATION_NODE` features. This is the intended behavior to match only peers with exactly the `COMMUNICATION_NODE` feature, excluding those with additional feature flags.
Learnt from: hansieodendaal
PR: tari-project/tari#7123
File: comms/core/src/peer_manager/storage/database.rs:1655-1658
Timestamp: 2025-05-29T09:40:09.356Z
Learning: In the Tari codebase, node_id hex strings in the database are guaranteed to be valid because they can only be added via `update_peer_sql(peer: Peer)` which converts from valid NodeId objects, ensuring data integrity at the insertion layer.
Learnt from: hansieodendaal
PR: tari-project/tari#6963
File: comms/core/src/peer_manager/manager.rs:60-68
Timestamp: 2025-05-26T02:40:23.812Z
Learning: PeerDatabaseSql in the Tari codebase has been specifically refactored to handle concurrent access and mitigate blocking I/O concerns on async executor threads. The implementation has been tested under high load at both system level and through unit tests like test_concurrent_add_or_update_and_get_closest_peers which validates concurrent read/write operations.
integration_tests/src/wallet_process.rs (1)

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers method in comms/core/src/peer_manager/storage/database.rs, the SQL query intentionally uses exact equality (peers.features = ?) rather than bitwise operations (peers.features & ? != 0) when matching COMMUNICATION_NODE features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE feature, excluding those with additional feature flags.

common/src/exit_codes.rs (2)

Learnt from: hansieodendaal
PR: #6963
File: common_sqlite/src/error.rs:88-92
Timestamp: 2025-05-23T07:49:57.349Z
Learning: In the StorageError enum in common_sqlite/src/error.rs, the HexError variant should keep the manual From implementation rather than using #[from] attribute, as it stores a String representation of the error rather than the HexError type itself.

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers method in comms/core/src/peer_manager/storage/database.rs, the SQL query intentionally uses exact equality (peers.features = ?) rather than bitwise operations (peers.features & ? != 0) when matching COMMUNICATION_NODE features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE feature, excluding those with additional feature flags.

applications/minotari_node/src/cli.rs (1)

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers method in comms/core/src/peer_manager/storage/database.rs, the SQL query intentionally uses exact equality (peers.features = ?) rather than bitwise operations (peers.features & ? != 0) when matching COMMUNICATION_NODE features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE feature, excluding those with additional feature flags.

comms/core/src/peer_manager/manager.rs (3)

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers method in comms/core/src/peer_manager/storage/database.rs, the SQL query intentionally uses exact equality (peers.features = ?) rather than bitwise operations (peers.features & ? != 0) when matching COMMUNICATION_NODE features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE feature, excluding those with additional feature flags.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the fetch_peers_from_connection and collect_peer_stream functions rely on RPC streaming, and when the main connection is closed by another process, collect_peer_stream times out after STREAM_ITEM_TIMEOUT because it cannot detect that the peer can no longer respond, returning an empty vector of peers. This is why the connection state check is important for the retry logic.

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1655-1658
Timestamp: 2025-05-29T09:40:09.356Z
Learning: In the Tari codebase, node_id hex strings in the database are guaranteed to be valid because they can only be added via update_peer_sql(peer: Peer) which converts from valid NodeId objects, ensuring data integrity at the insertion layer.

comms/core/src/peer_manager/identity_signature.rs (2)

Learnt from: hansieodendaal
PR: #7216
File: base_layer/core/src/base_node/comms_interface/comms_request.rs:146-148
Timestamp: 2025-06-14T04:56:43.247Z
Learning: In the Tari codebase, HashOutput is a type alias for FixedHash, and FixedHash already implements Display trait with hex formatting by calling self.0.to_hex() internally. This means when formatting HashOutput values, there's no need to explicitly call .to_hex() as the Display implementation handles hex conversion automatically.

Learnt from: hansieodendaal
PR: #7284
File: applications/minotari_console_wallet/src/automation/commands.rs:0-0
Timestamp: 2025-07-15T12:23:14.650Z
Learning: In applications/minotari_console_wallet/src/automation/commands.rs, the consistent error handling pattern for command execution is to use match statements that: 1) On success: log with debug!, print user feedback, and push tx_id to tx_ids vector for monitoring, 2) On error: print error message with eprintln! using the format "{CommandName} error! {}", rather than using .unwrap() which would panic.

comms/core/src/peer_manager/peer_identity_claim.rs (1)

Learnt from: hansieodendaal
PR: #7216
File: base_layer/core/src/base_node/comms_interface/comms_request.rs:146-148
Timestamp: 2025-06-14T04:56:43.247Z
Learning: In the Tari codebase, HashOutput is a type alias for FixedHash, and FixedHash already implements Display trait with hex formatting by calling self.0.to_hex() internally. This means when formatting HashOutput values, there's no need to explicitly call .to_hex() as the Display implementation handles hex conversion automatically.

applications/minotari_console_wallet/src/cli.rs (2)

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers method in comms/core/src/peer_manager/storage/database.rs, the SQL query intentionally uses exact equality (peers.features = ?) rather than bitwise operations (peers.features & ? != 0) when matching COMMUNICATION_NODE features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE feature, excluding those with additional feature flags.

Learnt from: hansieodendaal
PR: #7284
File: applications/minotari_console_wallet/src/automation/commands.rs:0-0
Timestamp: 2025-07-15T12:23:14.650Z
Learning: In applications/minotari_console_wallet/src/automation/commands.rs, the consistent error handling pattern for command execution is to use match statements that: 1) On success: log with debug!, print user feedback, and push tx_id to tx_ids vector for monitoring, 2) On error: print error message with eprintln! using the format "{CommandName} error! {}", rather than using .unwrap() which would panic.

comms/dht/src/rpc/peer_info.rs (4)

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the NUM_RETRIES logic in get_peers is specifically designed to handle peer connections that are closed while trying to RPC stream peer info, not general connection failures. The retry logic only applies when peers.is_empty() && !conn.is_connected() && attempt < NUM_RETRIES, which indicates a mid-stream disconnection.

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers method in comms/core/src/peer_manager/storage/database.rs, the SQL query intentionally uses exact equality (peers.features = ?) rather than bitwise operations (peers.features & ? != 0) when matching COMMUNICATION_NODE features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE feature, excluding those with additional feature flags.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the fetch_peers_from_connection and collect_peer_stream functions rely on RPC streaming, and when the main connection is closed by another process, collect_peer_stream times out after STREAM_ITEM_TIMEOUT because it cannot detect that the peer can no longer respond, returning an empty vector of peers. This is why the connection state check is important for the retry logic.

Learnt from: hansieodendaal
PR: #6963
File: comms/dht/src/proto/mod.rs:141-142
Timestamp: 2025-05-02T07:12:23.985Z
Learning: The PeerFeatures::from_bits_u32_truncate method truncates a u32 to u8 bits but can still return None if the resulting bits don't match any valid flags, making the error handling with .ok_or_else() necessary even after truncation.

comms/dht/src/peer_validator.rs (5)

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the context.connectivity.dial_peer method should fail fast and return an error if a peer cannot be dialed, rather than requiring retry logic for general connection failures.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the fetch_peers_from_connection and collect_peer_stream functions rely on RPC streaming, and when the main connection is closed by another process, collect_peer_stream times out after STREAM_ITEM_TIMEOUT because it cannot detect that the peer can no longer respond, returning an empty vector of peers. This is why the connection state check is important for the retry logic.

Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.

Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the NUM_RETRIES logic in get_peers is specifically designed to handle peer connections that are closed while trying to RPC stream peer info, not general connection failures. The retry logic only applies when peers.is_empty() && !conn.is_connected() && attempt < NUM_RETRIES, which indicates a mid-stream disconnection.

applications/minotari_console_wallet/src/lib.rs (6)

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers method in comms/core/src/peer_manager/storage/database.rs, the SQL query intentionally uses exact equality (peers.features = ?) rather than bitwise operations (peers.features & ? != 0) when matching COMMUNICATION_NODE features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE feature, excluding those with additional feature flags.

Learnt from: hansieodendaal
PR: #7284
File: applications/minotari_console_wallet/src/automation/commands.rs:0-0
Timestamp: 2025-07-15T12:23:14.650Z
Learning: In applications/minotari_console_wallet/src/automation/commands.rs, the consistent error handling pattern for command execution is to use match statements that: 1) On success: log with debug!, print user feedback, and push tx_id to tx_ids vector for monitoring, 2) On error: print error message with eprintln! using the format "{CommandName} error! {}", rather than using .unwrap() which would panic.

Learnt from: SolfataraEmit
PR: #6994
File: docs/src/API_GRPC_Explanation.md:236-244
Timestamp: 2025-04-28T11:41:15.722Z
Learning: The minotari_console_wallet reports a fourth field timelocked_balance in the terminal output for the getBalance method, which should be reflected in the documentation for accuracy. The wallet.proto file confirms that timelocked_balance is defined as the fourth field in the GetBalanceResponse message.

Learnt from: SolfataraEmit
PR: #6994
File: docs/src/API_GRPC_Explanation.md:236-244
Timestamp: 2025-04-28T11:41:15.722Z
Learning: The minotari_console_wallet reports a fourth field timelocked_balance in the terminal output for the getBalance method, which should be reflected in the documentation for accuracy.

Learnt from: SWvheerden
PR: #7301
File: clients/rust/base_node_wallet_client/src/client/http.rs:70-106
Timestamp: 2025-07-11T06:07:36.367Z
Learning: In clients/rust/base_node_wallet_client/src/client/http.rs, the HTTP client server selection logic is intentionally designed to try the local API once and then fallback to the seed server permanently without retry mechanisms. This is the preferred behavior - they want to use local API if available, otherwise use seed server, without continuously retrying the local API.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the context.connectivity.dial_peer method should fail fast and return an error if a peer cannot be dialed, rather than requiring retry logic for general connection failures.

applications/minotari_node/src/lib.rs (3)

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers method in comms/core/src/peer_manager/storage/database.rs, the SQL query intentionally uses exact equality (peers.features = ?) rather than bitwise operations (peers.features & ? != 0) when matching COMMUNICATION_NODE features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE feature, excluding those with additional feature flags.

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1655-1658
Timestamp: 2025-05-29T09:40:09.356Z
Learning: In the Tari codebase, node_id hex strings in the database are guaranteed to be valid because they can only be added via update_peer_sql(peer: Peer) which converts from valid NodeId objects, ensuring data integrity at the insertion layer.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the context.connectivity.dial_peer method should fail fast and return an error if a peer cannot be dialed, rather than requiring retry logic for general connection failures.

comms/dht/src/dht.rs (9)

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the context.connectivity.dial_peer method should fail fast and return an error if a peer cannot be dialed, rather than requiring retry logic for general connection failures.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the fetch_peers_from_connection and collect_peer_stream functions rely on RPC streaming, and when the main connection is closed by another process, collect_peer_stream times out after STREAM_ITEM_TIMEOUT because it cannot detect that the peer can no longer respond, returning an empty vector of peers. This is why the connection state check is important for the retry logic.

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers method in comms/core/src/peer_manager/storage/database.rs, the SQL query intentionally uses exact equality (peers.features = ?) rather than bitwise operations (peers.features & ? != 0) when matching COMMUNICATION_NODE features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE feature, excluding those with additional feature flags.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the NUM_RETRIES logic in get_peers is specifically designed to handle peer connections that are closed while trying to RPC stream peer info, not general connection failures. The retry logic only applies when peers.is_empty() && !conn.is_connected() && attempt < NUM_RETRIES, which indicates a mid-stream disconnection.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:721-735
Timestamp: 2025-07-09T08:13:37.206Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the 10-second STREAM_ITEM_TIMEOUT and retry logic are intentionally designed to handle service conflicts where other services kill seed peer connections during seedstrap operations. The underlying discovery_peer/dial_peer API timeouts are too lenient for seedstrap use cases, so the more aggressive timeout with retry logic is appropriate and necessary.

Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.

Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.

Learnt from: hansieodendaal
PR: #7307
File: comms/core/src/connection_manager/peer_connection.rs:356-357
Timestamp: 2025-07-12T03:43:22.545Z
Learning: In comms/core/src/connectivity/manager.rs, the disconnect_if_unused call uses parameters (0, 0) intentionally to be conservative and avoid terminating connections that might be in use. This approach is used while troubleshooting inbound connection drop issues to eliminate the disconnect logic as a potential cause.

Learnt from: hansieodendaal
PR: #7307
File: comms/core/src/connection_manager/peer_connection.rs:356-357
Timestamp: 2025-07-12T03:43:22.545Z
Learning: In base_layer/core/src/base_node/tari_pulse_service/mod.rs, the disconnect_if_unused call uses parameters (0, 2) because the tari_pulse_service creates exactly 2 substreams and 0 RPC sessions during health checks.

comms/core/src/peer_manager/storage/database.rs (10)

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers method in comms/core/src/peer_manager/storage/database.rs, the SQL query intentionally uses exact equality (peers.features = ?) rather than bitwise operations (peers.features & ? != 0) when matching COMMUNICATION_NODE features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE feature, excluding those with additional feature flags.

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1655-1658
Timestamp: 2025-05-29T09:40:09.356Z
Learning: In the Tari codebase, node_id hex strings in the database are guaranteed to be valid because they can only be added via update_peer_sql(peer: Peer) which converts from valid NodeId objects, ensuring data integrity at the insertion layer.

Learnt from: hansieodendaal
PR: #6963
File: comms/core/src/peer_manager/manager.rs:60-68
Timestamp: 2025-05-26T02:40:23.812Z
Learning: PeerDatabaseSql in the Tari codebase has been specifically refactored to handle concurrent access and mitigate blocking I/O concerns on async executor threads. The implementation has been tested under high load at both system level and through unit tests like test_concurrent_add_or_update_and_get_closest_peers which validates concurrent read/write operations.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the NUM_RETRIES logic in get_peers is specifically designed to handle peer connections that are closed while trying to RPC stream peer info, not general connection failures. The retry logic only applies when peers.is_empty() && !conn.is_connected() && attempt < NUM_RETRIES, which indicates a mid-stream disconnection.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the fetch_peers_from_connection and collect_peer_stream functions rely on RPC streaming, and when the main connection is closed by another process, collect_peer_stream times out after STREAM_ITEM_TIMEOUT because it cannot detect that the peer can no longer respond, returning an empty vector of peers. This is why the connection state check is important for the retry logic.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the context.connectivity.dial_peer method should fail fast and return an error if a peer cannot be dialed, rather than requiring retry logic for general connection failures.

Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.

Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.

Learnt from: hansieodendaal
PR: #6963
File: comms/core/src/peer_manager/storage/migrations/2025-04-14-072200_initial/up.sql:24-41
Timestamp: 2025-05-02T14:07:10.892Z
Learning: The peer system design requires each network address to be uniquely associated with exactly one peer, and an address cannot be reused across multiple peers.

Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:721-735
Timestamp: 2025-07-09T08:13:37.206Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the 10-second STREAM_ITEM_TIMEOUT and retry logic are intentionally designed to handle service conflicts where other services kill seed peer connections during seedstrap operations. The underlying discovery_peer/dial_peer API timeouts are too lenient for seedstrap use cases, so the more aggressive timeout with retry logic is appropriate and necessary.

comms/core/src/peer_manager/peer_storage_sql.rs (2)

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the hard_delete_all_stale_peers method in comms/core/src/peer_manager/storage/database.rs, the SQL query intentionally uses exact equality (peers.features = ?) rather than bitwise operations (peers.features & ? != 0) when matching COMMUNICATION_NODE features. This is the intended behavior to match only peers with exactly the COMMUNICATION_NODE feature, excluding those with additional feature flags.

Learnt from: hansieodendaal
PR: #7123
File: comms/core/src/peer_manager/storage/database.rs:1655-1658
Timestamp: 2025-05-29T09:40:09.356Z
Learning: In the Tari codebase, node_id hex strings in the database are guaranteed to be valid because they can only be added via update_peer_sql(peer: Peer) which converts from valid NodeId objects, ensuring data integrity at the insertion layer.

🧬 Code Graph Analysis (1)
comms/core/src/peer_manager/peer_storage_sql.rs (2)
comms/core/src/peer_manager/manager.rs (1)
  • hard_delete_peers (96-99)
comms/core/src/peer_manager/storage/database.rs (6)
  • hard_delete_peers (763-787)
  • node_ids (769-769)
  • node_ids (1207-1207)
  • node_ids (1215-1215)
  • node_ids (1379-1379)
  • node_ids (1507-1507)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: test (mainnet, stagenet)
  • GitHub Check: test (testnet, esmeralda)
  • GitHub Check: test (nextnet, nextnet)
  • GitHub Check: cargo check with stable
  • GitHub Check: ledger build tests
  • GitHub Check: ci
  • GitHub Check: Cucumber tests / Base Layer
  • GitHub Check: Cucumber tests / FFI
🔇 Additional comments (19)
integration_tests/src/wallet_process.rs (1)

238-238: LGTM! Consistent CLI structure update.

The addition of validate_peer_db: false maintains consistency with the updated CLI struct while appropriately defaulting to false for integration tests, ensuring existing test behavior remains unchanged.

common/src/exit_codes.rs (1)

135-136: LGTM! Well-structured exit code addition.

The new PeerDatabaseError variant follows the established pattern with sequential numbering (126) and a clear, descriptive error message. This properly supports the peer database validation feature across both wallet and node applications.

comms/core/src/peer_manager/peer_identity_claim.rs (2)

23-26: LGTM! Clean import update.

The import modification properly includes std::fmt::Display to support the new trait implementation.


90-98: LGTM! Excellent debugging support addition.

The Display implementation provides clear, structured output showing all relevant fields (addresses, features, signature) which will significantly improve logging and debugging capabilities for peer identity claims. The format is self-documenting with field names included.

applications/minotari_console_wallet/src/cli.rs (1)

110-112: LGTM! Well-designed CLI option addition.

The validate_peer_db field is properly implemented with:

  • Clear, descriptive documentation explaining its purpose
  • Both kebab-case (--validate-peer-db) and snake_case (--validate_peer_db) aliases for user convenience
  • Appropriate boolean type for the on/off functionality

The implementation follows established CLI patterns and clap conventions.

applications/minotari_node/src/cli.rs (1)

42-44: LGTM! Consistent CLI implementation across applications.

The validate_peer_db field implementation matches the wallet CLI perfectly:

  • Identical documentation and functionality description
  • Same clap attributes with both alias forms for user convenience
  • Consistent naming and type across wallet and node applications

This consistency provides a uniform user experience and maintainable codebase.

comms/core/src/peer_manager/identity_signature.rs (1)

200-211: LGTM! Well-structured Display implementation.

The Display implementation provides clear formatting of all relevant fields including hex representation of cryptographic components, which will enhance debugging and logging for the peer validation workflow.

comms/core/src/peer_manager/manager.rs (1)

95-99: LGTM! Clean delegation to storage layer.

The method properly delegates to the underlying storage implementation and maintains consistent async interface with other PeerManager methods. The simple implementation is appropriate for this abstraction layer.

applications/minotari_console_wallet/src/lib.rs (1)

199-203: LGTM! Proper integration of peer database validation.

The validation is correctly placed after wallet initialization and before interactive operations, with appropriate error handling using the new PeerDatabaseError exit code. The synchronous execution on the runtime is suitable for this startup validation step.

comms/dht/src/rpc/peer_info.rs (1)

42-42: LGTM! Corrected misleading parameter name.

The parameter name change from max_claims_per_claim to max_addresses_per_claim correctly reflects its actual usage in limiting the number of addresses per claim, improving code readability.

Also applies to: 54-54

applications/minotari_node/src/lib.rs (2)

98-98: LGTM: Correct default value for new CLI field

The validate_peer_db field is properly initialized to false, ensuring backward compatibility and opt-in behavior for the peer database validation feature.


158-163: LGTM: Well-positioned peer validation with proper error handling

The peer validation is strategically placed after node initialization but before starting the main services, which is ideal for database cleanup operations. The error mapping to ExitCode::PeerDatabaseError provides clear feedback on validation failures.

comms/dht/src/dht.rs (3)

31-31: LGTM: Appropriate imports for new peer validation functionality

The new imports PeerManagerError, PeerValidator, and UnvalidatedPeerInfo are correctly added to support the peer database validation feature.

Also applies to: 54-54, 56-56


77-78: LGTM: Proper error type extension for peer validation

Adding PeerManagerError to DhtInitializationError enables proper error propagation from the peer manager layer during validation operations.


145-195: LGTM: Comprehensive peer validation implementation with good logging

The validate_all_peers method is well-implemented with several strong points:

  • Comprehensive logging at trace level for individual peer details and debug level for summary information
  • Smart filtering to only validate non-config sourced addresses (line 167), avoiding validation of manually configured peers
  • Proper use of UnvalidatedPeerInfo::from_peer_limited_claims with configured limits
  • Batch deletion of invalid peers using hard_delete_peers for efficiency
  • Clear error handling and propagation

The implementation follows the established patterns in the Tari codebase and integrates well with the existing peer validation infrastructure.

comms/core/src/peer_manager/storage/database.rs (2)

762-787: LGTM! Well-implemented hard deletion method.

The implementation correctly handles:

  • Input validation with early return for empty input
  • Atomic operations using immediate transaction
  • Proper foreign key constraint handling by deleting multi-addresses first
  • Clear return semantics distinguishing between no input (None) and zero deletions (Some(0))

2484-2529: LGTM! Comprehensive test coverage.

The test effectively validates:

  • Database setup and peer creation across different types (node/wallet, seed/regular)
  • Mixed deletion scenario with peers of different types
  • Correct return value (deletion count)
  • Verification that deleted peers are no longer accessible via get_all_peers

The test provides good coverage for the hard deletion functionality.

comms/dht/src/peer_validator.rs (2)

58-59: LGTM: Documentation improvement enhances clarity.

The updated docstring accurately describes the method's return behavior, improving code documentation quality.


109-137: LGTM: Enhanced error handling with improved observability.

The explicit match statement maintains the same error propagation behavior while adding valuable logging for both successful and failed claim validations. This enhanced observability will be particularly useful for the bulk peer validation feature being introduced.

The logging approach is appropriate:

  • Uses trace level for detailed debugging information
  • Includes relevant context (node_id, public_key, claim details)
  • Distinguishes between success and failure cases clearly

Copy link
Collaborator

@stringhandler stringhandler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is debugging code. You can keep the pr open if you want to use it for testing but I am not going to merge it

@stringhandler stringhandler marked this pull request as draft July 21, 2025 16:17
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants