feat: add borsh serialisation and deserialisation to peer_db peer claim source #7358
base: development
Added Borsh serialisation/deserialisation to the peer_db's `MultiaddrWithStats` peer source field. Serde conversion was not always successful: validated peer info would be added to the database, then ~0.4% of the time the data would be skewed when read from the database, so that peer validation failed with a claim signature validation failure, most likely caused by the `updated_at: DateTime<Utc>` field.
Walkthrough
This change transitions the serialization of peer address sources in the peer manager from JSON (text) to Borsh (binary) format. It updates Rust structs, database schema, and migration scripts accordingly, and implements Borsh serialization/deserialization for relevant types. Comprehensive tests are added to ensure round-trip correctness and consistency.
Sequence Diagram(s)
sequenceDiagram
participant RustStruct as Rust Struct (PeerAddressSource, etc.)
participant Borsh as Borsh Serializer
participant DB as Database
RustStruct->>Borsh: Serialize (to Vec<u8>)
Borsh->>DB: Store as Binary (BLOB)
DB->>Borsh: Retrieve Binary (BLOB)
Borsh->>RustStruct: Deserialize (from Vec<u8>)
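The flow in the diagram above (serialize to `Vec<u8>`, store as a BLOB, read the BLOB back, deserialize) can be sketched with std only. This is a minimal illustration under stated assumptions, not the PR's code: `BinaryCodec`, `SourceRecord`, `take`, and `BlobTable` are hypothetical names, the `HashMap` stands in for the SQLite table's BLOB column, and the hand-written encoding only mimics Borsh's layout (little-endian integers, `u32` length prefix for strings).

```rust
use std::collections::HashMap;
use std::io;

// Hypothetical stand-in for the `BorshSerialize`/`BorshDeserialize` pair.
trait BinaryCodec: Sized {
    fn encode(&self) -> Vec<u8>;
    fn decode(bytes: &[u8]) -> io::Result<Self>;
}

// Hypothetical record, far simpler than the real `PeerAddressSource`.
#[derive(Debug, Clone, PartialEq)]
struct SourceRecord {
    label: String,
    updated_at_secs: i64,
}

// Split `n` bytes off the front of `input`, returning an error on short input.
fn take<'a>(input: &mut &'a [u8], n: usize) -> io::Result<&'a [u8]> {
    let whole: &'a [u8] = *input;
    if whole.len() < n {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "input too short"));
    }
    let (head, tail) = whole.split_at(n);
    *input = tail;
    Ok(head)
}

impl BinaryCodec for SourceRecord {
    fn encode(&self) -> Vec<u8> {
        let mut out = Vec::new();
        // Borsh-style String: u32 little-endian byte length, then the UTF-8 bytes.
        out.extend_from_slice(&(self.label.len() as u32).to_le_bytes());
        out.extend_from_slice(self.label.as_bytes());
        // Borsh-style i64: 8 bytes, little-endian.
        out.extend_from_slice(&self.updated_at_secs.to_le_bytes());
        out
    }

    fn decode(bytes: &[u8]) -> io::Result<Self> {
        let mut input = bytes;
        let mut len_buf = [0u8; 4];
        len_buf.copy_from_slice(take(&mut input, 4)?);
        let len = u32::from_le_bytes(len_buf) as usize;
        let label = String::from_utf8(take(&mut input, len)?.to_vec())
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        let mut ts_buf = [0u8; 8];
        ts_buf.copy_from_slice(take(&mut input, 8)?);
        Ok(SourceRecord { label, updated_at_secs: i64::from_le_bytes(ts_buf) })
    }
}

// Hypothetical in-memory stand-in for a table whose `source` column is a BLOB.
struct BlobTable {
    rows: HashMap<String, Vec<u8>>,
}

impl BlobTable {
    fn new() -> Self {
        BlobTable { rows: HashMap::new() }
    }

    // Serialize to Vec<u8> and store the raw bytes.
    fn store<T: BinaryCodec>(&mut self, key: &str, value: &T) {
        self.rows.insert(key.to_string(), value.encode());
    }

    // Retrieve the raw bytes and deserialize them.
    fn load<T: BinaryCodec>(&self, key: &str) -> io::Result<T> {
        let bytes = self
            .rows
            .get(key)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no such row"))?;
        T::decode(bytes)
    }
}
```

With a deterministic binary format, both `decode(encode(x)) == x` and "the loaded value re-encodes to the same bytes" should hold, which is the property a stored, signed claim depends on.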
Estimated code review effort: 4 (~90 minutes)
Actionable comments posted: 2
⛔ Files ignored due to path filters (1)
- `Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- comms/core/Cargo.toml (1 hunk)
- comms/core/src/net_address/multiaddr_with_stats.rs (4 hunks)
- comms/core/src/peer_manager/identity_signature.rs (3 hunks)
- comms/core/src/peer_manager/manager.rs (2 hunks)
- comms/core/src/peer_manager/mod.rs (1 hunk)
- comms/core/src/peer_manager/peer_identity_claim.rs (2 hunks)
- comms/core/src/peer_manager/storage/database.rs (9 hunks)
- comms/core/src/peer_manager/storage/migrations/2025-07-21-170500_peer_address_source/down.sql (1 hunk)
- comms/core/src/peer_manager/storage/migrations/2025-07-21-170500_peer_address_source/up.sql (1 hunk)
- comms/core/src/peer_manager/storage/schema.rs (1 hunk)
🧰 Additional context used
🧠 Learnings
📓 Common learnings
Learnt from: hansieodendaal
PR: tari-project/tari#7123
File: comms/core/src/peer_manager/storage/database.rs:1655-1658
Timestamp: 2025-05-29T09:40:09.356Z
Learning: In the Tari codebase, node_id hex strings in the database are guaranteed to be valid because they can only be added via `update_peer_sql(peer: Peer)` which converts from valid NodeId objects, ensuring data integrity at the insertion layer.
Learnt from: hansieodendaal
PR: tari-project/tari#6963
File: comms/core/src/peer_manager/manager.rs:60-68
Timestamp: 2025-05-26T02:40:23.812Z
Learning: PeerDatabaseSql in the Tari codebase has been specifically refactored to handle concurrent access and mitigate blocking I/O concerns on async executor threads. The implementation has been tested under high load at both system level and through unit tests like test_concurrent_add_or_update_and_get_closest_peers which validates concurrent read/write operations.
Learnt from: hansieodendaal
PR: tari-project/tari#7123
File: comms/core/src/peer_manager/storage/database.rs:1517-1541
Timestamp: 2025-05-29T09:42:20.881Z
Learning: In the `hard_delete_all_stale_peers` method in `comms/core/src/peer_manager/storage/database.rs`, the SQL query intentionally uses exact equality (`peers.features = ?`) rather than bitwise operations (`peers.features & ? != 0`) when matching `COMMUNICATION_NODE` features. This is the intended behavior to match only peers with exactly the `COMMUNICATION_NODE` feature, excluding those with additional feature flags.
Further learnings referenced for the changed files
Learnt from: hansieodendaal
PR: #6963
File: common_sqlite/src/error.rs:88-92
Timestamp: 2025-05-23T07:49:57.349Z
Learning: In the StorageError enum in common_sqlite/src/error.rs, the HexError variant should keep the manual From implementation rather than using #[from] attribute, as it stores a String representation of the error rather than the HexError type itself.
Learnt from: SWvheerden
PR: #6951
File: base_layer/core/src/base_node/tari_pulse_service/mod.rs:327-352
Timestamp: 2025-04-16T07:06:53.981Z
Learning: The discovery_peer and dial_peer methods in the Tari codebase have built-in timeout mechanisms, so adding explicit timeouts with tokio::time::timeout is unnecessary.
Learnt from: hansieodendaal
PR: #6963
File: comms/core/src/peer_manager/storage/migrations/2025-04-14-072200_initial/up.sql:24-41
Timestamp: 2025-05-02T14:07:10.892Z
Learning: The peer system design requires each network address to be uniquely associated with exactly one peer, and an address cannot be reused across multiple peers.
Learnt from: hansieodendaal
PR: #6963
File: comms/dht/src/proto/mod.rs:141-142
Timestamp: 2025-05-02T07:12:23.985Z
Learning: The `PeerFeatures::from_bits_u32_truncate` method truncates a u32 to u8 bits but can still return `None` if the resulting bits don't match any valid flags, making the error handling with `.ok_or_else()` necessary even after truncation.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the fetch_peers_from_connection and collect_peer_stream functions rely on RPC streaming, and when the main connection is closed by another process, collect_peer_stream times out after STREAM_ITEM_TIMEOUT because it cannot detect that the peer can no longer respond, returning an empty vector of peers. This is why the connection state check is important for the retry logic.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:721-735
Timestamp: 2025-07-09T08:13:37.206Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the 10-second STREAM_ITEM_TIMEOUT and retry logic are intentionally designed to handle service conflicts where other services kill seed peer connections during seedstrap operations. The underlying discovery_peer/dial_peer API timeouts are too lenient for seedstrap use cases, so the more aggressive timeout with retry logic is appropriate and necessary.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the context.connectivity.dial_peer method should fail fast and return an error if a peer cannot be dialed, rather than requiring retry logic for general connection failures.
Learnt from: hansieodendaal
PR: #7294
File: comms/dht/src/network_discovery/seed_strap.rs:352-456
Timestamp: 2025-07-09T08:33:29.320Z
Learning: In comms/dht/src/network_discovery/seed_strap.rs, the NUM_RETRIES logic in get_peers is specifically designed to handle peer connections that are closed while trying to RPC stream peer info, not general connection failures. The retry logic only applies when `peers.is_empty() && !conn.is_connected() && attempt < NUM_RETRIES`, which indicates a mid-stream disconnection.
Learnt from: hansieodendaal
PR: #7284
File: applications/minotari_console_wallet/src/automation/commands.rs:0-0
Timestamp: 2025-07-15T12:23:14.650Z
Learning: In applications/minotari_console_wallet/src/automation/commands.rs, the consistent error handling pattern for command execution is to use match statements that: 1) On success: log with debug!, print user feedback, and push tx_id to tx_ids vector for monitoring, 2) On error: print error message with eprintln! using the format "{CommandName} error! {}", rather than using .unwrap() which would panic.
🔇 Additional comments (22)
comms/core/Cargo.toml (1)
68-68: LGTM: Borsh dependency addition is appropriate.
The `borsh = "1.5.7"` dependency addition supports the transition from JSON to binary serialization for peer address sources, which should improve reliability and performance.
comms/core/src/peer_manager/storage/migrations/2025-07-21-170500_peer_address_source/down.sql (1)
1-5: LGTM: Appropriate handling of irreversible migration.
The comments correctly explain that rolling back from Binary to Text format is not feasible, since the original JSON strings cannot be reliably reconstructed. This is a sensible approach for this type of data migration.
comms/core/src/peer_manager/storage/schema.rs (1)
46-46: LGTM: Schema change aligns with Borsh serialization migration.
The change from `Text` to `Binary` for the `source` column correctly supports the transition from JSON string to Borsh binary serialization format.
comms/core/src/peer_manager/mod.rs (2)
101-102: LGTM: Appropriate test utility export.
The addition of the `create_test_peer_with_claim` export with proper `#[cfg(test)]` guarding provides the test utilities needed for Borsh serialization testing.
108-109: LGTM: Appropriate test utility export.
The addition of the `create_test_peer_identity_claim` export with proper `#[cfg(test)]` guarding complements the testing infrastructure for peer identity claims.
comms/core/src/net_address/multiaddr_with_stats.rs (4)
14-14: LGTM: Appropriate Borsh imports.
The import of the `BorshDeserialize` and `BorshSerialize` traits is necessary for the serialization implementation.
390-390: LGTM: Borsh derives added to PeerAddressSource.
Adding `BorshSerialize, BorshDeserialize` derives to the `PeerAddressSource` enum enables binary serialization, which is the core requirement for this migration.
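To make the effect of the derive concrete: by default, Borsh encodes an enum as a single `u8` variant index (its declaration position) followed by that variant's fields. The hand-written sketch below mimics that layout for a hypothetical `AddressSource` enum; its variants and the `invalid` helper are invented for illustration and do not mirror the real `PeerAddressSource`.

```rust
use std::io;

// Hypothetical enum; these variants are invented and do not mirror `PeerAddressSource`.
#[derive(Debug, Clone, PartialEq)]
enum AddressSource {
    Config,                          // tag 0, no fields
    FromDiscovery { claim_id: u64 }, // tag 1, one u64 field
    FromAnotherPeer,                 // tag 2, no fields
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

impl AddressSource {
    // Borsh-style enum layout: one u8 variant index, then the variant's fields.
    fn encode(&self) -> Vec<u8> {
        match self {
            AddressSource::Config => vec![0],
            AddressSource::FromDiscovery { claim_id } => {
                let mut out = vec![1];
                out.extend_from_slice(&claim_id.to_le_bytes());
                out
            }
            AddressSource::FromAnotherPeer => vec![2],
        }
    }

    // Reverse of `encode`; unknown tags and wrong payload sizes are errors, not panics.
    fn decode(bytes: &[u8]) -> io::Result<Self> {
        let (tag, rest) = bytes.split_first().ok_or_else(|| invalid("empty input"))?;
        match (*tag, rest.len()) {
            (0, 0) => Ok(AddressSource::Config),
            (1, 8) => {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(rest);
                Ok(AddressSource::FromDiscovery { claim_id: u64::from_le_bytes(buf) })
            }
            (2, 0) => Ok(AddressSource::FromAnotherPeer),
            (t, _) if t > 2 => Err(invalid("unknown variant tag")),
            _ => Err(invalid("payload length does not match variant")),
        }
    }
}
```

Because the tag is the declaration position, reordering variants or inserting one in the middle of the enum would change the meaning of bytes already stored in the BLOB column; appending new variants at the end avoids that.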
471-475: LGTM: Appropriate test imports.
The test imports for `BorshDeserialize`, `BorshSerialize`, `Multiaddr`, and the test utility function are correctly added to support the new serialization test.
571-593: LGTM: Comprehensive Borsh serialization test.
The test performs 1000 iterations of round-trip serialization/deserialization testing, which provides excellent coverage to ensure the Borsh implementation is reliable and consistent. This addresses the core issue mentioned in the PR objectives, where Serde conversion was occasionally unreliable.
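The 1000-iteration round-trip tests described in these comments follow a simple pattern: generate many varied values, encode, decode, and compare. A std-only sketch of that pattern, assuming a hypothetical `Sample` type, hand-written Borsh-style `encode`/`decode` functions, and a small `XorShift` generator used in place of the `rand` crate (none of these are from the PR):

```rust
use std::io;

// Hypothetical sample type; the PR's tests use real peer claims instead.
#[derive(Debug, Clone, PartialEq)]
struct Sample {
    counter: u64,
    label: String,
}

// Borsh-style layout: u64 LE, then u32 LE length, then UTF-8 label bytes.
fn encode(s: &Sample) -> Vec<u8> {
    let mut out = s.counter.to_le_bytes().to_vec();
    out.extend_from_slice(&(s.label.len() as u32).to_le_bytes());
    out.extend_from_slice(s.label.as_bytes());
    out
}

fn decode(bytes: &[u8]) -> io::Result<Sample> {
    let invalid = |msg: &str| io::Error::new(io::ErrorKind::InvalidData, msg.to_string());
    if bytes.len() < 12 {
        return Err(invalid("header too short"));
    }
    let mut c = [0u8; 8];
    c.copy_from_slice(&bytes[0..8]);
    let mut l = [0u8; 4];
    l.copy_from_slice(&bytes[8..12]);
    let len = u32::from_le_bytes(l) as usize;
    let body = &bytes[12..];
    if body.len() != len {
        return Err(invalid("length prefix mismatch"));
    }
    let label = String::from_utf8(body.to_vec())
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(Sample { counter: u64::from_le_bytes(c), label })
}

// Tiny xorshift64 generator so the sketch needs no external crates (not cryptographic).
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

// Runs `iterations` randomized round trips and returns how many failed.
fn count_round_trip_failures(iterations: usize, seed: u64) -> usize {
    let mut rng = XorShift(seed.max(1)); // xorshift must not start at zero
    let mut failures = 0;
    for _ in 0..iterations {
        let len = (rng.next() % 16) as usize;
        let label: String = (0..len)
            .map(|_| (b'a' + (rng.next() % 26) as u8) as char)
            .collect();
        let sample = Sample { counter: rng.next(), label };
        let bytes = encode(&sample);
        match decode(&bytes) {
            // Both directions must hold: value -> bytes -> value, and bytes re-encode identically.
            Ok(back) if back == sample && encode(&back) == bytes => {}
            _ => failures += 1,
        }
    }
    failures
}
```

Comparing the re-encoded bytes as well as the decoded value also catches non-deterministic encodings, and a fixed seed keeps any failure reproducible.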
comms/core/src/peer_manager/manager.rs (1)
404-449: LGTM! Well-structured test utility function.
The new `create_test_peer_with_claim` function properly creates a peer with a `PeerIdentityClaim` as the address source, which is essential for testing the Borsh serialization functionality. The implementation follows the existing pattern of `create_test_peer` while adding claim-based address source support.
comms/core/src/peer_manager/identity_signature.rs (2)
201-237: Solid Borsh serialization implementation.
The Borsh serialization/deserialization implementation for `IdentitySignature` is well-structured:
- Correctly handles cryptographic key serialization using canonical byte representations
- Properly serializes `DateTime<Utc>` by breaking it into timestamp and nanoseconds
- Includes appropriate error handling, converting parsing errors to `IoError`
- Maintains consistent field ordering between serialization and deserialization
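The `DateTime<Utc>` handling described above (a timestamp plus nanoseconds, with parse failures mapped to an I/O error) can be sketched with std only. This assumes a hypothetical `Timestamp { secs, nanos }` struct in place of chrono's `DateTime<Utc>`, whose `timestamp()` and `timestamp_subsec_nanos()` accessors expose the same two parts; `write_timestamp`/`read_timestamp` are illustrative names, and the range check on `nanos` is this sketch's own simple validity rule, not necessarily the one in `identity_signature.rs`.

```rust
use std::io::{self, Read, Write};

// Hypothetical stand-in for `DateTime<Utc>`: whole seconds plus sub-second nanoseconds.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Timestamp {
    secs: i64,
    nanos: u32,
}

// Write seconds, then nanoseconds, little-endian, always in the same order.
fn write_timestamp<W: Write>(ts: &Timestamp, writer: &mut W) -> io::Result<()> {
    writer.write_all(&ts.secs.to_le_bytes())?;
    writer.write_all(&ts.nanos.to_le_bytes())
}

// Read the fields back in the same order and reject out-of-range values.
fn read_timestamp<R: Read>(reader: &mut R) -> io::Result<Timestamp> {
    let mut secs = [0u8; 8];
    reader.read_exact(&mut secs)?;
    let mut nanos = [0u8; 4];
    reader.read_exact(&mut nanos)?;
    let nanos = u32::from_le_bytes(nanos);
    if nanos >= 1_000_000_000 {
        // Map the validation failure to an io::Error, the error type Borsh readers return.
        return Err(io::Error::new(io::ErrorKind::InvalidData, "nanoseconds out of range"));
    }
    Ok(Timestamp { secs: i64::from_le_bytes(secs), nanos })
}
```

Because both integers are fixed-width, the encoded form is always 12 bytes and keeps full nanosecond precision; there is no text formatting or parsing step that could round or reinterpret the value.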
303-323: Comprehensive round-trip test for serialization correctness.
The test performs 1000 iterations of Borsh serialization/deserialization round trips, which provides strong confidence in the consistency of the implementation. This directly addresses the core issue of serialization corruption mentioned in the PR objectives.
comms/core/src/peer_manager/peer_identity_claim.rs (3)
88-122: Well-implemented Borsh serialization with proper error handling.
The Borsh serialization implementation correctly handles the conversion of `Multiaddr` objects to/from strings and includes proper error handling for parsing failures and invalid feature bits. The approach of serializing addresses as strings is appropriate, since `Multiaddr` doesn't natively support Borsh.
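The approach above (each `Multiaddr` written as a string, feature bits validated on the way back in) can be sketched with std only. Everything here is hypothetical: `ClaimSketch` stands in for `PeerIdentityClaim`, `KNOWN_FEATURE_BITS` is an invented mask rather than the real `PeerFeatures` flags, and the `starts_with('/')` check is a placeholder for parsing the string back into a `Multiaddr`.

```rust
use std::io;

// Hypothetical mask of recognised feature flags (the real PeerFeatures flags differ).
const KNOWN_FEATURE_BITS: u8 = 0b0000_0011;

// Hypothetical, simplified claim: addresses are kept as multiaddr *strings* on the wire.
#[derive(Debug, Clone, PartialEq)]
struct ClaimSketch {
    addresses: Vec<String>,
    features: u8,
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

// Split `n` bytes off the front of `input`, returning an error on short input.
fn take<'a>(input: &mut &'a [u8], n: usize) -> io::Result<&'a [u8]> {
    let whole: &'a [u8] = *input;
    if whole.len() < n {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "input too short"));
    }
    let (head, tail) = whole.split_at(n);
    *input = tail;
    Ok(head)
}

fn read_u32(input: &mut &[u8]) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(take(input, 4)?);
    Ok(u32::from_le_bytes(buf))
}

impl ClaimSketch {
    fn encode(&self) -> Vec<u8> {
        // Vec<String>: u32 element count, then each string as u32 length + UTF-8 bytes.
        let mut out = (self.addresses.len() as u32).to_le_bytes().to_vec();
        for addr in &self.addresses {
            out.extend_from_slice(&(addr.len() as u32).to_le_bytes());
            out.extend_from_slice(addr.as_bytes());
        }
        out.push(self.features);
        out
    }

    fn decode(bytes: &[u8]) -> io::Result<Self> {
        let mut input = bytes;
        let count = read_u32(&mut input)? as usize;
        // Vec::new() rather than with_capacity(count): the count prefix is untrusted.
        let mut addresses = Vec::new();
        for _ in 0..count {
            let len = read_u32(&mut input)? as usize;
            let raw = take(&mut input, len)?;
            let addr = String::from_utf8(raw.to_vec()).map_err(|_| invalid("address is not UTF-8"))?;
            // Placeholder check; real code would parse the string back into a Multiaddr.
            if !addr.starts_with('/') {
                return Err(invalid("not a multiaddr string"));
            }
            addresses.push(addr);
        }
        let features = take(&mut input, 1)?[0];
        if (features & !KNOWN_FEATURE_BITS) != 0 {
            return Err(invalid("unknown feature bits"));
        }
        Ok(ClaimSketch { addresses, features })
    }
}
```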
124-168: Excellent test utility with diverse address types.
The `create_test_peer_identity_claim` function creates comprehensive test data with IPv4, IPv6, and onion addresses, ensuring good test coverage across different multiaddr types. The proper signature generation makes this a valuable utility for testing claim-based functionality.
177-195: Thorough round-trip test matching the established pattern.
The test provides 1000 iterations of serialization/deserialization verification, which aligns with the testing approach used for `IdentitySignature` and provides strong confidence in the correctness of the Borsh implementation.
comms/core/src/peer_manager/storage/migrations/2025-07-21-170500_peer_address_source/up.sql (1)
1-54: Migration correctly handles column type change but results in data loss.
The migration appropriately drops and recreates the tables, since changing from `TEXT` to `BLOB` cannot be reliably converted. All table structures, indexes, and constraints are properly preserved. However, this migration will result in complete loss of existing peer data.
Please verify that this data loss is acceptable for the deployment strategy, and consider whether any data preservation steps are needed before applying this migration.
comms/core/src/peer_manager/storage/database.rs (6)
25-25: LGTM: Import addition for Borsh serialization.
The addition of Borsh imports is appropriate for the serialization change.
1658-1658: Verify: Non-optional source field.
The `source` field in `NewMultiaddrWithStatsSql` is now `Vec<u8>` (non-optional), while in `UpdateMultiaddrWithStatsSql` it remains optional (`Option<Vec<u8>>`). Is this intentional? Consider making it `Option<Vec<u8>>` for consistency and to handle cases where source data might be missing.
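On the `Vec<u8>` versus `Option<Vec<u8>>` question: assuming the update struct is a Diesel changeset with default settings, an `Option` field set to `None` is skipped rather than written as NULL, so the asymmetry may simply mean "the source is required on insert and optional to change on update". The std-only sketch below models that reading; `AddressRow`, `NewAddressRow`, `AddressRowUpdate`, `insert`, and `apply_update` are hypothetical names, not the PR's types.

```rust
// Hypothetical stored row with a BLOB `source` column that is always populated.
#[derive(Debug, Clone, PartialEq)]
struct AddressRow {
    address: String,
    source: Vec<u8>,
}

// Insert path: `source` is required, like a non-optional `Vec<u8>` field.
struct NewAddressRow {
    address: String,
    source: Vec<u8>,
}

// Update path: `None` means "leave the stored bytes untouched" (assumed changeset semantics).
struct AddressRowUpdate {
    source: Option<Vec<u8>>,
}

fn insert(row: NewAddressRow) -> AddressRow {
    AddressRow { address: row.address, source: row.source }
}

fn apply_update(existing: &mut AddressRow, update: AddressRowUpdate) {
    // Only overwrite the column when a new value was supplied.
    if let Some(bytes) = update.source {
        existing.source = bytes;
    }
}
```

Under this reading, a row can never be inserted without a source, while updates that do not touch the source leave the stored Borsh bytes exactly as they were.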
566-570: LGTM: Correct Borsh serialization implementation.
The serialization of `address.source()` using Borsh is implemented correctly with proper error handling.
671-675: LGTM: Consistent serialization in update method.
The Borsh serialization implementation is consistent with the add method.
1815-1815: LGTM: Correct Borsh deserialization.
The deserialization using `PeerAddressSource::deserialize_reader` is implemented correctly with proper error propagation.
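`deserialize_reader` reads from any `std::io::Read`, so a short buffer surfaces as an `io::Error`. One detail worth checking: as far as we understand borsh 1.x, `try_from_slice` additionally rejects trailing bytes, whereas calling `deserialize_reader` directly does not. If the stricter behaviour is wanted at this call site, a check along the lines of the sketch below could be added; `read_u64_reader` and `decode_whole_blob` are hypothetical names, and a `u64` stands in for `PeerAddressSource`.

```rust
use std::io::{self, Read};

// Hypothetical decoder written against `Read`, the way a `deserialize_reader` impl is.
fn read_u64_reader<R: Read>(reader: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    reader.read_exact(&mut buf)?; // a short buffer surfaces as UnexpectedEof
    Ok(u64::from_le_bytes(buf))
}

// Decode a stored BLOB and insist that every byte was consumed.
// Leftover bytes usually mean the stored layout and the Rust type disagree.
fn decode_whole_blob(blob: &[u8]) -> io::Result<u64> {
    let mut cursor = blob;
    let value = read_u64_reader(&mut cursor)?;
    if !cursor.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "trailing bytes after value"));
    }
    Ok(value)
}
```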
2466-2488: Excellent test coverage for serialization consistency.
The test effectively validates the Borsh serialization/deserialization round trip for 1000 peers with claims. This comprehensive test should help ensure the corruption issues mentioned in the PR objectives are resolved.
It makes no sense that a JSON serialize/deserialize would fail without any error and create invalid data that can be read successfully.
Description
Added Borsh serialisation/deserialisation to the peer_db's `MultiaddrWithStats` peer source field in favour of using `serde_json`. See #7306; further system-level monitoring is needed to verify that this was the root cause.
Motivation and Context
Serde (`serde_json`) conversion was not always successful: validated peer info would be added to the database, then, very rarely, the data would be skewed when either written to or read from the database, so that peer validation failed with a claim signature validation failure, most likely caused by the `updated_at: DateTime<Utc>` field.
It was easy to detect invalid peer info sent over the wire from sync peers, as the invalidated data is present in almost all peer dbs, but difficult to simulate invalid data in the local test peer db. Starting with a fresh peer db each time, initial peer sync had to be repeated many times to capture a bad peer db instance where validation afterwards would fail. In the case presented here, ~0.4% of the entries were invalid.
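The failure mode described above is plausible because a signature only verifies against the exact bytes that were signed. The std-only sketch below illustrates that sensitivity; it does not identify the actual cause. `challenge` and `truncate_to_micros` are hypothetical helpers, and `DefaultHasher` (SipHash) is used only as a stand-in for the signed message digest, not as a signature scheme.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Hypothetical "challenge" over a claim's fields. DefaultHasher is NOT a signature
// scheme; it only stands in for "the signed message must match bit-for-bit".
fn challenge(address: &str, updated_at_secs: i64, updated_at_nanos: u32) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(address.as_bytes());
    hasher.write(&updated_at_secs.to_le_bytes());
    hasher.write(&updated_at_nanos.to_le_bytes());
    hasher.finish()
}

// Simulates a lossy storage round trip that keeps only microsecond precision.
fn truncate_to_micros(nanos: u32) -> u32 {
    nanos / 1_000 * 1_000
}
```

Any silent change to `updated_at`, however small, changes the digest, so a claim that was valid when received would fail verification after a lossy round trip without any deserialization error being raised.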
The example below shows valid peer info that was added to the database but, when read back for validation, fails on the 2nd address.
How Has This Been Tested?
Added unit tests.
System-level testing.
To be monitored.
What process can a PR reviewer use to test or verify this change?
Code review.
System-level testing.
Breaking Changes
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores