-
Notifications
You must be signed in to change notification settings - Fork 2.2k
graph/db: fix SetSourceNode no rows error #10371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
graph/db: fix SetSourceNode no rows error #10371
Conversation
This commit adds TestSetSourceNodeSameTimestamp to demonstrate the current behavior when SetSourceNode is called with the same last update timestamp. The test reveals a difference between the SQL and bbolt implementations: - SQL store returns sql.ErrNoRows when attempting to update with the same timestamp, as the upsert query's UPDATE clause requires the new timestamp to be strictly greater than the existing one - bbolt store silently ignores stale updates and returns no error This behavior is important to document because our own node announcements may change quickly with the same timestamp, unlike announcements from other nodes where same timestamp typically means identical parameters.
This query is less strict in terms of the latest update timestamp field. We want to be less strict with our own node data since we always want our own updates recorded.
This commit fixes a race condition where multiple goroutines call SetSourceNode concurrently during startup, causing sql.ErrNoRows errors. The race occurs when multiple code paths (setSelfNode, createNewHiddenService, RPC updates) read the same old timestamp, independently increment it to the same new value (T+1), and race to write. The fix uses the new UpsertSourceNode SQL query (without strict timestamp constraint) instead of UpsertNode. This allows last-write-wins semantics for our own node, ensuring all parameter changes persist even when timestamps collide. Refactored sql_store.go for reusability: - upsertNodeAncillaryData: common logic for features/addresses/extras - populateNodeParams: common parameter building with callback pattern - buildNodeUpsertParams: builds params for strict UpsertNode - buildSourceNodeUpsertParams: builds params for lenient UpsertSourceNode - upsertSourceNode: new function using lenient query Updated TestSetSourceNodeSameTimestamp to verify that concurrent updates with the same timestamp now succeed and parameter changes persist. Fixes the itest error: "unable to upsert source node: upserting node(...): sql: no rows in result set"
Summary of ChangesHello @ellemouton, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a race condition in the SetSourceNode function that could lead to server startup failures, particularly when Tor is enabled. The issue stemmed from multiple concurrent attempts to update the local node's information with the same timestamp, which the database's strict timestamp check would reject. The solution introduces a new, more lenient SQL upsert query specifically for the local source node, allowing updates to proceed even with identical timestamps (last-write-wins). This ensures that all parameter changes to the local node persist, enhancing server stability and reliability, while maintaining the necessary strictness for other network nodes. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is an excellent pull request that clearly identifies and fixes a tricky race condition during server startup. The solution of using a more lenient upsert for the source node is well-reasoned and safe. I particularly appreciate the extensive refactoring in sql_store.go to eliminate code duplication and improve maintainability. The new unit test TestSetSourceNodeSameTimestamp is also a great addition that directly verifies the fix. I have one minor style suggestion.
| -- We use a separate upsert for our own node since we want to be less strict | ||
| -- about the last_update field. For our own node, we always want to | ||
| -- update the record even if the last_update is older than what we have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the style guide, function comments should start with the function name and be a complete sentence.1 This comment is used to generate the Go docstring for UpsertSourceNode, which currently violates this rule. Please update the comment to adhere to the style guide.
-- UpsertSourceNode uses a separate upsert for our own node since we want to be
-- less strict about the last_update field. For our own node, we always want to
-- update the record even if the last_update is older than what we have.
Fix SetSourceNode Race Condition with Lenient Upsert
Problem
During itest execution, the following error occurred during server startup:
This error is caused by a race condition where multiple goroutines call
SetSourceNodeconcurrently with the same timestamp, causing the SQL store to reject the second update.Root Cause
The Race Condition
Multiple code paths in
server.gocallSetSourceNodeduring startup:setSelfNode()(line 5700) - Called during server initializationcreateNewHiddenService()(line 3325) - Called during Tor setup AND by health check goroutinesupdateAndBroadcastSelfNode()(line 3447) - RPC handler for node updatesThese paths can execute concurrently, especially during server startup when Tor is enabled.
Race Timeline
Initial state: Database has source node with timestamp
TConcurrent execution:
TTT+1T+1SetSourceNode(T+1)firstT+1 > TT+1SetSourceNode(T+1)secondT+1NOT> T+1sql.ErrNoRowsWhy the SQL Constraint Fails
The original
UpsertNodequery has a WHERE clause that requires strictly increasing timestamps:When the WHERE clause fails, no rows are updated, and the
RETURNING idclause returns nothing, resulting insql.ErrNoRows.Why the Existing Mutex Doesn't Help
The
s.mumutex ingenNodeAnnouncement()only protects the in-memorycurrentNodeAnnread/write. It does NOT protect:setSelfNodeSetSourceNodeThis is a classic Time-of-Check to Time-of-Use (TOCTOU) race condition.
Solution
Use Separate UpsertSourceNode Query
Added a new SQL query
UpsertSourceNode(insqldb/sqlc/queries/graph.sql) that removes the strict timestamp constraint:Code Changes
Modified
graph/db/sql_store.gowith maximum reusability:upsertNodeAncillaryData()- Extracted common logic for updating features, addresses, and extra fields (shared by both upsert functions)populateNodeParams()- Extracted parameter building logic using callback pattern to support bothUpsertNodeParamsandUpsertSourceNodeParamstypesbuildNodeUpsertParams()- Builds params for strictUpsertNodequerybuildSourceNodeUpsertParams()- Builds params for lenientUpsertSourceNodequeryupsertSourceNode()- New function usingUpsertSourceNodequery, reuses ancillary data helpersupsertNode()- Refactored to use helper functions, eliminating code duplicationSetSourceNode()- Updated to callupsertSourceNode()instead ofupsertNode()Why This is Safe
For source node (our own node):
For other nodes (network gossip):
UpsertNodewith timestamp checkingBehavior Change
Before
sql.ErrNoRowsAfter
Testing
Unit Test
Updated
TestSetSourceNodeSameTimestampingraph/db/graph_test.go:TTbut different parameters (alias, color)Why This Can Happen in Production
The race is particularly likely during:
Server startup with Tor enabled
setSelfNode()runs duringnewServer()createNewHiddenService()runs inStart()Fast-running tests (itests)
User-triggered updates
Fixes #10370