@ellemouton
Collaborator

Fix SetSourceNode Race Condition with Lenient Upsert

Problem

During itest execution, the following error occurred during server startup:

unable to create server: can't set self node: unable to upsert source node:
upserting node(037e10ee34f414c80687ffa3dc0f68a9b6c88254f44d0b62d5018526627cd5e6cc):
sql: no rows in result set

This error is caused by a race condition where multiple goroutines call SetSourceNode concurrently with the same timestamp, causing the SQL store to reject the second update.

Root Cause

The Race Condition

Multiple code paths in server.go call SetSourceNode during startup:

  1. setSelfNode() (line 5700) - Called during server initialization
  2. createNewHiddenService() (line 3325) - Called during Tor setup AND by health check goroutines
  3. updateAndBroadcastSelfNode() (line 3447) - RPC handler for node updates

These paths can execute concurrently, especially during server startup when Tor is enabled.

Race Timeline

Initial state: Database has source node with timestamp T

Concurrent execution:

Thread A (setSelfNode)            Thread B (createNewHiddenService)
Reads DB: timestamp T             Reads in-memory: timestamp T
Increments to T+1                 Increments to T+1
Calls SetSourceNode(T+1) first
✅ Success: T+1 > T
DB now has timestamp T+1          Calls SetSourceNode(T+1) second
                                  ❌ Fails: T+1 NOT > T+1
                                  Returns sql.ErrNoRows
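
The timeline can be reproduced with a small in-memory model. This is only a sketch: strictStore and raceOutcome are hypothetical names, and the store merely imitates the "strictly greater" rule rather than running the real SQL query. Whichever writer reaches the store second is rejected, so the outcome is always one success and one failure.

```go
package main

import (
	"database/sql"
	"fmt"
	"sync"
)

// strictStore is a hypothetical in-memory stand-in for the SQL graph store.
// It is NOT the lnd implementation; it only imitates the timestamp rule.
type strictStore struct {
	mu         sync.Mutex
	lastUpdate int64
}

// setSourceNode accepts a write only if ts is strictly greater than the
// stored timestamp, and reports sql.ErrNoRows otherwise.
func (s *strictStore) setSourceNode(ts int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	if ts <= s.lastUpdate {
		return sql.ErrNoRows
	}
	s.lastUpdate = ts
	return nil
}

// raceOutcome lets two writers observe the same timestamp T, derive T+1
// independently, and then write concurrently.
func raceOutcome(start int64) (successes, failures int) {
	store := &strictStore{lastUpdate: start}

	// Both writers read T before either of them writes.
	tsA := store.lastUpdate + 1
	tsB := store.lastUpdate + 1

	var wg sync.WaitGroup
	results := make(chan error, 2)
	for _, ts := range []int64{tsA, tsB} {
		wg.Add(1)
		go func(ts int64) {
			defer wg.Done()
			results <- store.setSourceNode(ts)
		}(ts)
	}
	wg.Wait()
	close(results)

	for err := range results {
		if err != nil {
			failures++
		} else {
			successes++
		}
	}
	return successes, failures
}

func main() {
	ok, failed := raceOutcome(100)
	fmt.Printf("succeeded=%d failed=%d\n", ok, failed)
}
```

The lock inside the store serializes the two writes but cannot prevent the failure, because both timestamps were computed before either write happened.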

Why the SQL Constraint Fails

The original UpsertNode query has a WHERE clause that requires strictly increasing timestamps:

ON CONFLICT (pub_key, version)
    DO UPDATE SET
        alias = EXCLUDED.alias,
        last_update = EXCLUDED.last_update,
        ...
WHERE graph_nodes.last_update IS NULL
    OR EXCLUDED.last_update > graph_nodes.last_update  -- Must be strictly greater!
RETURNING id;

When the WHERE clause fails, no rows are updated, and the RETURNING id clause returns nothing, resulting in sql.ErrNoRows.
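
As a rough illustration, the WHERE clause can be read as a predicate over the stored and incoming timestamps. The helper strictUpdateAllowed below is a hypothetical name used only for this sketch; a nil pointer stands in for a NULL last_update.

```go
package main

import "fmt"

// strictUpdateAllowed mirrors the WHERE clause of the strict upsert:
// the update applies if the stored last_update is NULL (nil here) or the
// incoming timestamp is strictly greater than the stored one.
func strictUpdateAllowed(existing *int64, incoming int64) bool {
	return existing == nil || incoming > *existing
}

func main() {
	stored := int64(100)

	fmt.Println(strictUpdateAllowed(nil, 100))     // no stored timestamp yet
	fmt.Println(strictUpdateAllowed(&stored, 101)) // newer timestamp
	fmt.Println(strictUpdateAllowed(&stored, 100)) // same timestamp
	fmt.Println(strictUpdateAllowed(&stored, 99))  // older timestamp
}
```

Only the first two calls pass the predicate. In the remaining cases the query updates nothing, RETURNING id yields no row, and the caller sees sql.ErrNoRows.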

Why the Existing Mutex Doesn't Help

The s.mu mutex in genNodeAnnouncement() only protects the in-memory currentNodeAnn read/write. It does NOT protect:

  • Database reads in setSelfNode
  • The entire read-modify-write cycle across different call-sites
  • The actual database write in SetSourceNode

This is a classic Time-of-Check to Time-of-Use (TOCTOU) race condition.

Solution

Use Separate UpsertSourceNode Query

Added a new SQL query UpsertSourceNode (in sqldb/sqlc/queries/graph.sql) that removes the strict timestamp constraint:

-- name: UpsertSourceNode :one
INSERT INTO graph_nodes (
    version, pub_key, alias, last_update, color, signature
) VALUES (
    $1, $2, $3, $4, $5, $6
)
ON CONFLICT (pub_key, version)
    DO UPDATE SET
        alias = EXCLUDED.alias,
        last_update = EXCLUDED.last_update,
        color = EXCLUDED.color,
        signature = EXCLUDED.signature
    -- No WHERE clause - always updates!
RETURNING id;

Code Changes

Modified graph/db/sql_store.go, extracting shared helpers so both upsert paths reuse the same logic:

  1. upsertNodeAncillaryData() - Extracted common logic for updating features, addresses, and extra fields (shared by both upsert functions)

  2. populateNodeParams() - Extracted parameter building logic using callback pattern to support both UpsertNodeParams and UpsertSourceNodeParams types

  3. buildNodeUpsertParams() - Builds params for strict UpsertNode query

  4. buildSourceNodeUpsertParams() - Builds params for lenient UpsertSourceNode query

  5. upsertSourceNode() - New function using UpsertSourceNode query, reuses ancillary data helpers

  6. upsertNode() - Refactored to use helper functions, eliminating code duplication

  7. SetSourceNode() - Updated to call upsertSourceNode() instead of upsertNode()
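
A minimal sketch of how a callback can feed two distinct parameter types from one piece of shared logic. The struct fields and function signatures here are assumptions made for illustration; the real sqlc-generated UpsertNodeParams and UpsertSourceNodeParams types and the helpers in sql_store.go may look different.

```go
package main

import "fmt"

// node is a hypothetical, trimmed-down stand-in for the graph node model.
type node struct {
	Version    int16
	PubKey     []byte
	Alias      string
	LastUpdate int64
}

// Hypothetical stand-ins for the two generated parameter structs. They have
// the same fields but are distinct types, so one builder cannot return both.
type upsertNodeParams struct {
	Version    int16
	PubKey     []byte
	Alias      string
	LastUpdate int64
}

type upsertSourceNodeParams struct {
	Version    int16
	PubKey     []byte
	Alias      string
	LastUpdate int64
}

// populateNodeParams derives the common column values once and hands them
// to the caller-supplied setter, which stores them in a concrete type.
func populateNodeParams(n node,
	set func(version int16, pubKey []byte, alias string, lastUpdate int64)) {

	set(n.Version, n.PubKey, n.Alias, n.LastUpdate)
}

// buildNodeUpsertParams fills the parameters for the strict query.
func buildNodeUpsertParams(n node) upsertNodeParams {
	var p upsertNodeParams
	populateNodeParams(n, func(v int16, pk []byte, alias string, ts int64) {
		p = upsertNodeParams{
			Version: v, PubKey: pk, Alias: alias, LastUpdate: ts,
		}
	})
	return p
}

// buildSourceNodeUpsertParams fills the parameters for the lenient query.
func buildSourceNodeUpsertParams(n node) upsertSourceNodeParams {
	var p upsertSourceNodeParams
	populateNodeParams(n, func(v int16, pk []byte, alias string, ts int64) {
		p = upsertSourceNodeParams{
			Version: v, PubKey: pk, Alias: alias, LastUpdate: ts,
		}
	})
	return p
}

func main() {
	n := node{Version: 1, PubKey: []byte{0x03}, Alias: "alice", LastUpdate: 100}
	fmt.Printf("%+v\n", buildNodeUpsertParams(n))
	fmt.Printf("%+v\n", buildSourceNodeUpsertParams(n))
}
```

The callback keeps the field derivation in one place while each builder decides which concrete type receives the values.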

Why This is Safe

For source node (our own node):

  • We control all updates to our own node announcement
  • Last-write-wins is acceptable because all concurrent updates are valid
  • Parameter changes must persist even with timestamp collisions
  • Matches bbolt KV store behavior (which already uses last-write-wins)

For other nodes (network gossip):

  • Still use strict UpsertNode with timestamp checking
  • Maintains Lightning Network gossip protocol guarantees
  • Same timestamp = same content (enforced by cryptographic signatures)
  • Prevents stale network updates and replay attacks

Behavior Change

Before

  • Concurrent source node updates with same timestamp → ❌ Second update fails with sql.ErrNoRows
  • Parameter changes lost if timestamp doesn't advance
  • Server startup could fail during Tor initialization

After

  • Concurrent source node updates with same timestamp → ✅ Both succeed (last-write-wins)
  • All parameter changes persist
  • Server startup succeeds reliably

Testing

Unit Test

Updated TestSetSourceNodeSameTimestamp in graph/db/graph_test.go:

  • Creates source node with timestamp T
  • Updates with same timestamp T but different parameters (alias, color)
  • Verifies update succeeds (no error)
  • Verifies parameter changes actually persisted to database

Why This Can Happen in Production

The race is particularly likely during:

  1. Server startup with Tor enabled

    • setSelfNode() runs during newServer()
    • createNewHiddenService() runs in Start()
    • Health check goroutines can trigger immediately
    • All happening within the same second
  2. Fast-running tests (itests)

    • Tight timing between initialization steps
    • Multiple rapid node updates
  3. User-triggered updates

    • RPC calls to update node info while Tor reconnects
    • Multiple parameter changes in quick succession

Fixes #10370

This commit adds TestSetSourceNodeSameTimestamp to demonstrate the
current behavior when SetSourceNode is called with the same last update
timestamp. The test reveals a difference between the SQL and bbolt
implementations:

- SQL store returns sql.ErrNoRows when attempting to update with the
  same timestamp, as the upsert query's UPDATE clause requires the new
  timestamp to be strictly greater than the existing one
- bbolt store silently ignores stale updates and returns no error

This behavior is important to document because our own node
announcements may change quickly with the same timestamp, unlike
announcements from other nodes where same timestamp typically means
identical parameters.

This query (UpsertSourceNode) is less strict about the last_update
timestamp field.
We want to be less strict with our own node data since we always want
our own updates recorded.

This commit fixes a race condition where multiple goroutines call
SetSourceNode concurrently during startup, causing sql.ErrNoRows
errors. The race occurs when multiple code paths (setSelfNode,
createNewHiddenService, RPC updates) read the same old timestamp,
independently increment it to the same new value (T+1), and race to
write.

The fix uses the new UpsertSourceNode SQL query (without strict
timestamp constraint) instead of UpsertNode. This allows
last-write-wins semantics for our own node, ensuring all parameter
changes persist even when timestamps collide.

Refactored sql_store.go for reusability:
- upsertNodeAncillaryData: common logic for features/addresses/extras
- populateNodeParams: common parameter building with callback pattern
- buildNodeUpsertParams: builds params for strict UpsertNode
- buildSourceNodeUpsertParams: builds params for lenient UpsertSourceNode
- upsertSourceNode: new function using lenient query

Updated TestSetSourceNodeSameTimestamp to verify that concurrent
updates with the same timestamp now succeed and parameter changes
persist.

Fixes the itest error:
"unable to upsert source node: upserting node(...): sql: no rows in
result set"
@gemini-code-assist

Summary of Changes

Hello @ellemouton, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a race condition in the SetSourceNode function that could lead to server startup failures, particularly when Tor is enabled. The issue stemmed from multiple concurrent attempts to update the local node's information with the same timestamp, which the database's strict timestamp check would reject. The solution introduces a new, more lenient SQL upsert query specifically for the local source node, allowing updates to proceed even with identical timestamps (last-write-wins). This ensures that all parameter changes to the local node persist, enhancing server stability and reliability, while maintaining the necessary strictness for other network nodes.

Highlights

  • Fixes Race Condition: Resolves a "sql: no rows in result set" error during server startup caused by concurrent SetSourceNode calls with identical timestamps.
  • New Lenient Upsert: Introduces UpsertSourceNode SQL query that removes the strict last_update timestamp constraint for the local source node, allowing "last-write-wins" behavior.
  • Refactored Node Upsert Logic: Extracts common code into helper functions (upsertNodeAncillaryData, populateNodeParams, buildNodeUpsertParams, buildSourceNodeUpsertParams) to improve reusability and maintainability.
  • Targeted Application: The lenient upsert is applied only to the local source node, preserving strict timestamp checks for other network nodes to maintain Lightning Network gossip protocol integrity.
  • Improved Test Coverage: Adds TestSetSourceNodeSameTimestamp to specifically verify that concurrent updates to the source node with the same timestamp now succeed and persist changes.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent pull request that clearly identifies and fixes a tricky race condition during server startup. The solution of using a more lenient upsert for the source node is well-reasoned and safe. I particularly appreciate the extensive refactoring in sql_store.go to eliminate code duplication and improve maintainability. The new unit test TestSetSourceNodeSameTimestamp is also a great addition that directly verifies the fix. I have one minor style suggestion.

Comment on lines +24 to +26
-- We use a separate upsert for our own node since we want to be less strict
-- about the last_update field. For our own node, we always want to
-- update the record even if the last_update is older than what we have.

medium

According to the style guide, function comments should start with the function name and be a complete sentence. This comment is used to generate the Go doc comment for UpsertSourceNode and currently violates this rule. Please update the comment to adhere to the style guide.

-- UpsertSourceNode uses a separate upsert for our own node since we want to be
-- less strict about the last_update field. For our own node, we always want to
-- update the record even if the last_update is older than what we have.

@yyforyongyu yyforyongyu self-requested a review November 14, 2025 12:09
@saubyk saubyk modified the milestones: v0.21.0, v0.20.1 Nov 14, 2025
Development

Successfully merging this pull request may close these issues.

Unable to create server due to sql: no rows in result set