Title: Investigating repeated LMDB errors: could a MITM / malicious peers be pushing a covert fork?
Short summary
- Symptom: repeated LMDB errors during long syncs (examples: MDB_PAGE_NOTFOUND / MDB_BAD_TXN / MDB_CORRUPTED) that leave the DB unusable. db-salvage often fails.
- Hypothesis: some peers (or an on-path attacker) are providing crafted/invalid block data intended to fork/confuse the node; when the daemon attempts to apply that data it causes LMDB transaction failures and results in DB corruption.
- Goal: collect minimal, actionable evidence so we can rule in/out a network-level attack vector and recommend mitigations.
Why I suspect a network-level attack (evidence summary)
- Errors occur while applying specific blocks: logs repeatedly show "Error adding block with hash <...> … Error adding spent key image to db transaction: MDB_PAGE_NOTFOUND / MDB_BAD_TXN".
- Around the same time the daemon logs "Sync data returned a new top block candidate: 1591748 -> 3513147 [Your node is 1921399 blocks behind]", indicating peer(s) advertised a different top than the local one.
- The daemon repeatedly blocks peers after the error, which suggests the node considers them misbehaving.
- Hardware checks (SMART, dmesg, fsck) show no current device errors; SMART reports many unsafe shutdowns, which can cause corruption but do not fully explain the block-application errors.
- db-salvage is unreliable and sometimes segfaults; recovery is difficult, amplifying the impact.
Artifacts attached (sanitized)
- artifacts/bitmonero.log.snippet.txt — key log lines showing MDB errors and sync messages (PII removed / IPs sanitized)
- artifacts/monerod-status.txt — `monerod` `status` output around the failure
- artifacts/monerod-print_pl.txt — peer list summary (IP addresses removed; blocked peers replaced with placeholders)
- artifacts/smartctl.txt — SMART output for /dev/nvme0n1 (health summary; SAFE/FAILED fields)
- artifacts/fsck.txt — `fsck.ext4` output run on the ext4 partition used for the blockchain
What I’ve already done
- Captured the above artifacts at the time of failure.
- Ran `sudo smartctl -a /dev/nvme0n1` — device reported SMART overall-health: PASSED; Unsafe Shutdowns: 151; Media errors: 0.
- Ran `sudo fsck.ext4 -f` after unmount — no repairable errors found.
- Deleted LMDB and started a fresh sync; the fresh sync proceeds fine until the next incident (i.e., corruption can reappear).
Concrete requests for maintainers / devs
- Could malformed or malicious P2P payloads cause the daemon to attempt DB writes that result in MDB_PAGE_NOTFOUND? If so, what checks can detect this earlier (before DB damage)?
- Recommended forensics to prove a malicious peer pushed invalid data:
- Which exact fields to compare between peers (header hashes at the same height, block header metadata, tx indexes)?
- Which monerod RPCs or commands will fetch the authoritative block header/hash for a given height for quick comparison?
- Any suggestions for deterministic checks to detect a forked/modified block before it is committed to LMDB (e.g., multi-peer verification, header majority checks)?
- Which p2p/debug flags or trace-level formats should I capture (pcap filters, ports, timestamps) to make a minimal, useful forensic capture for devs to analyze? (I can capture traffic to/from port 18080 for the peer IPs active at the time.)
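For the capture mentioned above, here is a minimal POSIX-sh sketch that builds a `tcpdump` filter restricted to port 18080 and the peers active at failure time. The IP addresses are placeholders from the documentation range; substitute the addresses listed by `print_pl` when the incident occurs.

```shell
# Hypothetical peer IPs -- replace with the peers connected at failure time.
PEERS="203.0.113.5 203.0.113.7"

# Build a capture filter: port 18080 and any of the listed hosts.
FILTER="port 18080 and ("
first=1
for ip in $PEERS; do
  [ $first -eq 1 ] || FILTER="$FILTER or "
  FILTER="${FILTER}host $ip"
  first=0
done
FILTER="$FILTER)"
echo "$FILTER"

# Then capture with full packet bodies (requires root):
#   sudo tcpdump -i any -s 0 -w monero-incident.pcap "$FILTER"
```

This keeps the pcap small enough to attach while still covering every suspect peer.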
Suggested experiments / mitigations I can run and report back
- Controlled single-peer test: run monerod with a single trusted peer (e.g., `--add-exclusive-node`) and see whether corruption recurs.
- Multi-peer comparison: when the failure happens, immediately fetch the block header/hash for the failing height from multiple public nodes/explorers and compare.
- Run with `--db-sync-mode=safe` to reduce the window in which partial writes can corrupt the DB; report whether corruption stops.
- Run monerod behind a firewall that restricts peers and watch for the issue.
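The multi-peer comparison can be sketched as follows. `get_block_header_by_height` is the standard monerod JSON-RPC method for this; the height, the second node's URL, and the hash values below are placeholders for illustration, so the curl calls are shown commented out.

```shell
#!/bin/sh
# Compare the block hash at the failing height between the local daemon and an
# independent public node. Height and remote node URL are examples.
HEIGHT=1591748
REQ='{"jsonrpc":"2.0","id":"0","method":"get_block_header_by_height","params":{"height":'"$HEIGHT"'}}'

# With a daemon reachable (requires curl + jq):
#   LOCAL=$(curl -s -d "$REQ" -H 'Content-Type: application/json' \
#       http://127.0.0.1:18081/json_rpc | jq -r .result.block_header.hash)
#   REMOTE=$(curl -s -d "$REQ" -H 'Content-Type: application/json' \
#       http://node.example.org:18081/json_rpc | jq -r .result.block_header.hash)
LOCAL="placeholder_hash"    # stand-ins so the sketch runs without a node
REMOTE="placeholder_hash"

if [ "$LOCAL" = "$REMOTE" ]; then
  echo "MATCH at height $HEIGHT"
else
  echo "MISMATCH at height $HEIGHT: local=$LOCAL remote=$REMOTE"
fi
```

A persistent MISMATCH against several independent nodes at the same height would be concrete evidence of a forked or fabricated block being served to this node.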
Copies of the sanitized artifacts are attached to this issue under `artifacts/`.
If you need additional logs or a specific trace format, tell me the exact commands and time window to capture and I will run them on the next failure and attach the outputs.
Thank you — I'm happy to gather whatever extra data is most useful to reproduce or rule out a network attack vector.