Investigate increased memory usage while syncing with Genesis in 10.5 #1545


Open
amesgen opened this issue Jun 2, 2025 · 3 comments
Labels
Genesis PRs related to Genesis testing and implementation


Member

amesgen commented Jun 2, 2025

@karknu has run some full-sync experiments with 10.5 (not yet released; he used ouroboros-consensus-0.27, which is intended for 10.5, plus some additional patches) and reports increased memory usage, causing the node to crash with a 28 GB heap limit:

The memory increase only happens with Genesis enabled, but I can’t tell if it is a space leak or if Genesis just requires slightly more memory (more active peers => more memory)

The goal of this ticket is to find out whether there is a new serious memory leak (which we must fix) or whether the memory requirement just grew slightly (e.g. #1288 is expected to increase peak memory usage slightly), which is not a big problem.


We know that 10.4.1 definitely doesn't have a (serious) leak: Nick did a full sync, and the SDET sync tests also support this. We do know that Genesis uses a bit more memory than Praos (the difference goes away after a restart once the node is caught up), which would be nice to fully understand, but it is low priority.

amesgen added the Genesis label (PRs related to Genesis testing and implementation) Jun 2, 2025
amesgen moved this to 🏗 In progress in Consensus Team Backlog Jun 2, 2025
amesgen self-assigned this Jun 2, 2025
Member Author

amesgen commented Jun 5, 2025

I have to pause this, leaving some plots/analysis here:

Plot of a mainnet sync from Genesis using the default Genesis config

[Image: plot]

Local sync of the first 1e6 slots

[Image: plot]

This definitely shows that Genesis uses more memory.

One slightly subtle thing is the creation of a "local-only" peer snapshot, see here.

Also see the eventlog2html output (built with a custom GHC and -fdistinct-constructor-tables), which shows that the increase is mostly due to extra ByteStrings, but doesn't directly reveal the culprit: eventlog2html.tar.gz

Sync of the first 1e6 slots using Praos, but using 30 peers

This variant (suggested by @karknu) is meant to reveal to what extent this memory increase is related just to the number of peers.

[Image: plot]

Things to note:

  • It is enormously (28x) slower than both Praos with 1 peer and Genesis with 30 peers (the latter stays fast due to CSJ).
  • It uses significantly more memory than Praos with 1 peer, but still significantly less memory than Genesis with 30 peers.

This seems to imply that the Genesis memory increase is at least partially due to some non-Genesis-related per-peer leak.
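One way to put a number on "at least partially" is to ask what fraction of the Genesis-vs-Praos memory gap the Praos-with-30-peers run already accounts for. This is a minimal sketch, assuming the peer-count effect and any Genesis-specific effect simply add up; the function name `peer_share_of_gap` is hypothetical and not part of any existing tooling. The three measurements should be taken at the same slot number rather than the same wall-clock time, since the 30-peer Praos run is so much slower.

```python
def peer_share_of_gap(praos_1_peer: float,
                      praos_30_peers: float,
                      genesis_30_peers: float) -> float:
    """Hypothetical helper: fraction of the Genesis memory gap explained by peer count.

    All arguments are memory measurements in the same unit (e.g. GB),
    sampled at the same slot number. Assumes (unverified) that the
    per-peer effect and the Genesis-specific effect are additive.
    """
    # Total extra memory Genesis (30 peers) uses over the Praos 1-peer baseline.
    total_gap = genesis_30_peers - praos_1_peer
    if total_gap <= 0:
        raise ValueError("expected Genesis to use more memory than Praos with 1 peer")
    # Extra memory attributable to just having 30 peers under Praos.
    peer_gap = praos_30_peers - praos_1_peer
    return peer_gap / total_gap
```

A result near 1 would suggest that the per-peer effect explains most of the gap; a result near 0 would point back at Genesis-specific state.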

(The .eventlog.html file is too large to open 😿)

Possible next steps

  • Disable ChainSel starvations/ChainSync rotations in the Genesis sync
  • Profile older releases to see if this is the result of a recent change.
  • Investigate with ghc-debug

amesgen moved this from 🏗 In progress to 🔖 Ready in Consensus Team Backlog Jun 5, 2025
amesgen removed their assignment Jun 5, 2025
Contributor

nfrisby commented Jun 5, 2025

This trifecta is confusing me.

  • The first plot shows 24-25 gigabytes after 1.2e8 slots.
  • The second plot shows 5 gigabytes versus 1 gigabyte after 1e6 slots.
  • The gap in the second plot grows approximately linearly.
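The tension between these three points can be made explicit with a back-of-the-envelope extrapolation, assuming the gap keeps growing at the rate seen over the first 1e6 slots:

$$
\frac{5\,\text{GB} - 1\,\text{GB}}{10^{6}\ \text{slots}} \times 1.2 \times 10^{8}\ \text{slots} = 480\,\text{GB} \gg 25\,\text{GB}
$$

So either the gap stops growing linearly at some point, or the early memory numbers are inflated by something other than retained data.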

Would it be easy to add a "live bytes" line to the second plot? I'm wondering whether the RSS is initially inflated because the GC isn't yet under pressure to conserve memory. (That would be silly behavior, but it seems worth checking if it's easy.)

Member Author

amesgen commented Jun 6, 2025

Would it be easy to add a "live bytes" line to the second plot?

Done 👍 The live bytes are definitely rather dispersed for Genesis.

Projects
Status: 🔖 Ready
Development

No branches or pull requests

2 participants