
Add memory profiling to the benches using DHAT #602

Closed

Conversation

@valo (Contributor) commented Sep 24, 2024

The idea is to run the prover and the verifier through `valgrind --tool=dhat` if a command line option is enabled. This allows tracking heap allocations and measuring things like the maximum memory allocated.
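The wrapping described above could be sketched like this in the bench harness; the binary path and flag handling here are illustrative, not the PR's exact CLI:

```rust
use std::process::Command;

/// Build the command that launches a bench binary, wrapping it in
/// `valgrind --tool=dhat` when memory profiling is requested.
/// (Sketch: `binary` and the boolean flag stand in for the real
/// harness options.)
fn bench_command(binary: &str, memory_profile: bool) -> Command {
    if memory_profile {
        let mut cmd = Command::new("valgrind");
        // DHAT writes its heap-profile output when the child exits.
        cmd.arg("--tool=dhat").arg(binary);
        cmd
    } else {
        Command::new(binary)
    }
}
```

When the flag is off, the binary runs directly and pays no profiling overhead.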

The current approach runs the profiling for all bench configs, which seems a bit excessive: running all benches with memory profiling takes 56 min vs 15 min without profiling on my machine.

I am considering moving the profiling into the prover/verifier binaries using dhat-rs, but I am concerned that it might not be possible to turn heap profiling on/off at runtime.

Here is the memory output for a full run:

Prover memory usage: 655.85 Mbytes
Verifier memory usage: 420.81 Mbytes

Fixes #462

@sinui0 self-requested a review on September 24, 2024
@valo (Contributor, Author) commented Sep 25, 2024

I built an alternative approach, where the profiling is enabled per bench config like this:

```toml
[[benches]]
name = "latency"
upload = 250
upload-delay = [10, 25, 50]
download = 250
download-delay = [10, 25, 50]
upload-size = 1024
download-size = 4096
defer-decryption = true
memory-profile = true
```
It seems to work better, as the overhead of the profiling is not paid for each config. The challenge I have is that during the first run of the bench, about 200 MB are allocated somewhere I can't trace. This causes the reported peak memory usage to be 350 MB for the first run and ~150 MB for all subsequent runs.

I am thinking of updating the PR with this new approach and leaving the investigation of these 200 MB allocations for later. Technically the memory is being profiled; there is just a difference in allocations between the first and subsequent runs. Thoughts?

@sinui0 (Member) commented Sep 25, 2024

Awesome! This is something I was going to ask for but wasn't sure of the complexity.

That one-time allocation is coming from the MPC backend. We do some work upfront and then cache it for the lifetime of the program.

Simplest solution is to import the hmac-sha256 crate and do that work before starting the benches.

```rust
static SESSION_KEYS_CIRC: OnceLock<Arc<Circuit>> = OnceLock::new();
```

These statics can be wrapped in a newtype and exported with a build method, or some exported fn.
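The suggestion above can be sketched with the standard library's `OnceLock`; `Circuit` here is a placeholder for the real type from the hmac-sha256 crate, and the function name is illustrative:

```rust
use std::sync::{Arc, OnceLock};

// Placeholder for the real `Circuit` type from the hmac-sha256 crate.
struct Circuit;

// Cached for the lifetime of the program.
static SESSION_KEYS_CIRC: OnceLock<Arc<Circuit>> = OnceLock::new();

/// Return the cached circuit, building it on first call. Invoking this
/// once before the benches start moves the one-time allocation out of
/// the measured runs.
fn session_keys_circ() -> Arc<Circuit> {
    SESSION_KEYS_CIRC
        .get_or_init(|| Arc::new(Circuit)) // expensive construction in reality
        .clone()
}
```

Every later call just clones the `Arc`, so subsequent bench runs see identical allocation behavior.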

If that turns out to be a headache, feel free to leave it out of scope for this PR.

@valo (Contributor, Author) commented Oct 3, 2024

I changed my approach to run the memory profiling only on the benches for which it's enabled. This adds manageable overhead when running the benches.

Also added preprocessing of the PRF circuits, which made the heap measurements stable. Thanks @sinui0 for the suggestion!

Added plotting of the profiling results. Here is the result:
[plot: Download Size vs Memory]

@themighty1 (Member) commented:

@valo , thanks for the PR.
If you don't mind, I'll take over this PR to rebase it and add a few other things.
I'm afraid that the slowdown caused by dhat may skew the results of the other benches, so I plan to have 2 binaries - one with dhat just for the memory bench and one without for all other benches.
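One common way to get the two builds is a Cargo feature gating an optional dhat dependency; a sketch of the usual dhat-rs setup, not necessarily how this repo ends up wiring it:

```toml
# Cargo.toml (sketch): `cargo build --features dhat-heap` produces the
# binary used for the memory bench, while a plain `cargo build`
# produces the unprofiled binary for all other benches.
[features]
dhat-heap = ["dep:dhat"]

[dependencies]
dhat = { version = "0.3", optional = true }
```

The profiler setup in the code can then be gated with `#[cfg(feature = "dhat-heap")]` so the unprofiled binary carries no allocator shim at all.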

@valo (Contributor, Author) commented Oct 16, 2024

@themighty1 sure. Let me know if I can help with anything. I believe the execution overhead is limited to the benches where profiling is enabled, so it should be quite manageable.

@themighty1 (Member) commented:

Is this a GitHub bug? I just force-pushed and this PR got closed.

@themighty1 (Member) commented:

Now I can reopen it.

@themighty1 (Member) commented:

The rebase was tricky since there were too many conflicts, so I just cherry-picked this PR on top of dev.
The bench suite has already run successfully a few times here: https://github.com/themighty1/tlsn/actions (the memory-profiling branch).
Tagging @heeckhau for a final review.

@themighty1 (Member) commented:

I can't seem to reopen this PR. @heeckhau if you are able, please do reopen it and merge it.

@heeckhau (Member) commented:

@themighty1 I think the problem is that this PR originates from a fork (https://github.com/valo/tlsn/).
It might be simpler to open a new PR?

@themighty1 (Member) commented:

@valo, could you please re-open this PR? It seems only you can do that now that it was accidentally closed.

@themighty1 mentioned this pull request on Oct 28, 2024
@themighty1 (Member) commented:

Subsumed by #658.
