Add memory profiling to the benches using DHAT #602
Conversation
I built an alternative approach where the profiling is enabled per bench config, like this:

```toml
[[benches]]
name = "latency"
upload = 250
upload-delay = [10, 25, 50]
download = 250
download-delay = [10, 25, 50]
upload-size = 1024
download-size = 4096
defer-decryption = true
memory-profile = true
```

It seems to be working better, as the overhead of the profiling is not paid for every config. The challenge I have is that during the first run of the bench roughly 200 MB are allocated somewhere I can't trace. This causes the reported peak memory usage to be ~350 MB for the first run and ~150 MB for all subsequent runs. I am thinking of updating the PR with this new approach and leaving the investigation of these 200 MB allocations for later. Technically the memory is being profiled; there is just a difference in allocations between the first and the subsequent runs. Thoughts?
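For illustration, here is a minimal sketch of how a bench runner could gate the profiling on that flag. The `BenchConfig` struct, field names, and binary path are assumptions for the sketch, not the actual tlsn bench harness:

```rust
use std::process::{Child, Command};

// Hypothetical mirror of the relevant parts of the TOML bench config above.
struct BenchConfig {
    name: String,
    memory_profile: bool,
}

// Wrap the prover in `valgrind --tool=dhat` only when this particular bench
// asks for it, so the profiling overhead is not paid by every config.
fn spawn_prover(cfg: &BenchConfig) -> std::io::Result<Child> {
    let prover = "./target/release/prover"; // assumed binary path
    if cfg.memory_profile {
        Command::new("valgrind")
            .args(["--tool=dhat", prover])
            .arg(&cfg.name)
            .spawn()
    } else {
        Command::new(prover).arg(&cfg.name).spawn()
    }
}
```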
Awesome! This is something I was going to ask for but wasn't sure of the complexity. That one-time allocation is coming from the MPC backend: we do some work upfront and then cache it for the lifetime of the program. The simplest solution is to import the relevant statics.
These statics can be wrapped in a newtype and exported with a build method, or some exported fn. If that turns out to be a headache, feel free to leave it out of scope of this PR.
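A minimal sketch of that shape, with hypothetical names (the real cached statics live in the MPC backend and may not look like this):

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for the data the MPC backend builds once and caches
// for the lifetime of the program.
pub struct PreprocessedState(Vec<u8>);

static STATE: OnceLock<PreprocessedState> = OnceLock::new();

/// Exported build fn: forces the one-time allocation up front, so a bench can
/// call it before the profiled run starts and keep heap measurements stable.
pub fn build_state() -> &'static PreprocessedState {
    STATE.get_or_init(|| {
        // ...expensive upfront work would happen here...
        PreprocessedState(Vec::new())
    })
}
```

A bench would then call `build_state()` once before enabling the profiler, so the one-time allocation is excluded from the per-run measurements.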
I changed my approach to run the memory profiling only on the benches for which it's enabled, which adds manageable overhead when running the benches. I also added preprocessing of the PRF circuits, which made the heap measurements stable. Thanks @sinui0 for the suggestion! Plotting of the profiling results is also included. Here is the result:
@valo, thanks for the PR.
@themighty1 sure. Let me know if I can help with anything. I believe the execution overhead is limited to the benches where the profiling is enabled, so it should be quite manageable.
Force-pushed from fa7ee02 to 78b9371.
Force-pushed from 78b9371 to 2042089.
Is this a gh bug? I just force pushed and this PR got closed.
Now I can reopen it.
The rebase was tricky since there were too many conflicts, so I just cherry-picked this PR on top of dev.
I can't seem to reopen this PR. @heeckhau if you are able, please do reopen it and merge it.
@themighty1 I think the problem is that this PR originates from a clone (https://github.com/valo/tlsn/).
@valo, could you please re-open this PR? Seems like only you can do that now that it was accidentally closed.
Subsumed by #658.
The idea is to run the prover and the verifier through `valgrind --tool=dhat` if a command-line option is enabled. This allows tracking heap allocations and measuring things like peak memory usage. The current approach runs the profiling for all bench configs, which seems a bit excessive: running all benches with memory profiling takes 56 min vs. 15 min without profiling on my machine.
I am thinking of trying to move the profiling into the prover/verifier binaries using `dhat-rs` (a rough sketch of that is appended at the end of this description). I am concerned that it might not be possible to turn heap profiling on/off at runtime. Here is the memory output for a full run:
Fixes #462
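For reference, a minimal sketch of what embedding `dhat-rs` in the prover binary might look like, assuming the `dhat` crate as a dependency and a hypothetical environment-variable toggle; as far as I understand, allocations are only recorded while a `Profiler` value is alive, which is what the runtime on/off concern is about:

```rust
// Assumes the `dhat` crate (dhat-rs) is a dependency of the prover binary.
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    // Hypothetical toggle: only create the profiler when asked to.
    let _profiler = std::env::var("MEMORY_PROFILE")
        .ok()
        .map(|_| dhat::Profiler::new_heap());

    // ...run the prover here...

    // When `_profiler` is dropped at the end of main, dhat writes its report
    // (dhat-heap.json by default), which can be inspected in DHAT's viewer.
}
```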