MPICH high memory footprint #7199
Note that the data above is with PPN 96 (96 processes per node). The reported memory footprint values are in GB per socket. The memory overhead increases linearly with node count, and it persists through the entire program execution.
@aditya-nishtala Could you retry the experiment using a debug build and enable
Taking the difference, the memory increase is roughly linear in the number of nodes, ~55-65 MB per node. @aditya-nishtala How many PPN (processes per node)?
This is with PPN 96.
Thanks @nsdhaman. So that is roughly 6 KB per connection.
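The ~6 KB figure can be reproduced with a little arithmetic. This is a minimal sketch, assuming (as the estimate appears to) that each added node brings 96 new peers for each of the 96 local ranks, i.e. 96 × 96 new connections per node, and using decimal units (1 MB = 10^6 bytes). The per-socket reporting of the original data is not modeled here.

```python
# Back-of-the-envelope estimate of memory overhead per connection.
# Assumption: each added node adds `ppn` new peers for each of `ppn` local ranks,
# so one extra node means ppn * ppn new rank-to-rank connections.
PPN = 96  # processes per node used in the reported runs


def per_connection_bytes(mb_per_node: float, ppn: int = PPN) -> float:
    """Overhead per connection in bytes, given the per-node increase in MB (10**6 bytes)."""
    connections_per_node = ppn * ppn
    return mb_per_node * 1e6 / connections_per_node


if __name__ == "__main__":
    # Range of per-node increases quoted in the thread (~55-65 MB/node).
    for mb in (55, 60, 65):
        print(f"{mb} MB/node -> {per_connection_bytes(mb) / 1e3:.1f} KB per connection")
```

Under this assumption the range works out to about 6.0-7.1 KB per connection, consistent with the "roughly 6 KB" estimate.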
Okay, I think the issue is that we allocate an address table sized up front for all possible connections, which is too large. If we assume no application will use multiple VCIs, we could configure with … as a workaround. For a more appropriate fix, we could change the AV table to accommodate multi-VCI/NIC entries dynamically rather than statically. I can probably implement something like that.
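To illustrate the difference between the static and dynamic approaches, here is a toy model, not MPICH's actual AV code. One table reserves a slot for every (rank, VCI, NIC) combination at init time; the other accounts for a single default entry per rank and adds VCI/NIC entries only when they are inserted. `StaticAVTable`, `DynamicAVTable`, `ENTRY_BYTES`, and the VCI/NIC limits used below are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

ENTRY_BYTES = 64  # hypothetical bytes per address entry, for illustration only


@dataclass
class StaticAVTable:
    """Hypothetical model: reserves a slot for every (rank, VCI, NIC) combination up front."""
    num_ranks: int
    max_vcis: int
    max_nics: int

    def bytes_used(self) -> int:
        # Paid at init time whether or not extra VCIs/NICs are ever used.
        return self.num_ranks * self.max_vcis * self.max_nics * ENTRY_BYTES


@dataclass
class DynamicAVTable:
    """Hypothetical model: one default entry per rank; other VCI/NIC entries added on demand."""
    num_ranks: int
    extra: Dict[Tuple[int, int, int], bytes] = field(default_factory=dict)

    def insert(self, rank: int, vci: int, nic: int, addr: bytes) -> None:
        # Only non-default (VCI, NIC) pairs need an extra entry.
        if (vci, nic) != (0, 0):
            self.extra[(rank, vci, nic)] = addr

    def bytes_used(self) -> int:
        return (self.num_ranks + len(self.extra)) * ENTRY_BYTES


if __name__ == "__main__":
    ranks = 768 * 96  # largest run in the report: 768 nodes at PPN 96
    static = StaticAVTable(ranks, max_vcis=4, max_nics=4)  # made-up limits
    dynamic = DynamicAVTable(ranks)
    print("static:", static.bytes_used(), "dynamic:", dynamic.bytes_used())
```

With these made-up parameters the static table costs about 75 MB against about 4.7 MB for the dynamic one when no application uses extra VCIs. The point is only that static reservation scales with the VCI/NIC limits times the rank count, while on-demand insertion scales with what is actually used.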
We ran a simple MPICH hello-world program in which each rank prints its rank ID and the hostname it is running on.
The program itself allocates no memory; all memory allocation comes from MPICH.
We scaled the run from 32 nodes to 768 nodes and measured memory consumption.
MPICH commit: 204f8cd
This is happening on Aurora.
Memory consumption is the same whether using DDR or HBM. The data below was measured on DDR.