[Question] FSDP+TP CUDA_DEVICE_MAX_CONNECTIONS #1147


Open
ChenchaoZhao opened this issue Apr 27, 2025 · 3 comments
Labels
documentation · module: fsdp · question

Comments

@ChenchaoZhao

In Megatron repo https://github.com/NVIDIA/Megatron-LM/blob/4429e8ebe21fb011529d7401c370841ce530785a/megatron/training/arguments.py#L779

It’s recommended there that FSDP use larger values of CUDA_DEVICE_MAX_CONNECTIONS, but Megatron TP requires it to be 1. Is that also the case for the torch implementation of TP using DTensor?

How should I configure this environment variable when using the torch implementations of FSDP(2) and/or TP/CP/SP?

@fegin
Contributor

fegin commented Apr 29, 2025

@weifengpy Do you have insights on this?

@weifengpy
Contributor

@ChenchaoZhao @fegin For FSDP2 + torch-native TP, we recommend setting CUDA_DEVICE_MAX_CONNECTIONS to the number of CUDA streams, e.g. 16 or 32. This ensures that compute and NCCL kernels can execute in parallel.
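A minimal sketch of applying that recommendation: the CUDA driver reads CUDA_DEVICE_MAX_CONNECTIONS at context-creation time, so it must be set before anything initializes CUDA (the value 32 below is just the illustrative number from the comment above, not a tuned setting).

```python
import os

# Must run before any CUDA initialization (e.g. before importing modules
# that touch torch.cuda); the driver reads this at context creation.
# 32 is an illustrative value from the recommendation above.
os.environ["CUDA_DEVICE_MAX_CONNECTIONS"] = "32"

# ... only now import/initialize the training code, e.g.:
# import torch
# torch.cuda.init()
```

Equivalently, exporting the variable in the launch environment (e.g. before invoking torchrun) avoids any ordering concerns inside the script.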

@tianyu-l added the documentation, module: fsdp, and question labels Apr 29, 2025
@ChenchaoZhao
Author

Thanks for the quick answer. Does that mean PyTorch-native TP is superior to Megatron TP, which requires the variable to be 1 in order to enable TP comm overlap (comm+GEMM)?
