[Question] FSDP+TP CUDA_DEVICE_MAX_CONNECTIONS #1147
Labels
documentation
Improvements or additions to documentation
module: fsdp
question
Further information is requested
In the Megatron-LM repo (https://github.com/NVIDIA/Megatron-LM/blob/4429e8ebe21fb011529d7401c370841ce530785a/megatron/training/arguments.py#L779) it is recommended that FSDP use larger values of
CUDA_DEVICE_MAX_CONNECTIONS
while Megatron's TP requires it to be 1. Is this also the case for the torch implementation of TP using DTensor? How should I configure this environment variable when using the torch implementations of FSDP(2) and/or TP/CP/SP?
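For context, whatever value turns out to be correct, the variable only takes effect if it is set before the CUDA context is created, i.e. before the first CUDA call in the process. A minimal sketch (the value `1` here is just a placeholder, not a recommendation):

```python
import os

# CUDA_DEVICE_MAX_CONNECTIONS controls how many hardware work queues the
# process opens to each GPU. The CUDA driver reads it at context creation,
# so it must be set before any CUDA initialization -- in practice, before
# importing torch, or at launch time:
#   CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=8 train.py
os.environ["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"  # placeholder value

print(os.environ["CUDA_DEVICE_MAX_CONNECTIONS"])
```

Setting it inside the training script after `torch.cuda` has been touched has no effect, which is why Megatron-LM checks/sets it in its argument parsing before model setup.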