Open
Labels
discussion; enhancement (New feature or request); help wanted (Extra attention is needed); new feature (Proposing to add a new feature)
Description
1. Feature description
Enable PyPOTS to run models on multiple GPUs with DDP (Distributed Data Parallel, https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) or FSDP (Fully Sharded Data Parallel, https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html).
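To make the discussion concrete, here is a minimal sketch of a DDP training loop in plain PyTorch. It is not PyPOTS code and does not reflect how PyPOTS would integrate DDP. The function name `train_ddp`, the toy linear model, the random data, and the rendezvous address and port are all assumptions made for illustration. It uses the `gloo` backend so it can run on CPU in a single process; a real multi-GPU run would more likely use `nccl` with one process per GPU.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def train_ddp(rank: int, world_size: int, epochs: int = 2) -> float:
    """Hypothetical helper: train a toy model under DDP, return the last batch loss."""
    # Rendezvous settings are assumptions; a launcher such as torchrun sets them itself.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # "gloo" runs on CPU; "nccl" is the usual choice for GPU training.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    try:
        torch.manual_seed(0)
        # Toy stand-in for a time-series dataset: 64 samples, 8 features each.
        dataset = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
        # Each rank sees a disjoint shard of the dataset.
        sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
        loader = DataLoader(dataset, batch_size=16, sampler=sampler)

        # Each process holds a full model replica; DDP keeps them in sync.
        model = DDP(nn.Linear(8, 1))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = nn.MSELoss()

        loss = torch.tensor(0.0)
        for epoch in range(epochs):
            sampler.set_epoch(epoch)  # reshuffle shards differently each epoch
            for x, y in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()  # gradients are all-reduced across ranks here
                optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()
```

Calling `train_ddp(rank=0, world_size=1)` runs the loop in one process. For several processes, `torch.multiprocessing.spawn(train_ddp, args=(world_size,), nprocs=world_size)` passes the rank as the first argument. Note that DDP still keeps a full copy of the model on every device, so it mainly helps throughput; sharding parameters across devices is what FSDP adds.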
2. Motivation
The current multi-GPU training in PyPOTS, implemented with torch.nn.DataParallel, is not sufficient for training large models such as Time-LLM (e.g. #675 Time-LLM easy OOM on short-len TS samples). We need a more advanced feature such as DDP or FSDP.
3. Your contribution
I would like to lead or arrange the development task. Please leave comments below to start discussions if you're interested. More comments will help prioritize this feature.