Issues: pytorch/torchtitan
Issues list
Question about fixed std=0.02 initialization of w1 in moe.py
question
Further information is requested
#1257
opened Jun 3, 2025 by
trestad
[Deepseek] Router collapse on deepseek training loop
enhancement
New feature or request
#1246
opened May 30, 2025 by
xuanzhang816
How to pretrain from scratch a Qwen 2.5 7B-base model using Torchtitan?
#1223
opened May 25, 2025 by
tjoymeed
float8 rowwise vanilla TP low throughput
bug
Something isn't working
module: float8
#1207
opened May 20, 2025 by
danielvegamyhre
Save RNG states during checkpointing for deterministic debugging
enhancement
New feature or request
#1194
opened May 14, 2025 by
wwwjn
document the usage of environment variables
better_engineering
Repo code quality improvements
documentation
Improvements or additions to documentation
high priority
triage review
#1192
opened May 14, 2025 by
tianyu-l
Can we support outputting checkpoints directly in .pt format?
enhancement
New feature or request
module: checkpoint
#1177
opened May 9, 2025 by
andrewor14
[Question] FSDP+TP CUDA_DEVICE_MAX_CONNECTIONS
documentation
Improvements or additions to documentation
module: fsdp
question
Further information is requested
#1147
opened Apr 27, 2025 by
ChenchaoZhao
fully_shard() for huggingface model: pytorch caches too much GPU memory
module: fsdp
question
Further information is requested
#1126
opened Apr 21, 2025 by
mingdianliu
[DeepSeek MoE] current workstream planning
enhancement
New feature or request
#1125
opened Apr 21, 2025 by
lessw2020
Llama 4 issue tracking
high priority
triage review
#1118
opened Apr 17, 2025 by
tianyu-l
3 of 14 tasks
FSDP2 root level parameter management
module: fsdp
question
Further information is requested
#1091
opened Apr 11, 2025 by
dingqingy
Torch.compile and TP during multiresolution Training
module: torch.compile
question
Further information is requested
#1081
opened Apr 9, 2025 by
nighting0le01
Is the current configuration system over-engineered?
question
Further information is requested
#1055
opened Apr 3, 2025 by
wangkuiyi
Clarify PP split point documentation.
question
Further information is requested
#1054
opened Apr 3, 2025 by
githubsgi