Merge branch 'main' into sekyondaMeta-Deadlink-update-1
svekars authored Nov 15, 2024
2 parents 93d9713 + 69475d6 commit c76db51
Showing 7 changed files with 816 additions and 5 deletions.
5 changes: 4 additions & 1 deletion .jenkins/metadata.json
@@ -33,7 +33,7 @@
},
"recipes_source/torch_export_aoti_python.py": {
"needs": "linux.g5.4xlarge.nvidia.gpu"
},
},
"advanced_source/pendulum.py": {
"needs": "linux.g5.4xlarge.nvidia.gpu",
"_comment": "need to be here for the compiling_optimizer_lr_scheduler.py to run."
@@ -58,6 +58,9 @@
"intermediate_source/scaled_dot_product_attention_tutorial.py": {
"needs": "linux.g5.4xlarge.nvidia.gpu"
},
"intermediate_source/transformer_building_blocks.py": {
"needs": "linux.g5.4xlarge.nvidia.gpu"
},
"recipes_source/torch_compile_user_defined_triton_kernel_tutorial.py": {
"needs": "linux.g5.4xlarge.nvidia.gpu"
},
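Each entry in `.jenkins/metadata.json` maps a tutorial source file to the CI runner it needs, which is how the new `transformer_building_blocks.py` tutorial gets scheduled onto a GPU machine. A minimal sketch of how a build script might consume these entries is below; the lookup helper and the default runner name are assumptions for illustration, not the repository's actual build code.

```python
# Sketch (assumption): how CI tooling might read .jenkins/metadata.json to pick
# a runner for each tutorial. The default runner name below is hypothetical.
import json

with open(".jenkins/metadata.json") as f:
    metadata = json.load(f)

entry = metadata.get("intermediate_source/transformer_building_blocks.py", {})
runner = entry.get("needs", "linux.2xlarge")  # fall back to a default CPU runner
print(runner)  # -> linux.g5.4xlarge.nvidia.gpu
```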
1 change: 1 addition & 0 deletions .jenkins/validate_tutorials_built.py
@@ -25,6 +25,7 @@
"intermediate_source/mnist_train_nas", # used by ax_multiobjective_nas_tutorial.py
"intermediate_source/fx_conv_bn_fuser",
"intermediate_source/_torch_export_nightly_tutorial", # does not work on release
"intermediate_source/transformer_building_blocks", # does not work on release
"advanced_source/super_resolution_with_onnxruntime",
"advanced_source/usb_semisup_learn", # fails with CUDA OOM error, should try on a different worker
"prototype_source/fx_graph_mode_ptq_dynamic",
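The entry added above keeps `transformer_building_blocks` on the list of tutorials that are built but not executed for the release branch. A rough sketch of how such a skip list might be applied during validation follows; the `NOT_RUN` name and helper are illustrative assumptions rather than the file's exact logic.

```python
# Sketch (assumption): applying a skip list like the one in
# validate_tutorials_built.py; names and logic here are illustrative only.
NOT_RUN = [
    "intermediate_source/transformer_building_blocks",  # does not work on release
]

def should_have_executed(tutorial_stem: str) -> bool:
    """Return True if the built tutorial is expected to contain executed output."""
    return tutorial_stem not in NOT_RUN

print(should_have_executed("intermediate_source/transformer_building_blocks"))  # False
print(should_have_executed("beginner_source/basics/intro"))  # True
```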
2 changes: 1 addition & 1 deletion beginner_source/ddp_series_intro.rst
@@ -7,7 +7,7 @@
Distributed Data Parallel in PyTorch - Video Tutorials
======================================================

Authors: `Suraj Subramanian <https://github.com/suraj813>`__
Authors: `Suraj Subramanian <https://github.com/subramen>`__

Follow along with the video below or on `youtube <https://www.youtube.com/watch/-K3bZYHYHEA>`__.

19 changes: 19 additions & 0 deletions en-wordlist.txt
@@ -1,5 +1,6 @@
ACL
ADI
ALiBi
AOT
AOTInductor
APIs
@@ -79,6 +80,7 @@ FX
FX's
FairSeq
Fastpath
FFN
FloydHub
FloydHub's
Frobenius
@@ -127,6 +129,7 @@ Kihyuk
Kiuk
Kubernetes
Kuei
KV
LRSchedulers
LSTM
LSTMs
@@ -162,6 +165,7 @@ NLP
NTK
NUMA
NaN
NaNs
NanoGPT
Netron
NeurIPS
@@ -231,6 +235,7 @@ Sigmoid
SoTA
Sohn
Spacy
SwiGLU
TCP
THP
TIAToolbox
@@ -276,6 +281,7 @@ Xcode
Xeon
Yidong
YouTube
Zipf
accelerometer
accuracies
activations
@@ -305,6 +311,7 @@ bbAP
benchmarked
benchmarking
bitwise
bool
boolean
breakpoint
broadcasted
@@ -333,6 +340,7 @@ csv
cuDNN
cuda
customizable
customizations
datafile
dataflow
dataframe
@@ -377,6 +385,7 @@ fbgemm
feedforward
finetune
finetuning
FlexAttention
fp
frontend
functionalized
@@ -431,6 +440,7 @@ mAP
macos
manualSeed
matmul
matmuls
matplotlib
memcpy
memset
@@ -446,6 +456,7 @@ modularized
mpp
mucosa
multihead
MultiheadAttention
multimodal
multimodality
multinode
@@ -456,7 +467,11 @@ multithreading
namespace
natively
ndarrays
nheads
nightlies
NJT
NJTs
NJT's
num
numericalize
numpy
@@ -532,6 +547,7 @@ runtime
runtime
runtimes
scalable
SDPA
sharded
softmax
sparsified
@@ -591,12 +607,14 @@ tradeoff
tradeoffs
triton
uint
UX
umap
uncomment
uncommented
underflowing
unfused
unimodal
unigram
unnormalized
unoptimized
unparametrized
@@ -618,6 +636,7 @@ warmstarted
warmstarting
warmup
webp
wikitext
wsi
wsis
Meta's
8 changes: 8 additions & 0 deletions index.rst
@@ -664,6 +664,14 @@ Welcome to PyTorch Tutorials
:link: beginner/knowledge_distillation_tutorial.html
:tags: Model-Optimization,Image/Video


.. customcarditem::
:header: Accelerating PyTorch Transformers by replacing nn.Transformer with Nested Tensors and torch.compile()
:card_description: This tutorial goes over recommended best practices for implementing Transformers with native PyTorch.
:image: _static/img/thumbnails/cropped/pytorch-logo.png
:link: intermediate/transformer_building_blocks.html
:tags: Transformer
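
The card above points to the new nested-tensor tutorial. As a loose illustration of the data structure it is built around, here is a small sketch of a jagged nested tensor holding variable-length sequences; this is an assumption-level example for recent PyTorch releases with NJT support, not an excerpt from the tutorial.

```python
import torch

# Illustrative sketch (not from the tutorial): a "jagged" nested tensor packs
# variable-length sequences without padding, which is what the tutorial's
# transformer layers operate on.
seqs = [torch.randn(3, 8), torch.randn(5, 8)]  # two sequences, lengths 3 and 5
njt = torch.nested.nested_tensor(seqs, layout=torch.jagged)

print(njt.is_nested)                    # True
print([t.shape for t in njt.unbind()])  # [torch.Size([3, 8]), torch.Size([5, 8])]
```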

.. Parallel-and-Distributed-Training
5 changes: 2 additions & 3 deletions intermediate_source/process_group_cpp_extension_tutorial.rst
@@ -25,9 +25,8 @@ Basics

PyTorch collective communications power several widely adopted distributed
training features, including
`DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__,
`ZeroRedundancyOptimizer <https://pytorch.org/docs/stable/distributed.optim.html#torch.distributed.optim.ZeroRedundancyOptimizer>`__,
`FullyShardedDataParallel <https://github.com/pytorch/pytorch/blob/master/torch/distributed/_fsdp/fully_sharded_data_parallel.py>`__.
`DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__ and
`ZeroRedundancyOptimizer <https://pytorch.org/docs/stable/distributed.optim.html#torch.distributed.optim.ZeroRedundancyOptimizer>`__.
In order to make the same collective communication API work with
different communication backends, the distributed package abstracts collective
communication operations into a
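Since this paragraph concerns the collective communication layer that DistributedDataParallel and ZeroRedundancyOptimizer build on, a minimal single-process sketch of that API is shown below; the `gloo` backend and world size of 1 are assumptions chosen so the snippet runs standalone, not a recommendation from the tutorial.

```python
import os
import torch
import torch.distributed as dist

# Sketch (assumption): a single-process "gloo" group, just to show the
# collective API; real jobs launch multiple ranks, e.g. with torchrun.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

t = torch.ones(4)
dist.all_reduce(t, op=dist.ReduceOp.SUM)  # sums the tensor across all ranks
print(t)  # with a single rank the values are unchanged: tensor([1., 1., 1., 1.])

dist.destroy_process_group()
```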