Summary:
Fix up the FSDP tutorial to get it functional again.
1. Add the missing import for `load_dataset` (see the sketch after this list).
2. Use `checkpoint` instead of `_shard.checkpoint` to silence a deprecation
warning.
3. Add `nlp` to `requirements.txt`.
4. Remove `load_metric`, since this function no longer exists in the current
`datasets` module.
5. Pass `legacy=False` to the tokenizer to silence its legacy-behavior warnings.
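For reference, a minimal sketch of the code-side changes from items 1, 2, 4, and 5 above. The module paths are real, but the variable names and the `t5-base` checkpoint are assumptions for illustration, not literal diffs from this PR:

```python
from transformers import T5Tokenizer

# (1) The missing import: load_dataset lives in the Hugging Face `datasets` package.
from datasets import load_dataset

# (2) Use the public checkpoint module; the old private path
#     `torch.distributed._shard.checkpoint` now emits a deprecation warning.
import torch.distributed.checkpoint as dist_cp

# (4) `load_metric` was removed from `datasets` (its successor lives in the
#     separate `evaluate` package), so the calls to it were dropped.

# (5) `legacy=False` opts into the fixed tokenizer behavior and silences the
#     legacy warning. "t5-base" is illustrative, not necessarily the
#     tutorial's checkpoint.
tokenizer = T5Tokenizer.from_pretrained("t5-base", legacy=False)
```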
Test Plan:
Ran the tutorial as follows and verified that it completed successfully:
```
torchrun --nnodes=1 --nproc_per_node=2 T5_training.py
W1031 09:46:49.166000 2847649 torch/distributed/run.py:793]
W1031 09:46:49.166000 2847649 torch/distributed/run.py:793]
*****************************************
W1031 09:46:49.166000 2847649 torch/distributed/run.py:793] Setting
OMP_NUM_THREADS environment variable for each process to be 1 in
default, to avoid your system being overloaded, please further tune the
variable for optimal performance in your application as needed.
W1031 09:46:49.166000 2847649 torch/distributed/run.py:793]
*****************************************
dict_keys(['train', 'validation', 'test'])
Size of train dataset: (157252, 3)
Size of Validation dataset: (5599, 3)
dict_keys(['train', 'validation', 'test'])
Size of train dataset: (157252, 3)
Size of Validation dataset: (5599, 3)
bFloat16 enabled for mixed precision - using bfSixteen policy
```
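The last output line refers to the tutorial's FSDP mixed-precision policy. As a rough sketch, a bfSixteen policy along the lines used in the PyTorch FSDP tutorials looks like this (the name `bfSixteen` is taken from the log line above; the exact definition in T5_training.py may differ):

```python
import torch
from torch.distributed.fsdp import MixedPrecision

# Run parameters, gradient reduction, and buffers all in bfloat16.
bfSixteen = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)
```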