Remove deprecated Mixed precision flags (#1471)
Change-Id: I1c2e2460dc2072ba7b311f239441b304694918c8
vivekgoe authored Nov 12, 2024
1 parent 7ac1db1 commit 6fcff50
Showing 5 changed files with 15 additions and 13 deletions.
4 changes: 2 additions & 2 deletions docs/source/package_reference/gaudi_config.mdx
@@ -20,8 +20,8 @@ Here is a description of each configuration parameter:
- `use_fused_adam` determines whether to use the [custom fused implementation of the ADAM optimizer provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#custom-optimizers).
- `use_fused_clip_norm` determines whether to use the [custom fused implementation of gradient norm clipping provided by Intel® Gaudi® AI Accelerator](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Custom_Ops_PyTorch.html#other-custom-ops).
- `use_torch_autocast` enables PyTorch autocast; used to define a good pre-defined configuration; users should favor the `--bf16` training argument
-- `autocast_bf16_ops` list of operations that should be run with bf16 precision under autocast context; using the environment flag LOWER_LIST is the preferred way to override the operator autocast list
-- `autocast_fp32_ops` list of operations that should be run with fp32 precision under autocast context; using the environment flag FP32_LIST is the preferred way to override the operator autocast list
+- `autocast_bf16_ops` list of operations that should be run with bf16 precision under autocast context; using the environment flag PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST is the preferred way to override the operator autocast list
+- `autocast_fp32_ops` list of operations that should be run with fp32 precision under autocast context; using the environment flag PT_HPU_AUTOCAST_FP32_OPS_LIST is the preferred way to override the operator autocast list


You can find examples of Gaudi configurations in the [Habana model repository on the Hugging Face Hub](https://huggingface.co/habana). For instance, [for BERT Large we have](https://huggingface.co/Habana/bert-large-uncased-whole-word-masking/blob/main/gaudi_config.json):
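Note: the BERT Large configuration JSON referenced above is truncated in this view. Separately, here is a hedged sketch of the override mechanism the renamed flags provide; the file names and op names below are illustrative assumptions, not values taken from this commit.

```python
# Hedged sketch: overriding the HPU autocast op lists via the renamed flags.
# File names and op names here are illustrative assumptions.
import os

# Each flag points at a text file listing op names (the examples' ops_bf16.txt
# is a newline-separated list).
with open("ops_bf16.txt", "w") as f:
    f.write("\n".join(["addmm", "bmm", "softmax"]))  # assumed op names
with open("ops_fp32.txt", "w") as f:
    f.write("\n".join(["log_softmax", "cross_entropy"]))  # assumed op names

# Set these before the HPU backend initializes autocast (assumption about timing).
os.environ["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = "ops_bf16.txt"
os.environ["PT_HPU_AUTOCAST_FP32_OPS_LIST"] = "ops_fp32.txt"
```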
10 changes: 5 additions & 5 deletions examples/language-modeling/README.md
@@ -404,7 +404,7 @@ python3 run_lora_clm.py \
```
- Single-card finetuning of Falcon-40B:
```bash
-LOWER_LIST=ops_bf16.txt python3 run_lora_clm.py \
+PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python3 run_lora_clm.py \
--model_name_or_path tiiuae/falcon-40b \
--dataset_name timdettmers/openassistant-guanaco \
--bf16 True \
@@ -474,7 +474,7 @@ python ../gaudi_spawn.py \

- Multi-card finetuning of Llama2-7B with FP8:
```bash
-LOWER_LIST=ops_bf16.txt python ../gaudi_spawn.py \
+PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_lora_clm.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--dataset_name tatsu-lab/alpaca \
@@ -569,7 +569,7 @@ python ../gaudi_spawn.py \

- Multi-card finetuning of Falcon-40B:
```bash
-LOWER_LIST=ops_bf16.txt python3 ../gaudi_spawn.py \
+PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python3 ../gaudi_spawn.py \
--world_size 8 --use_mpi run_lora_clm.py \
--model_name_or_path tiiuae/falcon-40b \
--dataset_name timdettmers/openassistant-guanaco \
@@ -647,7 +647,7 @@ python3 ../gaudi_spawn.py --use_deepspeed --world_size 8 run_lora_clm.py \
- Multi-card finetuning of Llama2-70B with FSDP and LoRA:

```bash
-LOWER_LIST=ops_bf16.txt PT_HPU_LAZY_MODE=0 \
+PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt PT_HPU_LAZY_MODE=0 \
python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_lora_clm.py \
--model_name_or_path meta-llama/Llama-2-70b-hf \
--dataset_name tatsu-lab/alpaca \
@@ -690,7 +690,7 @@ python3 ../gaudi_spawn.py --world_size 8 --use_mpi run_lora_clm.py \
- The Falcon-180B example command saves only the LoRA parameters at the end
- For inference, we need to merge the pretrained model and LoRA weights
```bash
-DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 LOWER_LIST=ops_bf16.txt python3 ../gaudi_spawn.py \
+PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt python3 ../gaudi_spawn.py \
--world_size 8 --use_deepspeed run_lora_clm.py \
--model_name_or_path tiiuae/falcon-180B \
--dataset_name timdettmers/openassistant-guanaco \
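For completeness, a hedged Python equivalent of the env-var prefix used in the commands above; the argument list is trimmed from the Falcon-40B example, and the working-directory layout is an assumption.

```python
# Hedged sketch: programmatic equivalent of prefixing the launch command with
# PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST=ops_bf16.txt, as in the README
# commands above. The argument list is truncated; see the README for the full set.
import os
import subprocess

env = os.environ.copy()  # inherit the current environment
env["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = "ops_bf16.txt"

subprocess.run(
    ["python3", "run_lora_clm.py",
     "--model_name_or_path", "tiiuae/falcon-40b",
     "--bf16", "True"],
    env=env,
    check=True,
)
```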
4 changes: 2 additions & 2 deletions optimum/habana/transformers/gaudi_configuration.py
@@ -93,5 +93,5 @@ def declare_autocast_bf16_fp32_ops(self):
    autocast_bf16_filename,
    autocast_fp32_filename,
)
-os.environ["LOWER_LIST"] = autocast_bf16_filename
-os.environ["FP32_LIST"] = autocast_fp32_filename
+os.environ["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = autocast_bf16_filename
+os.environ["PT_HPU_AUTOCAST_FP32_OPS_LIST"] = autocast_fp32_filename
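A hedged standalone sketch of what `declare_autocast_bf16_fp32_ops` accomplishes after this change: persist the configured op lists to files and export the renamed flags. The helper below is an illustration, not the library code.

```python
# Hedged sketch, not the library implementation: write the configured op lists
# to files and point the renamed HPU autocast flags at them.
import os
import tempfile

def declare_autocast_ops(bf16_ops, fp32_ops):
    """Persist op lists to temp files and export the new flag names."""
    paths = []
    for ops in (bf16_ops, fp32_ops):
        with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as f:
            f.write("\n".join(ops))
            paths.append(f.name)
    os.environ["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = paths[0]
    os.environ["PT_HPU_AUTOCAST_FP32_OPS_LIST"] = paths[1]

declare_autocast_ops(["addmm", "bmm"], ["log_softmax"])  # illustrative op names
```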
2 changes: 1 addition & 1 deletion optimum/habana/transformers/trainer.py
@@ -252,7 +252,7 @@ def __init__(
"The argument `--bf16` was not given but `use_torch_autocast` is True in the Gaudi configuration so mixed-precision training with Torch Autocast is enabled."
)

-if self.use_hpu_amp and "LOWER_LIST" not in os.environ:
+if self.use_hpu_amp and "PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST" not in os.environ:
self.gaudi_config.declare_autocast_bf16_fp32_ops()

if self.args.use_lazy_mode:
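The guard above gives a user-supplied flag precedence over the Gaudi configuration. A hedged sketch of that precedence, with `use_hpu_amp` and `gaudi_config` standing in for the trainer's attributes:

```python
# Hedged sketch of the trainer's precedence rule after this commit: an
# explicitly exported PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST wins; only when
# it is absent does the GaudiConfig export its own op lists.
import os

def maybe_declare_autocast_ops(use_hpu_amp, gaudi_config):
    # A user-set environment flag takes precedence over the config-declared lists.
    if use_hpu_amp and "PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST" not in os.environ:
        gaudi_config.declare_autocast_bf16_fp32_ops()
```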
8 changes: 5 additions & 3 deletions tests/test_examples.py
@@ -449,7 +449,7 @@ def test(self):

env_variables = os.environ.copy()
if "falcon" in model_name:
env_variables["LOWER_LIST"] = str(example_script.parent / "ops_bf16.txt")
env_variables["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = str(example_script.parent / "ops_bf16.txt")
elif "flan" in model_name:
env_variables["PT_HPU_MAX_COMPOUND_OP_SIZE"] = "512"
elif "bloom" in model_name:
@@ -460,13 +460,15 @@ def test(self):
env_variables["DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED"] = "1"
elif fsdp:
if "llama" in model_name:
env_variables["LOWER_LIST"] = str(example_script.parent / "ops_bf16.txt")
env_variables["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = str(
example_script.parent / "ops_bf16.txt"
)
env_variables["PT_HPU_LAZY_MODE"] = "0"
elif deepspeed and "gpt-neox-20b" in model_name:
env_variables["LD_PRELOAD"] = ""

if fp8 and "llama" in model_name:
env_variables["LOWER_LIST"] = str(example_script.parent / "ops_bf16.txt")
env_variables["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = str(example_script.parent / "ops_bf16.txt")

extra_command_line_arguments = baseline.get("distribution").get(distribution).get("extra_arguments", [])

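The test harness copies the parent environment and layers model-specific flags on top before spawning the example. A hedged sketch of that pattern; the script path is an assumption:

```python
# Hedged sketch of the test-harness pattern above: copy the parent environment,
# add the model-specific flag, and pass the result to the spawned process.
import os
import subprocess
from pathlib import Path

example_script = Path("examples/language-modeling/run_lora_clm.py")  # assumed path

env_variables = os.environ.copy()
env_variables["PT_HPU_AUTOCAST_LOWER_PRECISION_OPS_LIST"] = str(
    example_script.parent / "ops_bf16.txt"
)

subprocess.run(["python3", str(example_script)], env=env_variables, check=True)
```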
