**`README.md`** (+24 lines)
@@ -51,6 +51,30 @@ Then the second command initiates the fine-tuning process using the settings specified in the configuration file.
The configuration file is the central piece that defines the behavior of the toolkit. It is written in YAML format and consists of several sections that control different aspects of the process, such as data ingestion, model definition, training, inference, and quality assurance. We highlight some of the critical sections.
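A rough sketch of what such a config can look like. The `model` keys below mirror the `ModelConfig` fields touched in this PR; the top-level layout and everything else is an illustrative assumption, not the toolkit's verbatim schema:

```yaml
# Illustrative outline only; consult the toolkit's shipped config for the real schema.
model:
  device_map: auto                        # device onto which to load the model
  torch_dtype: bfloat16                   # dtype for model weights ("auto" by default)
  attn_implementation: flash_attention_2  # requires float16 or bfloat16 weights
  quantize: false                         # flag to enable quantization
```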
#### Flash Attention 2
To enable Flash Attention 2 for [supported models](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2), first install `flash-attn`:
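```bash
# Install command documented by the flash-attn project; requires CUDA and a
# matching PyTorch build.
pip install flash-attn --no-build-isolation
```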
**`llmtune/pydantic_models/config_model.py`** (+71, -11 lines)
```diff
@@ -77,7 +77,13 @@ class ModelConfig(BaseModel):
         description="Path to the model (huggingface repo or local path)",
     )
     device_map: Optional[str] = Field("auto", description="device onto which to load the model")
+    torch_dtype: Optional[str] = Field("auto", description="torch dtype to use for model weights")
+    attn_implementation: Optional[str] = Field(
+        None,
+        description="set desired attention implementation; leave None for default. E.g. `flash_attention_2` (please ensure `torch_dtype` is either float16 or bfloat16).",
+    )
 
+    # Quantization Config
     quantize: Optional[bool] = Field(False, description="Flag to enable quantization")
     bitsandbytes: BitsAndBytesConfig = Field(None, description="Bits and Bytes configuration")
```
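For context, a minimal sketch of how fields like these are typically forwarded to `transformers` when loading a model. The loading code itself is not part of this hunk, and the repo name is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM

# Values as they might arrive from a parsed ModelConfig (illustrative).
torch_dtype = "bfloat16"  # must be float16 or bfloat16 for flash_attention_2

model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-hf",  # placeholder; any model supporting FA2 works
    device_map="auto",
    torch_dtype=getattr(torch, torch_dtype) if torch_dtype != "auto" else "auto",
    attn_implementation="flash_attention_2",
)
```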
description="Number of updates steps before checkpoint saves. Should be an integer or a float in range [0,1). If smaller than 1, will be interpreted as ratio of total training steps.",