chat_completion_pipeline
Pipeline component to finetune Hugging Face pretrained models for the chat completion task. The component supports optimizations such as LoRA, DeepSpeed and ONNX Runtime for performance enhancement. See docs to learn more.
Version: 0.0.65
View in Studio: https://ml.azure.com/registries/azureml/components/chat_completion_pipeline/version/0.0.65
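A minimal sketch of loading this component from the azureml registry with the azure-ai-ml Python SDK (assumes the SDK is installed and DefaultAzureCredential can authenticate; it is an illustration, not part of the component spec):

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Client scoped to the public "azureml" registry that hosts this component.
registry_client = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")

# Fetch the pipeline component at the version documented on this page.
chat_completion_pipeline = registry_client.components.get(
    name="chat_completion_pipeline", version="0.0.65"
)
print(chat_completion_pipeline.display_name, chat_completion_pipeline.version)
```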
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
instance_type_model_import | Instance type to use for the model_import component with serverless compute, e.g. Standard_D12_v2. The parameter compute_model_import must be set to 'serverless' for this instance type to be used. | string | Standard_d12_v2 | True | |
instance_type_preprocess | Instance type to use for the preprocess component with serverless compute, e.g. Standard_D12_v2. The parameter compute_preprocess must be set to 'serverless' for this instance type to be used. | string | Standard_d12_v2 | True | |
instance_type_finetune | Instance type to use for the finetune component with serverless compute, e.g. Standard_NC24rs_v3. The parameter compute_finetune must be set to 'serverless' for this instance type to be used. | string | Standard_nc24rs_v3 | True | |
instance_type_model_evaluation | Instance type to use for the model_evaluation component with serverless compute, e.g. Standard_NC24rs_v3. The parameter compute_model_evaluation must be set to 'serverless' for this instance type to be used. | string | Standard_nc24rs_v3 | True | |
shm_size_finetune | Shared memory size to use for the finetune component. It is useful when using Nebula (via DeepSpeed), which uses shared memory to save model and optimizer states. | string | 5g | True | |
num_nodes_finetune | Number of nodes to use for finetuning (used for distributed training). | integer | 1 | True | |
number_of_gpu_to_use_finetuning | Number of GPUs to use per node for finetuning; should equal the number of GPUs per node in the compute SKU used for finetuning. | integer | 1 | True | |
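The instance type and distributed-training parameters above are passed as plain keyword arguments when the component is invoked inside a pipeline. A hedged sketch, assuming the component is loaded as in the snippet above; the model id, SKU and node counts are examples only, and data inputs are omitted for brevity (see the sketches in the later sections):

```python
from azure.ai.ml import MLClient
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

registry_client = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")
chat_completion_pipeline = registry_client.components.get(
    name="chat_completion_pipeline", version="0.0.65"
)

@pipeline()
def finetune_chat_model():
    step = chat_completion_pipeline(
        task_name="ChatCompletion",
        huggingface_id="microsoft/phi-2",              # example model id, not a recommendation
        instance_type_finetune="Standard_NC24rs_v3",   # used only when compute_finetune="serverless"
        num_nodes_finetune=2,                          # distribute finetuning across 2 nodes
        number_of_gpu_to_use_finetuning=4,             # must match GPUs per node in the chosen SKU
    )
    # Expose the component output under its original name so it is easy to register later.
    return {"mlflow_model_folder": step.outputs.mlflow_model_folder}
```

A workspace-scoped MLClient can then submit `finetune_chat_model()` with `ml_client.jobs.create_or_update(...)`.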
Model Import parameters (See docs to learn more)
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
huggingface_id | The string can be any valid Hugging Face id from the Hugging Face models webpage. Models from Hugging Face are subject to third party license terms available on the Hugging Face model details page. It is your responsibility to comply with the model's license terms. | string | | True | |
pytorch_model_path | Pytorch model asset path. Special characters like \ and ' are invalid in the parameter value. | custom_model | | True | |
mlflow_model_path | MLflow model asset path. Special characters like \ and ' are invalid in the parameter value. | mlflow_model | | True | |
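Exactly one model source is typically supplied: a Hugging Face id, a PyTorch model folder, or an MLflow model asset. A hedged sketch of both options; the asset name and version are placeholders:

```python
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Option A: reference the base model purely by its Hugging Face id (string parameter).
model_kwargs = {"huggingface_id": "microsoft/phi-2"}  # example id, not a recommendation

# Option B: pass a registered MLflow model asset instead (placeholder name/version).
model_kwargs = {
    "mlflow_model_path": Input(
        type=AssetTypes.MLFLOW_MODEL,
        path="azureml:my-chat-base-model:1",
    )
}
```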
Data PreProcess parameters (See docs to learn more)
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
task_name | ChatCompletion task type | string | ChatCompletion | False | ['ChatCompletion'] |
batch_size | Number of examples to batch before calling the tokenization function | integer | 1000 | True | |
pad_to_max_length | If set to True, the returned sequences will be padded according to the model's padding side and padding index, up to their max_seq_length. If no max_seq_length is specified, the padding is done up to the model's max length. | string | false | True | ['true', 'false'] |
max_seq_length | Controls the maximum length to use when the pad_to_max_length parameter is set to true. Default is -1, which means the padding is done up to the model's max length. Otherwise sequences are padded to max_seq_length. | integer | -1 | True | |
train_file_path | Path to the registered training data asset. The supported data formats are jsonl, json, csv, tsv and parquet. Special characters like \ and ' are invalid in the parameter value. | uri_file | | True | |
validation_file_path | Path to the registered validation data asset. The supported data formats are jsonl, json, csv, tsv and parquet. Special characters like \ and ' are invalid in the parameter value. | uri_file | | True | |
test_file_path | Path to the registered test data asset. The supported data formats are jsonl, json, csv, tsv and parquet. Special characters like \ and ' are invalid in the parameter value. | uri_file | | True | |
train_mltable_path | Path to the registered training data asset in mltable format. Special characters like \ and ' are invalid in the parameter value. | mltable | | True | |
validation_mltable_path | Path to the registered validation data asset in mltable format. Special characters like \ and ' are invalid in the parameter value. | mltable | | True | |
test_mltable_path | Path to the registered test data asset in mltable format. Special characters like \ and ' are invalid in the parameter value. | mltable | | True | |
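Training, validation and test data can be passed either as single files or as MLTable assets. A minimal sketch, assuming local JSONL files that the SDK uploads at submission time (file paths and preprocessing values are illustrative):

```python
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Local JSONL files are uploaded automatically when the pipeline job is submitted.
data_kwargs = {
    "train_file_path": Input(type=AssetTypes.URI_FILE, path="./data/train.jsonl"),
    "validation_file_path": Input(type=AssetTypes.URI_FILE, path="./data/validation.jsonl"),
    "test_file_path": Input(type=AssetTypes.URI_FILE, path="./data/test.jsonl"),
    # Preprocessing parameters from the table above.
    "batch_size": 1000,
    "pad_to_max_length": "false",
    "max_seq_length": -1,
}
```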
Finetune parameters (See docs to learn more)
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
apply_lora | If "true", enables LoRA. | string | false | True | ['true', 'false'] |
merge_lora_weights | If "true", the LoRA weights are merged with the base Hugging Face model weights before saving. | string | true | True | ['true', 'false'] |
lora_alpha | Alpha attention parameter for LoRA. | integer | 128 | True | |
lora_r | LoRA dimension. | integer | 8 | True | |
lora_dropout | LoRA dropout value. | number | 0.0 | True | |
num_train_epochs | Number of epochs to run for finetune. | integer | 1 | True | |
max_steps | If set to a positive number, the total number of training steps to perform. Overrides num_train_epochs. When using a finite iterable dataset, training may stop before reaching the set number of steps once all data is exhausted. | integer | -1 | True | |
per_device_train_batch_size | Per gpu batch size used for training. The effective training batch size is per_device_train_batch_size * num_gpus * num_nodes. | integer | 1 | True | |
per_device_eval_batch_size | Per gpu batch size used for validation. The default value is 1. The effective validation batch size is per_device_eval_batch_size * num_gpus * num_nodes. | integer | 1 | True | |
auto_find_batch_size | If set to "true" and the provided per_device_train_batch_size results in Out Of Memory (OOM), auto_find_batch_size finds a workable batch size by iteratively halving it until the OOM error is resolved. | string | false | True | ['true', 'false'] |
optim | Optimizer to be used while training | string | adamw_hf | True | ['adamw_hf', 'adamw_torch', 'adafactor'] |
learning_rate | Start learning rate used for training. | number | 2e-05 | True | |
warmup_steps | Number of steps for the learning rate scheduler warmup phase. | integer | 0 | True | |
weight_decay | Weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer | number | 0.0 | True | |
adam_beta1 | beta1 hyperparameter for the AdamW optimizer | number | 0.9 | True | |
adam_beta2 | beta2 hyperparameter for the AdamW optimizer | number | 0.999 | True | |
adam_epsilon | epsilon hyperparameter for the AdamW optimizer | number | 1e-08 | True | |
gradient_accumulation_steps | Number of updates steps to accumulate the gradients for, before performing a backward/update pass | integer | 1 | True | |
eval_accumulation_steps | Number of prediction steps to accumulate before moving the tensors to the CPU; passed as None if set to -1. | integer | -1 | True | |
lr_scheduler_type | learning rate scheduler to use. | string | linear | True | ['linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', 'constant_with_warmup'] |
precision | Apply mixed precision training. This can reduce memory footprint by performing operations in half-precision. | string | 32 | True | ['32', '16'] |
seed | Random seed that will be set at the beginning of training | integer | 42 | True | |
enable_full_determinism | Ensure reproducible behavior during distributed training. Check this link https://pytorch.org/docs/stable/notes/randomness.html for more details. | string | false | True | ['true', 'false'] |
dataloader_num_workers | Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process. | integer | 0 | True | |
ignore_mismatched_sizes | If set to "false", an error is raised when some of the weights from the checkpoint do not have the same size as the weights of the model. | string | false | True | ['true', 'false'] |
max_grad_norm | Maximum gradient norm (for gradient clipping) | number | 1.0 | True | |
evaluation_strategy | The evaluation strategy to adopt during training. If set to "steps", either evaluation_steps_interval or eval_steps must be specified to determine the step at which model evaluation is computed; otherwise evaluation happens at the end of each epoch. | string | epoch | True | ['epoch', 'steps'] |
evaluation_steps_interval | Evaluation interval expressed as a fraction of an epoch's steps. Overrides eval_steps if not 0. | number | 0.0 | True | |
eval_steps | Number of update steps between two evals if evaluation_strategy='steps' | integer | 500 | True | |
logging_strategy | The logging strategy to adopt during training. If set to "steps", logging_steps decides the frequency of logging; otherwise logging happens at the end of each epoch. | string | steps | True | ['epoch', 'steps'] |
logging_steps | Number of update steps between two logs if logging_strategy='steps' | integer | 10 | True | |
metric_for_best_model | Metric to use to compare two different model checkpoints. | string | loss | True | ['loss', 'f1', 'exact'] |
resume_from_checkpoint | If set to "true", resumes training from the last saved checkpoint. Along with the saved weights, the saved optimizer, scheduler and random states are loaded if they exist. The default value is "false". | string | false | True | ['true', 'false'] |
save_total_limit | If a positive value is passed, it limits the total number of checkpoints saved. The value -1 saves all checkpoints; otherwise, if the number of checkpoints exceeds save_total_limit, the older checkpoints get deleted. | integer | -1 | True | |
apply_early_stopping | If set to "true", early stopping is enabled. | string | false | True | ['true', 'false'] |
early_stopping_patience | Stop training when the metric specified through metric_for_best_model worsens for early_stopping_patience evaluation calls. This value is only valid if apply_early_stopping is set to true. | integer | 1 | True | |
early_stopping_threshold | Denotes how much the specified metric must improve to satisfy early stopping conditions. This value is only valid if apply_early_stopping is set to true. | number | 0.0 | True | |
apply_deepspeed | If set to "true", enables DeepSpeed for training. | string | false | True | ['true', 'false'] |
deepspeed | Deepspeed config to be used for finetuning. Special characters like \ and ' are invalid in the parameter value. | uri_file | True | ||
deepspeed_stage | Configures which default DeepSpeed config is used: stage 2 or stage 3. The default choice is stage 2. Note that this parameter is only applicable when the user doesn't pass any config via the deepspeed input. | string | 2 | True | ['2', '3'] |
apply_ort | If set to "true", uses ONNX Runtime training. | string | false | True | ['true', 'false'] |
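Note that the boolean-like switches in this table are the strings 'true'/'false', not Python booleans. A hedged sketch of a LoRA + DeepSpeed configuration; the values are examples, not tuned recommendations:

```python
# Finetuning parameters drawn from the table above; values are illustrative only.
finetune_kwargs = {
    "apply_lora": "true",
    "merge_lora_weights": "true",
    "lora_r": 8,
    "lora_alpha": 128,
    "lora_dropout": 0.0,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 1,
    "learning_rate": 2e-5,
    "precision": "16",            # mixed precision training
    "apply_deepspeed": "true",
    "deepspeed_stage": "2",       # used only because no custom config is passed via the deepspeed input
    "apply_ort": "false",
}
```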
Model Evaluation parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
evaluation_config | Additional parameters for computing metrics. Special characters like \ and ' are invalid in the parameter value. | uri_file | | True | |
evaluation_config_params | Additional parameters as a JSON serialized string. | string | | True | |
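evaluation_config_params is a JSON-serialized string; the accepted keys depend on the evaluation component's schema, so the key below is a purely hypothetical placeholder:

```python
import json

# Placeholder settings; consult the evaluation component docs for supported keys.
evaluation_kwargs = {
    "evaluation_config_params": json.dumps({"some_metric_setting": "value"})
}
```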
Compute parameters
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
compute_model_import | Compute to be used for model_import, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field will be ignored and the respective cluster will be used. | string | serverless | True | |
compute_preprocess | Compute to be used for preprocess, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field will be ignored and the respective cluster will be used. | string | serverless | True | |
compute_finetune | Compute to be used for finetune, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field will be ignored and the respective cluster will be used. | string | serverless | True | |
compute_model_evaluation | Compute to be used for model_evaluation, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field will be ignored and the respective cluster will be used. | string | serverless | True | |
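Each stage can run on serverless compute (the default, paired with the corresponding instance_type_* parameter) or on a named compute cluster. A small sketch mixing the two; 'FT-Cluster' mirrors the example name used in the table and is a placeholder:

```python
# Run the heavy finetuning step on an existing GPU cluster and keep the rest serverless.
compute_kwargs = {
    "compute_model_import": "serverless",
    "compute_preprocess": "serverless",
    "compute_finetune": "FT-Cluster",          # named cluster: instance_type_finetune is ignored
    "compute_model_evaluation": "serverless",
}
```

These dictionaries can be merged and unpacked into the component call shown earlier, e.g. `chat_completion_pipeline(**model_kwargs, **data_kwargs, **finetune_kwargs, **compute_kwargs)`.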
Outputs
Name | Description | Type |
---|---|---|
pytorch_model_folder | Output folder containing the best model as defined by metric_for_best_model. Along with the best model, the output folder contains checkpoints saved after every evaluation, as defined by the evaluation_strategy. Each checkpoint contains the model weight(s), config, tokenizer, optimizer, scheduler and random number states. | uri_folder |
mlflow_model_folder | Output folder containing the best finetuned model in MLflow format. | mlflow_model |
evaluation_result | Test Data Evaluation Results | uri_folder |
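After the pipeline job completes, the outputs above can be downloaded or, for the MLflow model, registered directly from the job. A hedged sketch, assuming a workspace-scoped MLClient, a finished job whose pipeline exposes these outputs under their original names, and placeholder identifiers:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

job_name = "<completed-pipeline-job-name>"

# Register the finetuned MLflow model straight from the pipeline job's named output.
model = Model(
    path=f"azureml://jobs/{job_name}/outputs/mlflow_model_folder",
    type=AssetTypes.MLFLOW_MODEL,
    name="chat-completion-finetuned",
)
ml_client.models.create_or_update(model)

# Or download the test data evaluation results locally.
ml_client.jobs.download(name=job_name, output_name="evaluation_result", download_path="./eval")
```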