OSS Distillation Sequence Scoring Pipeline

oss_distillation_seq_scoring_pipeline

Overview

Component that generates data from a teacher model endpoint (sequentially) and fine-tunes a student model on the generated dataset.

Version: 0.0.1

View in Studio: https://ml.azure.com/registries/azureml/components/oss_distillation_seq_scoring_pipeline/version/0.0.1

Inputs

Compute parameters

Name	Description	Type	Default	Optional
instance_type_pipeline_validation	Instance type to be used for validation component. The parameter compute_pipeline_validation must be set to 'serverless' for instance_type to be used.	string		True
instance_type_data_generation	Instance type to be used for the data generation component in case of virtual cluster compute, e.g. Singularity.ND40_v2. The parameter compute_data_generation must be set to 'serverless' for instance_type to be used.	string	Standard_D4as_v4	True
instance_type_data_import	Instance type to be used for the data_import component in case of virtual cluster compute, e.g. Singularity.D8_v3. The parameter compute_data_import must be set to 'serverless' for instance_type to be used.	string	Singularity.ND96amrs_A100_v4	True
instance_type_finetune	Instance type to be used for finetune component in case of virtual cluster compute, eg. Singularity.ND40_v2. The parameter compute_finetune must be set to 'serverless' for instance_type to be used	string	Singularity.ND96amrs_A100_v4	True
compute_pipeline_validation	Compute to be used for the validation component.	string	serverless	True
compute_data_generation	Compute to be used for data generation, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and the respective cluster is used.	string	serverless	True
compute_data_import	Compute to be used for data_import, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and the respective cluster is used.	string	serverless	True
compute_finetune	Compute to be used for finetune, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like \ and ' are invalid in the parameter value. If a compute cluster name is provided, the instance_type field is ignored and the respective cluster is used.	string	serverless	True
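
The interaction between each compute_* parameter and its matching instance_type_* parameter can be sketched as follows. This is only an illustration of the rule stated in the table, not the component's actual code: `resolve_compute` is a hypothetical helper and is not part of the Azure ML SDK.

```python
from typing import Optional


def resolve_compute(compute: str = "serverless",
                    instance_type: Optional[str] = None) -> dict:
    """Hypothetical helper mirroring the rule described in the table above."""
    # The table states that \ and ' are invalid in the compute parameter value.
    if any(ch in compute for ch in ("\\", "'")):
        raise ValueError("compute name must not contain \\ or '")

    # instance_type is only honoured when compute is 'serverless'.
    if compute == "serverless":
        return {"compute": "serverless", "instance_type": instance_type}

    # A named cluster takes precedence; instance_type is ignored.
    return {"compute": compute, "instance_type": None}


print(resolve_compute("serverless", "Standard_D4as_v4"))
print(resolve_compute("FT-Cluster", "Standard_D4as_v4"))
```

In this sketch, the second call drops the instance type because a cluster name was supplied.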

Data Generator Component

Name	Description	Type	Default	Optional	Enum
train_file_path	Path to the registered training data asset. The supported data formats are `jsonl`, `json`, `csv`, `tsv` and `parquet`.	uri_file
validation_file_path	Path to the registered validation data asset. The supported data formats are `jsonl`, `json`, `csv`, `tsv` and `parquet`.	uri_file		True
validation_info	Validation status.	uri_file		True
teacher_model_endpoint_name	Teacher model endpoint name	string		True
teacher_model_endpoint_url	Teacher model endpoint URL	string		True
teacher_model_endpoint_key	Teacher model endpoint key	string		True
teacher_model_max_new_tokens	Teacher model max_new_tokens inference parameter	integer	128
teacher_model_temperature	Teacher model temperature inference parameter	number	0.2
teacher_model_top_p	Teacher model top_p inference parameter	number	0.1
teacher_model_frequency_penalty	Teacher model frequency penalty inference parameter	number	0.0
teacher_model_presence_penalty	Teacher model presence penalty inference parameter	number	0.0
teacher_model_stop	Teacher model stop inference parameter	string		True
request_batch_size	Number of data records sent to the teacher model endpoint in one batch	integer	10
min_endpoint_success_ratio	The minimum value of (successful_requests / total_requests) required for classifying inference as successful. If (successful_requests / total_requests) < min_endpoint_success_ratio, the experiment will be marked as failed. By default it is 0.7 (0 means all requests are allowed to fail while 1 means no request should fail.)	number	0.7
enable_chain_of_thought	Enable Chain of thought for data generation	string	false	True
enable_chain_of_density	Enable Chain of density for text summarization	string	false	True
max_len_summary	Maximum summary length for text summarization	integer	80	True
data_generation_task_type	Data generation task type. Supported values are: 1. NLI: Generate Natural Language Inference data 2. CONVERSATION: Generate conversational data (multi/single turn) 3. NLU_QA: Generate Natural Language Understanding data for Question Answering 4. MATH: Generate Math data for numerical responses 5. SUMMARIZATION: Generate Key Summary for an Article	string			['NLI', 'CONVERSATION', 'NLU_QA', 'MATH', 'SUMMARIZATION']
model_asset_id	The student model asset id	string		False
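
As a rough illustration of how request_batch_size and min_endpoint_success_ratio relate, the sketch below splits records into batches and applies the success-ratio check described in the table. The function names `batches` and `inference_succeeded` are hypothetical, and how the component batches and counts requests internally is an assumption; only the comparison against min_endpoint_success_ratio comes from the parameter description.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(records: Sequence[T], request_batch_size: int = 10) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most request_batch_size records (assumed behaviour)."""
    if request_batch_size < 1:
        raise ValueError("request_batch_size must be at least 1")
    for start in range(0, len(records), request_batch_size):
        yield list(records[start:start + request_batch_size])


def inference_succeeded(successful_requests: int,
                        total_requests: int,
                        min_endpoint_success_ratio: float = 0.7) -> bool:
    """Return False when (successful_requests / total_requests) falls below the minimum ratio."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 <= min_endpoint_success_ratio <= 1.0:
        raise ValueError("min_endpoint_success_ratio must be between 0 and 1")
    return successful_requests / total_requests >= min_endpoint_success_ratio


records = list(range(25))
print([len(b) for b in batches(records)])   # [10, 10, 5]
print(inference_succeeded(7, 10))           # True: 0.7 is not below 0.7
print(inference_succeeded(6, 10))           # False: 0.6 < 0.7
```

With min_endpoint_success_ratio set to 0 every request may fail, and with 1 a single failed request marks the run as failed.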

Training parameters

Name	Description	Type	Default	Optional
num_train_epochs	training epochs	integer	1	True
per_device_train_batch_size	Train batch size	integer	1	True
learning_rate	Start learning rate.	number	0.0003	True

Output of validation component.

Name	Description	Type	Default	Optional	Enum
validation_output	Validation status.	uri_file		True

Outputs

Name	Description	Type
generated_train_file_path	Generated train data	uri_file
generated_validation_file_path	Generated validation data	uri_file
