components nlp_multilabel_datapreprocessing
github-actions[bot] edited this page Sep 16, 2025 · 13 revisions
Component to preprocess data for the AutoML NLP multilabel classification task.
Version: 0.0.79
View in Studio: https://ml.azure.com/registries/azureml/components/nlp_multilabel_datapreprocessing/version/0.0.79
Sequence Classification task arguments
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| label_column_name | label column name | string | | False | |
| batch_size | Number of examples to batch before calling the tokenization function | integer | 32 | True | |
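To make the `batch_size` description concrete, the following is a minimal sketch of grouping examples into batches before each tokenization call. It is not the component's implementation: `batched`, `toy_tokenize`, and `tokenize_in_batches` are hypothetical names, and the whitespace tokenizer is only a stand-in for whatever tokenizer the selected model provides.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def batched(examples: Sequence[str], batch_size: int = 32) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` examples."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


def toy_tokenize(batch: Sequence[str]) -> List[List[str]]:
    """Hypothetical stand-in for a real tokenizer: split on whitespace."""
    return [text.split() for text in batch]


def tokenize_in_batches(
    examples: Sequence[str],
    tokenize: Callable[[Sequence[str]], List[List[str]]] = toy_tokenize,
    batch_size: int = 32,
) -> Tuple[List[List[str]], int]:
    """Tokenize all examples, calling `tokenize` once per batch.

    Returns the tokenized examples and the number of tokenizer calls made.
    """
    tokenized: List[List[str]] = []
    calls = 0
    for batch in batched(examples, batch_size):
        tokenized.extend(tokenize(batch))
        calls += 1
    return tokenized, calls


# 70 examples with the default batch size of 32 produce batches of 32, 32, and 6.
texts = [f"example number {i}" for i in range(70)]
tokens, calls = tokenize_in_batches(texts, batch_size=32)
```

In this sketch a larger `batch_size` means fewer tokenizer calls at the cost of holding more examples in memory per call.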
Inputs
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| train_file_path | Enter the train file path | uri_file | | False | |
| valid_file_path | Enter the validation file path | uri_file | | False | |
Dataset parameters
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| model_selector_output | output folder of model selector containing model metadata like config, checkpoints, tokenizer config | uri_folder | | False | |
AutoML NLP parameters
| Name | Description | Type | Default | Optional | Enum |
|---|---|---|---|---|---|
| enable_long_range_text | label key name | boolean | True | True | |
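The input tables above can be read as a small contract: parameters with Optional set to False must be supplied, and optional ones fall back to their defaults. The sketch below shows one way to check a set of inputs against that contract before submitting a job. `PARAM_SPEC` and `resolve_inputs` are hypothetical helpers, not part of the Azure ML SDK, and the file paths and the `labels` column name are placeholder values.

```python
from typing import Any, Dict, Tuple

# Transcribed from the input tables: name -> (required, default).
# "Optional: False" in the tables is treated as required here.
PARAM_SPEC: Dict[str, Tuple[bool, Any]] = {
    "label_column_name": (True, None),
    "batch_size": (False, 32),
    "train_file_path": (True, None),
    "valid_file_path": (True, None),
    "model_selector_output": (True, None),
    "enable_long_range_text": (False, True),
}


def resolve_inputs(user_inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Reject unknown or missing inputs, then fill in documented defaults."""
    unknown = sorted(set(user_inputs) - set(PARAM_SPEC))
    if unknown:
        raise KeyError(f"unknown inputs: {unknown}")
    missing = sorted(
        name
        for name, (required, _) in PARAM_SPEC.items()
        if required and name not in user_inputs
    )
    if missing:
        raise ValueError(f"missing required inputs: {missing}")
    # Start from the defaults of optional parameters, then apply user values.
    resolved = {
        name: default
        for name, (required, default) in PARAM_SPEC.items()
        if not required
    }
    resolved.update(user_inputs)
    return resolved


# Placeholder values for illustration only.
resolved = resolve_inputs({
    "label_column_name": "labels",
    "train_file_path": "train.jsonl",
    "valid_file_path": "valid.jsonl",
    "model_selector_output": "model_selector_output/",
})
```

Here `resolved` carries the four supplied values plus `batch_size` and `enable_long_range_text` at their documented defaults.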
Outputs
| Name | Description | Type |
|---|---|---|
| output_dir | folder to store preprocessed outputs of input data | uri_folder |
Environment
azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/105