components batch_benchmark_inference_with_inference_compute
github-actions[bot] edited this page Oct 23, 2024
Components for batch endpoint inference with inference compute support.
Version: 0.0.7
View in Studio: https://ml.azure.com/registries/azureml/components/batch_benchmark_inference_with_inference_compute/version/0.0.7
Inputs

Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
input_dataset | Input jsonl dataset that contains the prompts. Ignored when is_performance_test is true. | uri_folder | | True | |
model_type | Type of the model's input and output contract. Can be one of ('oai', 'oss', 'vision_oss'). | string | | False | ['oai', 'oss', 'vision_oss'] |
inference_compute | Compute to be used for inferencing. | string | | False | |
batch_input_pattern | The string for the batch input pattern. The input should be the payload format, with each value to substitute written as `###<key>`. For example, for a llama text-generation model whose input dataset has a `prompt` column for the payload and a `_batch_request_metadata` column storing the corresponding ground truth, the following pattern can be used: `{ "input_data": { "input_string": ["###<prompt>"], "parameters": { "temperature": 0.6, "max_new_tokens": 100, "do_sample": true } }, "_batch_request_metadata": ###<_batch_request_metadata> }`. For an AOAI chat-completion model, the following pattern can be used: `{ "messages": ###<messages>, "temperature": 0.7, "top_p": 0.95, "frequency_penalty": 0, "presence_penalty": 0, "max_tokens": 800, "stop": null }` | string | | False | |
endpoint_url | The URL of the endpoint. | string | | False | |
is_performance_test | If true, the performance test will be run and the input dataset will be ignored. | boolean | | False | |
use_tiktoken | If true, the cl100k_base encoder from tiktoken is used to calculate the token count, overriding any other token-count calculation. | boolean | False | True | |
authentication_type | Authentication type for the endpoint: azureml_workspace_connection or managed_identity. | string | azureml_workspace_connection | False | ['azureml_workspace_connection', 'managed_identity'] |
deployment_name | The deployment name. Only needed for managed OSS deployments. | string | | True | |
connections_name | Connection name for the endpoint. Only required if authentication_type is "azureml_workspace_connection". | string | | True | |
label_column_name | The label column name. | string | | True | |
additional_columns | Name(s) of additional columns that may be helpful for calculating some metrics, separated by commas (","). | string | | True | |
n_samples | The number of top samples sent to the endpoint. When the performance test is enabled, this is the number of repeated samples sent to the endpoint. | integer | | True | |
handle_response_failure | How the formatter handles failed responses: 'use_fallback' replaces them with fallback_value, and 'neglect' drops those rows. | string | use_fallback | False | ['use_fallback', 'neglect'] |
fallback_value | The fallback value used when a request payload fails. If not provided, the fallback value is an empty string. | string | | True | |
min_endpoint_success_ratio | The minimum value of (successful_requests / total_requests) required for classifying the inference as successful. If (successful_requests / total_requests) < min_endpoint_success_ratio, the experiment is marked as failed. The default is 0 (0 means all requests are allowed to fail, while 1 means no request may fail). | number | 0 | False | |
additional_headers | A stringified JSON object of additional headers to be added to each request. | string | | True | |
ensure_ascii | If ensure_ascii is true, the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters are output as-is. More detailed information can be found at https://docs.python.org/3/library/json.html | boolean | False | False | |
max_retry_time_interval | The maximum time (in seconds) spent retrying a payload. If unspecified, payloads are retried an unlimited number of times. | integer | | True | |
mini_batch_size | The mini batch size for the parallel run. | string | 100KB | True | |
endpoint_config_file | The endpoint config file. | uri_file | | True | |
initial_worker_count | The initial number of workers to use for scoring. | integer | 5 | False | |
max_worker_count | Overrides initial_worker_count if necessary. | integer | 200 | False | |
instance_count | Number of nodes in the compute cluster on which the step runs. | integer | 1 | | |
max_concurrency_per_instance | Number of processes run concurrently on each node. This number should not exceed half the number of cores of an individual node in the specified cluster. | integer | 1 | | |
debug_mode | Enabling debug mode prints all debug logs in the score step. | boolean | False | False | |
app_insights_connection_string | Application Insights connection string to which the batch score component logs metrics and logs. | string | | True | |
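To illustrate the `###<key>` substitution described for batch_input_pattern, here is a minimal Python sketch. The helper name `apply_batch_input_pattern` and the choice to JSON-encode every substituted value are assumptions for illustration, not the component's actual implementation (which, per the examples above, may splice quoted placeholders in as raw strings).

```python
import json

def apply_batch_input_pattern(pattern: str, row: dict) -> str:
    # Hypothetical helper: replace every ###<key> placeholder with the
    # JSON-encoded value of that key from one input-dataset row.
    payload = pattern
    for key, value in row.items():
        payload = payload.replace(f"###<{key}>", json.dumps(value))
    return payload

pattern = ('{"input_data": {"input_string": [###<prompt>]}, '
           '"_batch_request_metadata": ###<_batch_request_metadata>}')
row = {"prompt": "What is Azure ML?", "_batch_request_metadata": {"label": "positive"}}
print(apply_batch_input_pattern(pattern, row))
```

The substituted string is itself valid JSON, which is what the endpoint receives as the request body.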
Outputs

Name | Description | Type |
---|---|---|
predictions | The prediction data. | uri_file |
performance_metadata | The performance data. | uri_file |
ground_truth | The ground truth data that has a one-to-one mapping with the prediction data. | uri_file |
successful_requests | The successful requests. | uri_file |
failed_requests | The failed requests. | uri_file |
unsafe_content_blocked_requests | The unsafe requests that were blocked due to Responsible AI concerns. | uri_file |
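As an illustration of how the min_endpoint_success_ratio input gates a run based on the successful_requests and failed_requests outputs, here is a hedged sketch; the function names and counting logic are illustrative assumptions, not the component's actual code.

```python
def endpoint_success_ratio(n_successful: int, n_failed: int) -> float:
    # Ratio of successful requests to total requests; a run with no
    # requests is treated as fully successful here (an assumption).
    total = n_successful + n_failed
    return n_successful / total if total else 1.0

def run_is_successful(n_successful: int, n_failed: int,
                      min_endpoint_success_ratio: float = 0.0) -> bool:
    # The experiment is marked failed when the observed ratio falls
    # below the configured minimum.
    return endpoint_success_ratio(n_successful, n_failed) >= min_endpoint_success_ratio

print(run_is_successful(75, 25, min_endpoint_success_ratio=0.5))  # → True
```

With the default of 0, any number of failures passes the gate; with 1, a single failed request fails the experiment.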