LLM - Generate QnA Test Data

llm_rag_qa_data_generation

Overview

Generates a test dataset of questions and answers based on the input documents.

A chunk of text is read from each input document and sent to the specified LLM with a prompt to create a question and answer based on that text. Each resulting set of question, answer, and context is saved to either a csv or jsonl file. Four kinds of QA pairs are generated: short-answer, long-answer, summary, and boolean.
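The flow described above can be sketched roughly as follows. This is a minimal illustration, not the component's actual code: `batched`, `generate_qa_dataset`, `ask_llm`, and `fake_llm` are hypothetical names, the record keys are assumed, and the chunks in a batch are processed one after another here, whereas the component sends them in parallel.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def batched(chunks: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size chunks."""
    batch: List[str] = []
    for chunk in chunks:
        batch.append(chunk)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def generate_qa_dataset(
    chunks: Iterable[str],
    ask_llm: Callable[[str], Dict[str, str]],
    dataset_size: int = 100,
    chunk_batch_size: int = 5,
) -> List[Dict[str, str]]:
    """Collect question/answer/context records until dataset_size is reached."""
    records: List[Dict[str, str]] = []
    for batch in batched(chunks, chunk_batch_size):
        # Assumption: sequential here for clarity; a real run would be parallel.
        for chunk in batch:
            qa = ask_llm(chunk)
            records.append(
                {"question": qa["question"], "answer": qa["answer"], "context": chunk}
            )
            if len(records) >= dataset_size:
                return records
    return records


def fake_llm(chunk: str) -> Dict[str, str]:
    """Hypothetical stand-in for a model call so the sketch runs offline."""
    return {"question": f"What does this text say? ({len(chunk)} chars)", "answer": chunk}


records = generate_qa_dataset(
    ["alpha", "beta", "gamma"], fake_llm, dataset_size=2, chunk_batch_size=2
)
```

In this sketch `dataset_size` caps the number of records, so generation stops as soon as enough questions exist, even if input chunks remain.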

Version: 0.0.83

Inputs

Name	Description	Type	Default	Optional
openai_api_version	Version of the OpenAI API to use for communicating with the LLM.	string	2023-03-15-preview
openai_api_type	Type of OpenAI endpoint hosting the model. Defaults to azure for Azure OpenAI (AOAI) endpoints.	string	azure
input_data	URI folder of documents containing chunks of data.	uri_folder
llm_config	JSON configuration for the model to use for question generation. Must contain the following keys: 'type' (value must be 'azure_open_ai' or 'azure'), 'model_name' (name of model to use for summary), 'deployment_name' (name of deployment for model), 'temperature' (randomness in response, float from 0 to 1), 'max_tokens' (number of tokens for response).	string	{"type": "azure_open_ai", "model_name": "gpt-35-turbo", "deployment_name": "gpt-35-turbo", "temperature": 0, "max_tokens": 2000}
llm_connection	Workspace connection resource ID for the completion model.	string		False
dataset_size	Number of questions to generate.	integer	100
chunk_batch_size	Number of chunks to be read and sent to the LLM in parallel.	integer	5
output_format	File type to save the dataset as. Options are 'csv' and 'json'.	string	json
deployment_validation	URI file indicating whether the Azure OpenAI deployments, if used, have been validated.	uri_file		True
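Because llm_config is passed as a JSON string, it can help to check it before submitting a job. The following is a minimal sketch that applies only the constraints listed in the table above. `parse_llm_config`, `REQUIRED_KEYS`, and `ALLOWED_TYPES` are hypothetical helper names, not part of the component, and the component's own validation may differ.

```python
import json

# Keys and allowed 'type' values taken from the llm_config description above.
REQUIRED_KEYS = {"type", "model_name", "deployment_name", "temperature", "max_tokens"}
ALLOWED_TYPES = {"azure_open_ai", "azure"}


def parse_llm_config(raw: str) -> dict:
    """Parse an llm_config JSON string and check the documented constraints."""
    config = json.loads(raw)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"llm_config is missing keys: {sorted(missing)}")
    if config["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {config['type']!r}")
    if not 0 <= config["temperature"] <= 1:
        raise ValueError("temperature must be a float from 0 to 1")
    return config


# The documented default value for llm_config.
default_config = parse_llm_config(
    '{"type": "azure_open_ai", "model_name": "gpt-35-turbo", '
    '"deployment_name": "gpt-35-turbo", "temperature": 0, "max_tokens": 2000}'
)
```

Checking the string locally surfaces a missing key or an out-of-range temperature before the pipeline starts.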

Outputs

Name	Description	Type
output_data	csv or jsonl file containing the question, answer, context, and metadata sets	uri_folder

Environment

azureml:llm-rag-embeddings:76
