
components llm_ingest_dbcopilot_faiss_e2e


Data Ingestion for DB Data Output to FAISS E2E Deployment

llm_ingest_dbcopilot_faiss_e2e

Overview

Single-job pipeline that chunks data from an AzureML DB datastore and creates a FAISS embeddings index.

Version: 0.0.66

View in Studio: https://ml.azure.com/registries/azureml/components/llm_ingest_dbcopilot_faiss_e2e/version/0.0.66
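
The component can be pulled from the shared `azureml` registry with the `azure-ai-ml` Python SDK. A minimal sketch, assuming `DefaultAzureCredential` can authenticate in your environment:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Client scoped to the shared "azureml" registry (not a workspace).
registry_client = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")

# Fetch the exact component version documented on this page.
ingest_component = registry_client.components.get(
    name="llm_ingest_dbcopilot_faiss_e2e", version="0.0.66"
)
print(ingest_component.display_name)
```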

Inputs

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| db_datastore | Database datastore URI in the format 'azureml://datastores/{datastore_name}' | string | | | |
| sample_data | Sample data to be used for data ingestion. Format: 'azureml:samples-test:1' | uri_folder | path: "azureml:samples-test:1" | True | |

data ingest settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| embeddings_model | The model used to generate embeddings. 'azure_open_ai://endpoint/{endpoint_name}/deployment/{deployment_name}/model/{model_name}' | string | | | |
| chat_aoai_deployment_name | The name of the chat AOAI deployment | string | | True | |
| embedding_aoai_deployment_name | The name of the embedding AOAI deployment | string | | | |
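
The `embeddings_model` value is a URI assembled from your Azure OpenAI endpoint, deployment, and model names. A small illustration with placeholder names (substitute your own):

```python
# Placeholder AOAI resource names; replace with your own endpoint, deployment, and model.
endpoint_name = "my-aoai-endpoint"
deployment_name = "text-embedding-ada-002"
model_name = "text-embedding-ada-002"

embeddings_model = (
    f"azure_open_ai://endpoint/{endpoint_name}"
    f"/deployment/{deployment_name}/model/{model_name}"
)
# embedding_aoai_deployment_name should match the deployment used above.
embedding_aoai_deployment_name = deployment_name
```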

grounding settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| max_tables | | integer | | True | |
| max_columns | | integer | | True | |
| max_rows | | integer | | True | |
| max_sampling_rows | | integer | | True | |
| max_text_length | | integer | | True | |
| max_knowledge_pieces | | integer | | True | |
| selected_tables | The list of tables to be ingested. If not specified, all tables will be ingested. Format: ["table1","table2","table3"] | string | | True | |
| column_settings | | string | | True | |
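
`selected_tables` is passed as a string containing a JSON-style list of table names. One way to build it, with hypothetical table names:

```python
import json

# Hypothetical table names; omit selected_tables entirely to ingest all tables.
selected_tables = json.dumps(["dbo.Customers", "dbo.Orders", "dbo.Products"])
# selected_tables == '["dbo.Customers", "dbo.Orders", "dbo.Products"]'
```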

copilot settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| tools | The names of the tools for dbcopilot. Supported tools: "tsql", "python". Format: ["tsql", "python"] | string | | True | |

deploy settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| endpoint_name | The name of the endpoint | string | | | |
| deployment_name | The name of the deployment | string | blue | | |
| mir_environment | The name of the MIR environment. Format: azureml://registries/{registry_name}/environments/llm-dbcopilot-mir | string | | | |

compute settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| serverless_instance_count | | integer | 1 | True | |
| serverless_instance_type | | string | Standard_DS3_v2 | True | |
| embedding_connection | Azure OpenAI workspace connection ARM ID for embeddings | string | | True | |
| llm_connection | Azure OpenAI workspace connection ARM ID for the LLM | string | | True | |
| temperature | | number | 0.0 | True | |
| top_p | | number | 0.0 | True | |
| include_builtin_examples | | boolean | True | True | |
| knowledge_pieces | The list of knowledge pieces to be used for grounding. | string | | True | |
| include_views | Whether to include database views. | boolean | | True | |
| instruct_template | The instruct template for the LLM. | string | | True | |
| managed_identity_enabled | Whether to connect using managed identity. | boolean | False | True | |
| egress_public_network_access | Whether the resource may send outbound traffic to the public internet; allowed values are "disabled" and "enabled". | string | enabled | True | |
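
Putting the inputs together, a sketch of wiring the component into a pipeline with the `azure.ai.ml.dsl` decorator. It reuses `ingest_component` and the variables from the earlier snippets; every datastore, endpoint, deployment, and connection value below is a placeholder, and the registry name in `mir_environment` is an assumption.

```python
from azure.ai.ml.dsl import pipeline


@pipeline(description="DBCopilot ingestion to FAISS (sketch)")
def dbcopilot_faiss_ingest():
    step = ingest_component(
        # Placeholder datastore and AOAI values; replace with your own.
        db_datastore="azureml://datastores/my_sql_datastore",
        embeddings_model=embeddings_model,
        chat_aoai_deployment_name="gpt-4",
        embedding_aoai_deployment_name=embedding_aoai_deployment_name,
        selected_tables=selected_tables,
        tools='["tsql", "python"]',
        endpoint_name="dbcopilot-demo-endpoint",
        deployment_name="blue",
        mir_environment="azureml://registries/azureml/environments/llm-dbcopilot-mir",
        # Workspace connection ARM IDs for Azure OpenAI (placeholders).
        embedding_connection="/subscriptions/<sub>/resourceGroups/<rg>/providers/"
        "Microsoft.MachineLearningServices/workspaces/<ws>/connections/<aoai-connection>",
        llm_connection="/subscriptions/<sub>/resourceGroups/<rg>/providers/"
        "Microsoft.MachineLearningServices/workspaces/<ws>/connections/<aoai-connection>",
        serverless_instance_count=1,
        serverless_instance_type="Standard_DS3_v2",
    )
    # Surface the component outputs as pipeline outputs.
    return {
        "grounding_index": step.outputs.grounding_index,
        "db_context": step.outputs.db_context,
    }
```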

Outputs

| Name | Description | Type |
| ---- | ----------- | ---- |
| grounding_index | | uri_folder |
| db_context | | uri_folder |
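
To run the sketch above against a workspace, submit it as a pipeline job; `grounding_index` and `db_context` then appear as outputs of that job. Subscription, resource group, workspace, and experiment names are placeholders.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Workspace-scoped client (placeholder identifiers).
ws_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

pipeline_job = dbcopilot_faiss_ingest()
# Depending on workspace defaults, a default compute may need to be set explicitly.
pipeline_job.settings.default_compute = "serverless"

submitted = ws_client.jobs.create_or_update(
    pipeline_job, experiment_name="dbcopilot-faiss-ingest"
)
ws_client.jobs.stream(submitted.name)  # follow logs until the job completes
```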