
components llm_ingest_dbcopilot_faiss_e2e


Data Ingestion for DB Data Output to FAISS E2E Deployment

llm_ingest_dbcopilot_faiss_e2e

Overview

Single-job pipeline that chunks data from an AzureML DB datastore and creates a FAISS embeddings index.

Version: 0.0.66

View in Studio: https://ml.azure.com/registries/azureml/components/llm_ingest_dbcopilot_faiss_e2e/version/0.0.66
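
The component can be pulled from the shared `azureml` registry with the `azure-ai-ml` Python SDK. A minimal sketch, assuming `DefaultAzureCredential` can authenticate in your environment:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Client scoped to the shared "azureml" registry (not a workspace).
registry_client = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")

# Fetch the exact component version documented on this page.
ingest_component = registry_client.components.get(
    name="llm_ingest_dbcopilot_faiss_e2e", version="0.0.66"
)
print(ingest_component.display_name)
```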

Inputs

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| db_datastore | Database datastore URI in the format 'azureml://datastores/{datastore_name}' | string | | | |
| sample_data | Sample data to be used for data ingestion. Format: 'azureml:samples-test:1' | uri_folder | path: "azureml:samples-test:1" | True | |

data ingest settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| embeddings_model | The model used to generate embeddings. 'azure_open_ai://endpoint/{endpoint_name}/deployment/{deployment_name}/model/{model_name}' | string | | | |
| chat_aoai_deployment_name | The name of the chat AOAI deployment | string | | True | |
| embedding_aoai_deployment_name | The name of the embedding AOAI deployment | string | | | |
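
The `embeddings_model` value is a URI assembled from your Azure OpenAI endpoint, deployment, and model names. A small illustration with placeholder names (substitute your own):

```python
# Placeholder AOAI resource names; replace with your own endpoint, deployment, and model.
endpoint_name = "my-aoai-endpoint"
deployment_name = "text-embedding-ada-002"
model_name = "text-embedding-ada-002"

embeddings_model = (
    f"azure_open_ai://endpoint/{endpoint_name}"
    f"/deployment/{deployment_name}/model/{model_name}"
)
# embedding_aoai_deployment_name should match the deployment used above.
embedding_aoai_deployment_name = deployment_name
```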

grounding settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| max_tables | | integer | | True | |
| max_columns | | integer | | True | |
| max_rows | | integer | | True | |
| max_sampling_rows | | integer | | True | |
| max_text_length | | integer | | True | |
| max_knowledge_pieces | | integer | | True | |
| selected_tables | The list of tables to be ingested. If not specified, all tables will be ingested. Format: ["table1","table2","table3"] | string | | True | |
| column_settings | | string | | True | |
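
`selected_tables` is passed as a string containing a JSON-style list of table names. One way to build it, with hypothetical table names:

```python
import json

# Hypothetical table names; omit selected_tables entirely to ingest all tables.
selected_tables = json.dumps(["dbo.Customers", "dbo.Orders", "dbo.Products"])
# selected_tables == '["dbo.Customers", "dbo.Orders", "dbo.Products"]'
```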

copilot settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| tools | The names of the tools for dbcopilot. Supported tools: "tsql", "python". Format: ["tsql", "python"] | string | | True | |

deploy settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| endpoint_name | The name of the endpoint | string | | | |
| deployment_name | The name of the deployment | string | blue | | |
| mir_environment | The name of the MIR environment. Format: azureml://registries/{registry_name}/environments/llm-dbcopilot-mir | string | | | |

compute settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| serverless_instance_count | | integer | 1 | True | |
| serverless_instance_type | | string | Standard_DS3_v2 | True | |
| embedding_connection | Azure OpenAI workspace connection ARM ID for embeddings | string | | True | |
| llm_connection | Azure OpenAI workspace connection ARM ID for the LLM | string | | True | |
| temperature | | number | 0.0 | True | |
| top_p | | number | 0.0 | True | |
| include_builtin_examples | | boolean | True | True | |
| knowledge_pieces | The list of knowledge pieces to be used for grounding. | string | | True | |
| include_views | Whether to include database views. | boolean | | True | |
| instruct_template | The instruct template for the LLM. | string | | True | |
| managed_identity_enabled | Whether to connect using managed identity. | boolean | False | True | |
| egress_public_network_access | Whether the resource may send outbound traffic to the public internet; allowed values are "disabled" and "enabled". | string | enabled | True | |
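
Putting the inputs together, a sketch of wiring the component into a pipeline with the `azure.ai.ml.dsl` decorator. It reuses `ingest_component` and the variables from the earlier snippets; every datastore, endpoint, deployment, and connection value below is a placeholder, and the registry name in `mir_environment` is an assumption.

```python
from azure.ai.ml.dsl import pipeline


@pipeline(description="DBCopilot ingestion to FAISS (sketch)")
def dbcopilot_faiss_ingest():
    step = ingest_component(
        # Placeholder datastore and AOAI values; replace with your own.
        db_datastore="azureml://datastores/my_sql_datastore",
        embeddings_model=embeddings_model,
        chat_aoai_deployment_name="gpt-4",
        embedding_aoai_deployment_name=embedding_aoai_deployment_name,
        selected_tables=selected_tables,
        tools='["tsql", "python"]',
        endpoint_name="dbcopilot-demo-endpoint",
        deployment_name="blue",
        mir_environment="azureml://registries/azureml/environments/llm-dbcopilot-mir",
        # Workspace connection ARM IDs for Azure OpenAI (placeholders).
        embedding_connection="/subscriptions/<sub>/resourceGroups/<rg>/providers/"
        "Microsoft.MachineLearningServices/workspaces/<ws>/connections/<aoai-connection>",
        llm_connection="/subscriptions/<sub>/resourceGroups/<rg>/providers/"
        "Microsoft.MachineLearningServices/workspaces/<ws>/connections/<aoai-connection>",
        serverless_instance_count=1,
        serverless_instance_type="Standard_DS3_v2",
    )
    # Surface the component outputs as pipeline outputs.
    return {
        "grounding_index": step.outputs.grounding_index,
        "db_context": step.outputs.db_context,
    }
```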

Outputs

| Name | Description | Type |
| ---- | ----------- | ---- |
| grounding_index | | uri_folder |
| db_context | | uri_folder |
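
To run the sketch above against a workspace, submit it as a pipeline job; `grounding_index` and `db_context` then appear as outputs of that job. Subscription, resource group, workspace, and experiment names are placeholders.

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Workspace-scoped client (placeholder identifiers).
ws_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

pipeline_job = dbcopilot_faiss_ingest()
# Depending on workspace defaults, a default compute may need to be set explicitly.
pipeline_job.settings.default_compute = "serverless"

submitted = ws_client.jobs.create_or_update(
    pipeline_job, experiment_name="dbcopilot-faiss-ingest"
)
ws_client.jobs.stream(submitted.name)  # follow logs until the job completes
```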