
Retrieval Augmented Generation

Components in this category


LLM models have token limits for the prompts passed to them; this is a limiting factor at embedding time and even more limiting at prompt completion time, as only so much context ca...

  • llm_rag_crawl_url

    Crawls the given URL and nested links to max_crawl_depth. Data is stored to output_path.
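
A minimal sketch of running this component in an Azure ML pipeline with the v2 SDK (azure-ai-ml). The registry name azureml, the url input name, and the workspace placeholders are assumptions; max_crawl_depth and output_path come from the description above.

```python
# Sketch: run llm_rag_crawl_url in a dsl.pipeline.
# Assumes the component lives in the "azureml" registry and exposes a "url"
# input; verify input/output names against the component's registered schema.
from azure.ai.ml import MLClient, dsl
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
registry = MLClient(credential=credential, registry_name="azureml")
workspace = MLClient(credential, "<subscription-id>", "<resource-group>", "<workspace-name>")

crawl_url = registry.components.get(name="llm_rag_crawl_url", label="latest")

@dsl.pipeline(name="crawl_docs_pipeline")
def crawl_docs():
    crawl_step = crawl_url(
        url="https://learn.microsoft.com/azure/machine-learning/",  # starting URL (assumed input name)
        max_crawl_depth=1,
    )
    return {"crawled_data": crawl_step.outputs.output_path}

job = workspace.jobs.create_or_update(crawl_docs(), experiment_name="rag-crawl-demo")
```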

  • llm_rag_create_faiss_index

    Creates a FAISS index from embeddings and saves it to the output folder. If register_output is set to True, the index is also registered as a Data Asset named asset_name.
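
A brief sketch of building and registering the index from an existing embeddings folder; the embeddings input name and the uri_folder path are assumptions, while register_output and asset_name are the inputs named above.

```python
# Sketch: create and register a FAISS index from a folder of embeddings.
# The "embeddings" input name and the datastore path are assumptions.
from azure.ai.ml import MLClient, dsl, Input
from azure.identity import DefaultAzureCredential

registry = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")
create_faiss_index = registry.components.get(name="llm_rag_create_faiss_index", label="latest")

@dsl.pipeline(name="build_faiss_index")
def build_faiss_index(embeddings_folder):
    create_faiss_index(
        embeddings=embeddings_folder,      # assumed input name for the embeddings folder
        register_output=True,              # register the output as a Data Asset
        asset_name="my-rag-faiss-index",   # name of the registered Data Asset
    )

pipeline_job = build_faiss_index(
    embeddings_folder=Input(type="uri_folder", path="azureml://datastores/workspaceblobstore/paths/embeddings/")
)
# Submit with MLClient.jobs.create_or_update(...) as in the crawl sketch above.
```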

  • llm_rag_create_promptflow

    This component creates a RAG flow based on your MLIndex data and the best prompts. The flow searches your indexed data and answers questions using your own data as context. It also supports bulk testing with any built-in or custom evaluation flows.

  • llm_rag_data_import_acs

    Collects documents from an Azure Cognitive Search index, extracts their contents, saves them to a uri folder, and creates an MLIndex yaml file to represent the search index.

Documents collected can then be used in other components without having to query the ACS index again, allowing for a consiste...

chunks_source is expected to contain CSV files with two columns:

  • "Chunk" - Chunk of text to be embedded
  • "Metadata" - JSON object containing metadata for the chunk

If embeddings_container is supplied, input c...

chunks_source is expected to contain CSV files with two columns:

  • "Chunk" - Chunk of text to be embedded
  • "Metadata" - JSON object containing metadata for the chunk

If previous_embeddings is supplied, input ch...
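
Both descriptions above expect the same chunks_source layout. A minimal sketch of producing such a CSV with pandas; the file and folder names, and the keys inside the metadata object, are illustrative.

```python
# Sketch: write a chunks_source CSV with the two expected columns,
# "Chunk" (text to embed) and "Metadata" (a JSON object serialized as a string).
import json
import pandas as pd

chunks = pd.DataFrame(
    {
        "Chunk": [
            "Azure Machine Learning lets you build, train, and deploy models.",
            "Pipelines compose reusable components into a single workflow.",
        ],
        "Metadata": [
            json.dumps({"source": {"filename": "overview.md"}, "title": "Overview"}),    # illustrative keys
            json.dumps({"source": {"filename": "pipelines.md"}, "title": "Pipelines"}),  # illustrative keys
        ],
    }
)
chunks.to_csv("chunks_source/chunks_0.csv", index=False)
```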

  • llm_rag_git_clone

    Clones a git repository to the output_data path.

  • llm_rag_image_embed_index

    Embeds the input images and stores them, with metadata, in an Azure Cognitive Search index using the Florence embedding resource. The MLIndex is stored to output_path.

  • llm_rag_qa_data_generation

    Generates a test dataset of questions and answers based on the input documents.

A chunk of text is read from each input document and sent to the specified LLM with a prompt to create a question and answer based on that text. These question, answer, and context sets are saved as either a csv or j...
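
If the jsonl output format is chosen, the generated dataset can be inspected as in the sketch below; the file path is a placeholder and the question/answer/context column names follow the wording above, so confirm them against the component's actual output.

```python
# Sketch: load a jsonl test dataset produced by llm_rag_qa_data_generation.
# The file path and the exact column names are assumptions.
import pandas as pd

qa_data = pd.read_json("generated_qa/qa_data.jsonl", lines=True)
print(qa_data[["question", "answer", "context"]].head())
```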

The Index will have the following fields populated:

  • "id", String, key=True

  • "content", String

  • "contentVector", Collection(Single)

  • "category", String

  • "url",...

  • llm_rag_update_cosmos_mongo_vcore_index

    Uploads embeddings into Azure Cosmos Mongo vCore collection/index specified in azure_cosmos_mongo_vcore_config. The collection/index will be created if it doesn't exist.

The collection/index will have the following fields populated:

  • "_id", String, key=True

  • "content", String

  • "contentVec...

  • llm_rag_update_milvus_index

    Uploads embeddings into Milvus collection/index specified in milvus_config. The collection/index will be created if it doesn't exist.

The collection/index will have the following fields populated:

  • "id", String, key=True

  • "content", String

  • "contentVector", Collection(Single)

  • "url", Str...

  • llm_rag_update_pinecone_index

    Uploads embeddings into Pinecone index specified in pinecone_config. The Index will be created if it doesn't exist.

Each record in the Index will have the following metadata populated:

  • "id", String

  • "content", String

  • "url", String

  • "filepath", String

  • "title", String

  • "metadata_json_...

  • llm_rag_validate_deployments

    Validates that the completion model, embedding model, and Azure Cognitive Search resource deployments are successful and that the connections work. For default AOAI, it attempts to create the deployments if they are not valid or present. This validation is done only if the customer is using Azure OpenAI models or creatin...
