The Example-LlamaEdge-RAG repo has been replaced by the LlamaEdge/RAG-API-Server repo.

If you want to build a RAG application with a single binary, check out LlamaEdge/RAG-API-Server instead.


ChatBot on LlamaEdge RAG

This repository demonstrates a RAG chatbot powered by LlamaEdge RAG. The chatbot uses:

  • The /v1/embedding endpoint of the llama-api-server to (1) compute the embeddings for the given document and (2) persist the embeddings in the specified Qdrant DB.

  • The /v1/chat/completions endpoint of the llama-api-server to (1) compute the embedding for the user's question; (2) query the Qdrant DB and retrieve the documents most similar to the question; (3) generate a completion for the user's input. A curl sketch of this call follows the list.
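
For illustration, a chat request against the server might look like the sketch below, assuming the server listens on its default port 8080. The request shape follows the OpenAI-compatible chat convention; the model field value is an assumption, and curl.md is the authoritative reference.

    # A sketch of a chat request; the server performs the embedding and
    # Qdrant retrieval steps behind this single call.
    # The model name "Llama-2-7b-chat-hf-Q5_K_M" is an assumption.
    curl -X POST http://localhost:8080/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{
        "messages": [
          {"role": "user", "content": "What is the sobriquet of Paris, France?"}
        ],
        "model": "Llama-2-7b-chat-hf-Q5_K_M"
      }'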

Note that the --qdrant-url, --qdrant-collection-name, and --qdrant-limit CLI options must be specified when starting the llama-api-server in order to enable the RAG service.

The entire workflow is depicted in the diagrams in the Workflow section.

If you'd like to use curl to interact with LlamaEdge RAG endpoints, you can refer to curl.md.

Setup

  • Install WasmEdge Runtime

    curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- -v 0.13.5  --plugins wasi_nn-ggml wasmedge_rustls
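
    You can confirm the runtime installed correctly by reloading the shell environment the installer sets up and printing the version:

    # Reload the environment set up by the installer, then check the version
    source $HOME/.wasmedge/env
    wasmedge --version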
  • Start Qdrant docker container

    # Pull the Qdrant docker image
    docker pull qdrant/qdrant
    
    # Create a directory to store Qdrant data
    mkdir qdrant_storage
    
    # Run Qdrant service
    docker run -p 6333:6333 -p 6334:6334 -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant
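
    Once the container is up, you can sanity-check it with Qdrant's REST API. GET /collections lists the existing collections; the "paris" collection will only appear after embeddings have been stored:

    # Verify the Qdrant service is reachable
    curl http://localhost:6333/collections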
  • Start llama-api-server

    wasmedge --dir .:. \
      --nn-preload default:GGML:AUTO:Llama-2-7b-chat-hf-Q5_K_M.gguf \
      llama-api-server.wasm \
      --prompt-template llama-2-chat \
      --ctx-size 4096 \
      --qdrant-url http://127.0.0.1:6333 \
      --qdrant-collection-name "paris" \
      --qdrant-limit 3
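
    The command above expects Llama-2-7b-chat-hf-Q5_K_M.gguf and llama-api-server.wasm to be present in the current directory. If you still need to fetch them, something like the following usually works; the exact download URLs are assumptions based on where LlamaEdge releases and second-state model mirrors are typically published:

    # Download the llama-api-server binary (URL is an assumption; check the
    # LlamaEdge releases page for the artifact matching this example)
    curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

    # Download the quantized chat model (URL is an assumption)
    curl -LO https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/resolve/main/Llama-2-7b-chat-hf-Q5_K_M.gguf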

Usage

  • Build and run

    # clone the repository
    git clone https://github.com/LlamaEdge/Example-LlamaEdge-RAG.git
    cd Example-LlamaEdge-RAG
    
    # build the executable
    cargo build --release
    
    # run the executable
    ./target/release/llama-rag --file paris.txt

    If the command runs successfully, you will see the following output:

    [INFO] Document: paris.txt
    
    [+] Chunking the document ...
    [+] Computing the embeddings for the document ...
    
    [You]:
    What is the sobriquet of Paris, France?
    
    [Bot]:
    Based on the provided context, the sobriquet of Paris, France is "the City of Light" (French: "la Ville Lumière").
    
    
    [You]:
    What is the location of Paris, France on the Seine River?
    
    [Bot]:
    According to the provided text, Paris is situated along the Seine River some 233 miles (375 km) upstream from the river’s mouth on the English Channel (La Manche). Therefore, the location of Paris, France on the Seine River is approximately 233 miles (375 km) upstream from the river's mouth.

Workflow

  • Compute and store embeddings for a given text file

  • RAG-based chatbot
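
In curl terms, the first stage corresponds roughly to the sketch below: the chatbot chunks the text file and sends the chunks to the /v1/embedding endpoint, and the server stores the resulting vectors in the Qdrant collection. The request body here is an assumption modeled on the OpenAI embeddings API; see curl.md for the real format.

    # A sketch of the embedding request issued for the document chunks.
    # The body shape is an assumption modeled on the OpenAI embeddings API.
    curl -X POST http://localhost:8080/v1/embedding \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "Llama-2-7b-chat-hf-Q5_K_M",
        "input": ["chunk 1 of paris.txt", "chunk 2 of paris.txt"]
      }'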
