Looking to build and deploy a real-world RAG application? Welcome!
This repo provides a base skeleton for building a Retrieval-Augmented Generation (RAG)-enhanced LLM application. It is intended as a starting point for developers building real-world production RAG applications.
It lets you quickly set up a web application, operated via a web API, into which you can load large amounts of custom data for use with your LLM.
The project allows you to:
- Locally deploy (via Docker) a vector database.
- Locally deploy an API to add/remove custom textual data from the vector DB.
- Follow instructions on how to deploy to AWS.
- Send chat messages to the LLM via the API, which can be used to ask questions about the custom data. The LLM can "look up" data from the vector DB for use as part of the response.
- Access chat history via the API; it is stored in a local Postgres.
- Stream the LLM's response through the API.
This repo is built by Hipposys Ltd. and serves as a starting point for new RAG projects for our clients. It is open-sourced both for educational purposes and to serve as a base for commercial projects.
The current skeleton supports Amazon Bedrock or OpenAI as the LLM provider, and Milvus or Chroma as the vector database. Additional LLM providers and vector databases are planned to be added in the near future.
- Web server providing endpoints for:
  - Adding and removing custom data.
  - Sending and receiving chat messages, with support for streaming the LLM's response.
- Amazon Bedrock or OpenAI as an LLM provider.
- Milvus and Chroma vector database integrations.
- Built on top of LangChain.
- Chat history stored in a local Postgres and accessible via the API.
Currently, you must have an Amazon Bedrock or OpenAI account to use this project.
You'll also need Docker for the local deploy.
Looking for help or have questions? Contact us at [email protected].
We work with clients on a variety of AI engineering and data engineering projects.
The local deployment relies on having Docker installed.
It also relies on having access to Amazon Bedrock or OpenAI models, which are used as the LLM provider of the application.
- Clone the repository:

  ```shell
  git clone https://github.com/hipposys-ltd/rag-app-skeleton.git
  ```

- Navigate to the project directory:

  ```shell
  cd rag-app-skeleton
  ```

- Make sure you have Docker installed and running.
- Create an `.env` file:

  ```shell
  cp .env-template .env
  ```

- Fill it in with the necessary credentials and settings.
- For the initial local deployment, the most important credentials are the ones defining your LLM provider:
  - `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` if you'll be using AWS.
  - `OPENAI_API_KEY` if you'll be using OpenAI.
- Other credentials may be suitable for local development, but should be replaced when deploying to a remote server (e.g. prod) for additional security.
- If you want to use a local LLM or embedding model, change `LLM_MODEL_ID` and `EMBEDDING_MODEL` to a model available on Ollama's model search, using the `ollama:` prefix (e.g., `LLM_MODEL_ID='ollama:llama3.2:1b'`; `EMBEDDING_MODEL='ollama:mxbai-embed-large'`). Note: the selected LLM must support tools for compatibility with this architecture. If you modify the default values, remember to update the `LLM` and `EMBEDDING_MODEL` arguments in `docker-compose.local-models.yml`.
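As an illustration, an `.env` fragment for a local Ollama setup might look like the sketch below. The model names come from the example above; the exact set of variables is defined by `.env-template`, so treat this only as a shape, not as the authoritative file contents:

```shell
# Hypothetical .env fragment for local models via Ollama
LLM_MODEL_ID='ollama:llama3.2:1b'
EMBEDDING_MODEL='ollama:mxbai-embed-large'

# Comment out credentials of providers you are not using, e.g.:
# OPENAI_API_KEY=
# AWS_ACCESS_KEY_ID=
# AWS_SECRET_ACCESS_KEY=
```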
- Build and run the project via Docker. The base command is:

  ```shell
  docker compose -f docker-compose.yml up -d --build
  ```

  Add one or more of the following compose files, depending on your use case:
  - Milvus: `-f docker-compose.milvus.yml`
  - Chroma: `-f docker-compose.chromadb.yml`
  - To use local models: `-f docker-compose.local-models.yml`
  - To enable the UI for chat and the Milvus DB: `-f docker-compose.ui.yml`
- After running Docker, you should have multiple services running.
  - You can check the status of the services with `docker ps -a`.
  - Make sure the `fastapi`, `postgres` and `milvus-standalone` containers are running.
- Go to `localhost:8080/hello-world` to see a `{"hello": "world"}` response from the server.
- You now have a running instance of the RAG application.
- Make sure that you have a Bedrock model available in your AWS account:
  - Log into the AWS console.
  - Navigate to the `Amazon Bedrock` service.
  - In the left navigation pane: `Bedrock configurations` -> `Model access`.
  - We currently use `Claude 3.5 Sonnet` for inference and `Titan Text Embeddings V2` for embeddings.
    - Note that this may change: you can either change it yourself, or find that it has already been changed in the code.
    - Note that these models may not be available in all regions; we currently use `us-east-1` (N. Virginia).
  - If these models are not enabled, you'll have to request access. Access should be granted immediately upon request.
- You'll need to generate access credentials for your Amazon account for use in the application.
- In your `.env` file:
  - Add your `OPENAI_API_KEY`.
  - Set `LLM_MODEL_ID` to an OpenAI-compatible model with the `openai:` prefix (e.g., `openai:gpt-3.5-turbo`).
  - Comment out any unused environment variables (e.g., AWS-related variables).
- In `app/models/__init__.py`, update the code to use OpenAI models instead of the default Bedrock models. Make sure to adjust both the inference model and the embeddings model.
- Finally, restart Docker Compose to apply the `.env` changes.
We're now going to give a simple example of how to use the API. The plan is to:
- Query our LLM, via the API, and ask for specific "inside" information, which it does not have access to.
- Use the API to add the information to the vector database via simple textual data.
- Query the LLM again and ask for the same information, which it now has access to.
Note that by default, the repo is configured to return plain-text responses when running in local mode (controlled via `.env`) and JSON-formatted responses when running in non-local modes (e.g. prod).
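A client that may run against either mode therefore needs to handle both response shapes. A minimal sketch of such a helper (the `"message"` key used for the JSON case is an assumed field name; check the server code for the real schema):

```python
import json


def parse_chat_reply(body: str) -> str:
    """Return the assistant text from a /chat/ask response body.

    Local mode returns plain text; non-local modes return JSON.
    The "message" field name is an assumption for illustration.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return body  # local mode: plain text passes through unchanged
    if isinstance(payload, dict):
        return payload.get("message", body)
    return body


print(parse_chat_reply("hello"))                  # plain text
print(parse_chat_reply('{"message": "hi"}'))      # JSON is unwrapped
```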
The following command sends a chat query via the API:

```shell
curl \
  -i \
  -X POST \
  --no-buffer \
  -b cookies.tmp.txt -c cookies.tmp.txt \
  -H 'Content-Type: application/json' \
  -d '{"message": "What headphones are recommended by the company for listening to podcasts?"}' \
  http://localhost:8080/chat/ask
```

Highlighting the important parts of the command:
- We are hitting the `/chat/ask` endpoint to actually ask the LLM a question.
- We are using `-b` and `-c` to save the cookies from the server. This lets the server continue our chat session, so additional requests to `/chat/ask` will be part of the same chat session.
- The message itself is the chat message we are sending to the LLM.
The output should be a message of not finding anything in the company's internal documents about a headphone recommendation. There will also likely be a general message trying to help.
We'll add two sources of information about headphone choices to the vector database:
```shell
curl \
  -i \
  -X POST \
  --no-buffer \
  -b cookies.tmp.uc.txt -c cookies.tmp.uc.txt \
  -H 'Content-Type: application/json' \
  -d '{"source_id": "1001", "source_name": "Headphones Guide I", "text": "The recommended headphones to use while listening to podcasts are AirPods Pro", "modified_at": "2024-09-22T17:04"}' \
  http://localhost:8080/embeddings/text/store
```

```shell
curl \
  -i \
  -X POST \
  --no-buffer \
  -b cookies.tmp.uc.txt -c cookies.tmp.uc.txt \
  -H 'Content-Type: application/json' \
  -d '{"source_id": "1001", "source_name": "Headphones Guide II", "text": "The recommended headphones to use while listening to music is BoseQC35", "modified_at": "2024-09-22T17:04"}' \
  http://localhost:8080/embeddings/text/store
```

Here, we are sending data to the `/embeddings/text/store` endpoint. This endpoint is responsible for storing the text data in the vector database. We store the data itself, as well as metadata about the source of the data: the source name, the source id, and the modification date.
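Programmatically, the body for `/embeddings/text/store` can be assembled with a small helper. The field names and the `YYYY-MM-DDTHH:MM` timestamp format are taken from the curl examples above; treat anything beyond that as an assumption:

```python
import json
from datetime import datetime


def build_store_payload(source_id: str, source_name: str,
                        text: str, modified_at: datetime) -> str:
    """Serialize a document for POSTing to /embeddings/text/store.

    Field names mirror the curl examples above.
    """
    return json.dumps({
        "source_id": source_id,
        "source_name": source_name,
        "text": text,
        # Matches the "2024-09-22T17:04" format used in the examples.
        "modified_at": modified_at.strftime("%Y-%m-%dT%H:%M"),
    })


body = build_store_payload(
    "1001",
    "Headphones Guide I",
    "The recommended headphones to use while listening to podcasts are AirPods Pro",
    datetime(2024, 9, 22, 17, 4),
)
print(body)
```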
Now we can query the LLM again and ask for the same information, which it now has access to:

```shell
curl \
  -i \
  -X POST \
  --no-buffer \
  -b cookies.tmp.txt -c cookies.tmp.txt \
  -H 'Content-Type: application/json' \
  -d '{"message": "What headphones are recommended by the company?"}' \
  http://localhost:8080/chat/ask
```

This time, you should see a response from the LLM that includes the information we added to the vector database.
You can delete a source using the `/embeddings/text/delete` endpoint:

```shell
curl \
  -i \
  -X DELETE \
  --no-buffer \
  -b cookies.tmp.txt -c cookies.tmp.txt \
  -H 'Content-Type: application/json' \
  -d '{"source_id": "1001"}' \
  http://localhost:8080/embeddings/text/delete
```

Currently, the project supports Chroma and Milvus, with plans to add more vector databases in the future. By default, Milvus is used, but switching to another supported database is simple:
- Open `app/databases/vector/__init__.py` and update the `VectorDB` assignment. For example, to switch to Chroma:

  ```python
  from app.databases.vector.chroma import Chroma
  # from app.databases.vector.milvus import Milvus

  VectorDB = Chroma
  ```

- When running `docker compose`, use the `docker-compose` configuration file that matches the database you've chosen. For example, to use Chroma:

  ```shell
  docker compose \
    -f docker-compose.yml \
    -f docker-compose.chromadb.yml \
    up -d --build
  ```
To run the tests, use the following command:

```shell
docker exec -it fastapi bash -c "pytest app/"
```

For faster test execution, at the expense of cleaner output, you can add the `-n` option to parallelize tests across multiple workers:

```shell
docker exec -it fastapi bash -c "pytest -n 5 app/"
```

In this example, 5 parallel workers will execute the tests.
A more complete guide to deploying to production will be added later.
For now, you can check the notes in prod/README.md and the other files in that directory.
When starting the project locally (following the instructions above), a Jupyter Lab server will automatically start. The server configuration is defined in docker-compose.yml.
To access Jupyter Lab, open http://localhost:8890 in your web browser. On your first visit, you'll need to provide a login token. You can retrieve the token from the logs of the `jupyter` Docker container by running:

```shell
docker logs jupyter 2>&1 | grep token= | tail -n 1 | grep -E '=.+$'
```

After logging in, navigate to `/work/notebooks` to access the existing notebooks or create new ones.