gankoji/rag_agent

RAG Application with ChromaDB Integration

A Node.js application that implements Retrieval-Augmented Generation (RAG) using ChromaDB as the vector database and Anthropic's Claude 3 as the foundation model. The application can scrape GitHub repositories and use their contents as context for answering questions.

Features

  • GitHub Repository Scraping: Automatically clone and process repositories (supports both public and private repositories via SSH)
  • Vector Storage: Uses ChromaDB to store and query document embeddings
  • RAG-powered Responses: Combines retrieved context with user queries to generate more informed responses
  • REST API Interface: Simple HTTP endpoints for both scraping and querying

Prerequisites

  • Docker and Docker Compose
  • An Anthropic API key
  • (Optional) SSH key for accessing private GitHub repositories

Quick Start

  1. Clone this repository
  2. Create a .env file in the root directory:
ANTHROPIC_API_KEY=your-api-key-here
  3. (Optional) For private repository access, add your SSH keys:
cp ~/.ssh/id_rsa ./id_rsa
cp ~/.ssh/id_rsa.pub ./id_rsa.pub
  4. Build and start the services:
docker-compose up --build

API Endpoints

Scrape Repository

POST /scrape
Content-Type: application/json

{
    "url": "https://github.com/username/repository"
}
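
A minimal sketch of building this request from TypeScript. The base URL `http://localhost:3000` is an assumption (this README does not state the port), and `buildScrapeRequest` is a hypothetical helper, not part of the project.

```typescript
// Hypothetical helper that assembles a POST /scrape request.
// SCRAPE_BASE_URL is an assumption; use the host and port your deployment exposes.
const SCRAPE_BASE_URL = "http://localhost:3000";

interface ScrapeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildScrapeRequest(
  repoUrl: string,
  baseUrl: string = SCRAPE_BASE_URL
): ScrapeRequest {
  return {
    url: `${baseUrl}/scrape`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // The body mirrors the JSON payload documented above.
    body: JSON.stringify({ url: repoUrl }),
  };
}

// Usage with the global fetch available in Node.js 18+:
//   const req = buildScrapeRequest("https://github.com/username/repository");
//   const res = await fetch(req.url, req);
```

Keeping request construction separate from the network call makes the payload easy to inspect before sending it.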

Query with RAG

POST /prompt
Content-Type: application/json

{
    "messages": [
        {
            "role": "user",
            "content": "Your question here"
        }
    ]
}
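
The payload above could be assembled along these lines. This is a sketch under stated assumptions: the base URL `http://localhost:3000` and the helper name `buildPromptRequest` are hypothetical, and only the `user` role is confirmed by this README.

```typescript
// Message shape taken from the request body above. Only "user" appears in the
// README, so that is the only role modelled here.
type ChatMessage = { role: "user"; content: string };

interface PromptRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper; the default base URL is an assumption.
function buildPromptRequest(
  question: string,
  baseUrl: string = "http://localhost:3000"
): PromptRequest {
  const messages: ChatMessage[] = [{ role: "user", content: question }];
  return {
    url: `${baseUrl}/prompt`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages }),
  };
}

// Usage with the global fetch available in Node.js 18+:
//   const req = buildPromptRequest("Your question here");
//   const res = await fetch(req.url, req);
```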

Configuration

Environment Variables

  • ANTHROPIC_API_KEY: Your Anthropic API key
  • MODEL_NAME: The Claude model to use (defaults to claude-3-sonnet-20240229)
  • CHROMA_URL: ChromaDB instance URL (defaults to http://chromadb:8000)
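
One way to read these variables with their documented defaults is sketched below. `loadConfig` and `AppConfig` are hypothetical names, and treating a missing `ANTHROPIC_API_KEY` as an error is an assumption; the project's actual loading code may differ.

```typescript
interface AppConfig {
  anthropicApiKey: string;
  modelName: string;
  chromaUrl: string;
}

// Hypothetical loader. It takes the environment as a parameter so it can be
// exercised without touching the real process environment.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const anthropicApiKey = env.ANTHROPIC_API_KEY;
  if (!anthropicApiKey) {
    // Assumption: the key is mandatory, since it is listed under Prerequisites.
    throw new Error("ANTHROPIC_API_KEY is required");
  }
  return {
    anthropicApiKey,
    // Defaults copied from the list above; empty strings also fall back.
    modelName: env.MODEL_NAME || "claude-3-sonnet-20240229",
    chromaUrl: env.CHROMA_URL || "http://chromadb:8000",
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`.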

Docker Volumes

  • app_data: Temporary storage for git operations
  • chroma_data: Persistent storage for ChromaDB

Architecture

The application consists of several key components:

  • Express Server: Handles HTTP requests and routing
  • ChromaDB: Vector database for storing and querying document embeddings
  • LLM Service: Interfaces with Anthropic's Claude 3 model
  • GitHub Scraper: Clones and processes repository content
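
How these components might fit together at query time is sketched below. The `Retriever` and `Llm` interfaces, the prompt layout, and the default of three retrieved chunks are all illustrative assumptions, not the project's actual code; a real implementation would back them with the ChromaDB client and the Anthropic SDK.

```typescript
// Stand-in for the vector database: returns the topK most relevant text chunks.
interface Retriever {
  query(text: string, topK: number): Promise<string[]>;
}

// Stand-in for the LLM service: turns a prompt into a completion.
interface Llm {
  complete(prompt: string): Promise<string>;
}

// Combine the retrieved chunks with the user's question into one prompt.
// The numbered "Context / Question" layout is an assumption.
function buildAugmentedPrompt(question: string, contexts: string[]): string {
  const contextBlock = contexts
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n");
  return `Context:\n${contextBlock}\n\nQuestion: ${question}`;
}

// Retrieve, augment, then generate: the three steps of a basic RAG request.
async function answerWithRag(
  question: string,
  retriever: Retriever,
  llm: Llm,
  topK: number = 3
): Promise<string> {
  const contexts = await retriever.query(question, topK);
  return llm.complete(buildAugmentedPrompt(question, contexts));
}
```

Because both collaborators are plain interfaces, the flow can be tested with in-memory stubs before wiring in real services.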

Development

  1. Install dependencies:
npm install
  2. Run in development mode:
npm run dev
  3. Build for production:
npm run build

Dependencies

Main Dependencies

  • @anthropic-ai/sdk: Anthropic Claude API client
  • chromadb: Vector database client
  • express: Web framework
  • simple-git: Git operations
  • typescript: Type support
  • zod: Schema validation

Dev Dependencies

  • ts-node-dev: Development server
  • @types/*: TypeScript type definitions

Security Notes

  • Ensure proper SSH key permissions (600) when using private repositories
  • Keep your Anthropic API key secure
  • Consider implementing rate limiting for production use
  • The application currently accepts GitHub host keys automatically in development
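
To illustrate the rate-limiting suggestion above, here is a minimal in-memory fixed-window limiter. It is a sketch, not something the project ships: the class name, the per-client key, and the limits are all assumptions, and a production deployment would more likely use an established middleware or a shared store.

```typescript
// Fixed-window limiter: each client may make at most `limit` requests per
// window of `windowMs` milliseconds. State is held in memory only.
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(clientId: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(clientId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request from this client, or the previous window has expired.
      this.hits.set(clientId, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false;
  }
}
```

A request handler could call `allow` with the caller's IP address and respond with HTTP 429 when it returns false. Note that this sketch never evicts old entries, so memory grows with the number of distinct clients.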
