A Node.js application that implements Retrieval-Augmented Generation (RAG) using ChromaDB as the vector database and Anthropic's Claude 3 as the foundation model. The application can scrape GitHub repositories and use their contents as context for answering questions.
- GitHub Repository Scraping: Automatically clone and process repositories (supports both public and private repositories via SSH)
- Vector Storage: Uses ChromaDB to store and query document embeddings
- RAG-powered Responses: Combines retrieved context with user queries to generate more informed responses
- REST API Interface: Simple HTTP endpoints for both scraping and querying
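
To illustrate the "RAG-powered Responses" feature, here is a minimal sketch of how retrieved context might be combined with a user question before it is sent to the model. The `RetrievedChunk` shape, the `buildAugmentedPrompt` name, and the `maxChars` budget are all assumptions made for illustration; the project's actual prompt format may differ.

```typescript
// Hypothetical shape for a chunk returned by a vector store query.
interface RetrievedChunk {
  source: string; // e.g. a file path inside the scraped repository
  text: string;
}

// Place retrieved context before the question in a single prompt string.
// maxChars is an illustrative budget, not a value taken from the project.
function buildAugmentedPrompt(
  question: string,
  chunks: RetrievedChunk[],
  maxChars: number = 4000,
): string {
  const parts: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const block = `Source: ${chunk.source}\n${chunk.text}`;
    if (used + block.length > maxChars) break; // stop once the budget is spent
    parts.push(block);
    used += block.length;
  }
  const context = parts.length > 0 ? parts.join("\n---\n") : "(no context found)";
  return `Context:\n${context}\n\nQuestion: ${question}`;
}
```

Chunks are assumed to arrive ordered by relevance, so cutting off at the budget drops the least relevant ones first.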
- Docker and Docker Compose
- An Anthropic API key
- (Optional) SSH key for accessing private GitHub repositories
- Clone this repository
- Create a .env file in the root directory:
    ANTHROPIC_API_KEY=your-api-key-here
- (Optional) For private repository access, add your SSH keys:
    cp ~/.ssh/id_rsa ./id_rsa
    cp ~/.ssh/id_rsa.pub ./id_rsa.pub
- Build and start the services:
    docker-compose up --build

POST /scrape
Content-Type: application/json

{
  "url": "https://github.com/username/repository"
}

POST /prompt
Content-Type: application/json

{
  "messages": [
    {
      "role": "user",
      "content": "Your question here"
    }
  ]
}

The application reads the following environment variables:
- ANTHROPIC_API_KEY: Your Anthropic API key
- MODEL_NAME: The Claude model to use (defaults to claude-3-sonnet-20240229)
- CHROMA_URL: ChromaDB instance URL (defaults to http://chromadb:8000)
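
The environment variables and defaults listed above could be read at startup along these lines. This is a sketch only: `AppConfig` and `loadConfig` are hypothetical names, and treating a missing API key as a startup error is an assumption about the desired behavior.

```typescript
// Hypothetical config shape; field names are illustrative.
interface AppConfig {
  anthropicApiKey: string;
  modelName: string;
  chromaUrl: string;
}

type Env = Record<string, string | undefined>;

// Read settings from an environment-like object, applying the documented defaults.
function loadConfig(env: Env): AppConfig {
  const anthropicApiKey = (env["ANTHROPIC_API_KEY"] ?? "").trim();
  if (anthropicApiKey === "") {
    // Assumption: fail fast rather than start without credentials.
    throw new Error("ANTHROPIC_API_KEY is required");
  }
  const modelName = (env["MODEL_NAME"] ?? "").trim();
  const chromaUrl = (env["CHROMA_URL"] ?? "").trim();
  return {
    anthropicApiKey,
    modelName: modelName !== "" ? modelName : "claude-3-sonnet-20240229",
    chromaUrl: chromaUrl !== "" ? chromaUrl : "http://chromadb:8000",
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.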
The Docker Compose setup uses the following volumes:
- app_data: Temporary storage for git operations
- chroma_data: Persistent storage for ChromaDB
The application consists of several key components:
- Express Server: Handles HTTP requests and routing
- ChromaDB: Vector database for storing and querying document embeddings
- LLM Service: Interfaces with Anthropic's Claude 3 model
- GitHub Scraper: Clones and processes repository content
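
As a sketch of what the Express server's request handling might check before passing work to the scraper or the LLM service, the functions below validate the `/scrape` and `/prompt` request bodies. The project lists zod for schema validation; this example uses hand-written type guards only to stay dependency-free. The function names, the accepted URL patterns (HTTPS and SSH forms), and the `assistant` role are assumptions.

```typescript
interface ScrapeRequest { url: string }
interface ChatMessage { role: "user" | "assistant"; content: string }
interface PromptRequest { messages: ChatMessage[] }

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isRole(value: unknown): value is ChatMessage["role"] {
  return value === "user" || value === "assistant";
}

// Assumed URL forms: https://github.com/owner/repo and git@github.com:owner/repo
const HTTPS_REPO = /^https:\/\/github\.com\/[\w.-]+\/[\w.-]+\/?$/;
const SSH_REPO = /^git@github\.com:[\w.-]+\/[\w.-]+$/;

// Returns the parsed body, or null when the payload is malformed.
function parseScrapeRequest(body: unknown): ScrapeRequest | null {
  if (!isRecord(body)) return null;
  const url = body["url"];
  if (typeof url !== "string") return null;
  if (!HTTPS_REPO.test(url) && !SSH_REPO.test(url)) return null;
  return { url };
}

function parsePromptRequest(body: unknown): PromptRequest | null {
  if (!isRecord(body)) return null;
  const raw = body["messages"];
  if (!Array.isArray(raw) || raw.length === 0) return null;
  const messages: ChatMessage[] = [];
  for (const item of raw as unknown[]) {
    if (!isRecord(item)) return null;
    const role = item["role"];
    const content = item["content"];
    if (!isRole(role) || typeof content !== "string") return null;
    messages.push({ role, content });
  }
  return { messages };
}
```

A route handler could respond with HTTP 400 whenever one of these functions returns `null`.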
- Install dependencies:
    npm install
- Run in development mode:
    npm run dev
- Build for production:
    npm run build

- @anthropic-ai/sdk: Anthropic Claude API client
- chromadb: Vector database client
- express: Web framework
- simple-git: Git operations
- typescript: Type support
- zod: Schema validation
- ts-node-dev: Development server
- @types/*: TypeScript type definitions
- Ensure proper SSH key permissions (600) when using private repositories
- Keep your Anthropic API key secure
- Consider implementing rate limiting for production use
- The application currently accepts GitHub host keys automatically in development
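
For the rate-limiting note above, one simple option is a fixed-window counter keyed by client. The sketch below is an in-memory illustration, not part of the project: the class name, the limits used, and keying by a client identifier such as an IP address are all assumptions. A multi-instance deployment would need shared state instead.

```typescript
// Minimal fixed-window rate limiter: at most `limit` requests per key per window.
class FixedWindowRateLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  // `now` is injectable so the limiter can be tested without real delays.
  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the request identified by `key` may proceed.
  allow(key: string): boolean {
    const t = this.now();
    const entry = this.hits.get(key);
    if (entry === undefined || t - entry.windowStart >= this.windowMs) {
      // First request from this key, or the previous window has expired.
      this.hits.set(key, { windowStart: t, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

An Express middleware could call `allow` with the caller's identifier and return HTTP 429 when it yields `false`.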