A full-stack application that uses Retrieval-Augmented Generation (RAG) to analyze GitHub repositories and answer questions about their code.
- Index any public GitHub repository
- Process and chunk code files respecting function/class boundaries
- Generate embeddings for code chunks and store them in a vector database
- Ask natural language questions about the repository
- Get AI-generated answers grounded in the relevant code context
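The repository's actual chunker isn't shown here, but a minimal sketch of boundary-aware chunking for Python files, using only the standard `ast` module, might look like this (`chunk_python_source` is an illustrative name, not a function from this codebase):

```python
import ast

def chunk_python_source(source: str) -> list[str]:
    """Split Python source into chunks at top-level function/class boundaries."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based and inclusive, hence the -1 slice start.
            chunks.append("\n".join(lines[node.lineno - 1:node.end_lineno]))
    return chunks
```

Chunking at definition boundaries keeps each embedded chunk semantically coherent, which tends to improve retrieval quality over fixed-size windows.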
- Frontend: React, TypeScript, Tailwind CSS, React Query
- Backend: Python, FastAPI
- Vector Database: Qdrant (in-memory for development)
- Embedding Model: OpenAI Text Embedding API
- LLM: OpenAI GPT-3.5 Turbo
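At query time, the retrieval step embeds the question and ranks stored chunks by vector similarity. The app uses OpenAI embeddings with Qdrant; the toy sketch below substitutes a bag-of-words "embedding" and plain cosine similarity purely to illustrate the ranking idea (all names here are illustrative, not from the codebase):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; the real app calls the OpenAI Embedding API.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query; Qdrant does this at scale.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The top-ranked chunks are then passed to the LLM as context for answer generation.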
- Node.js and npm
- Python 3.9+
- OpenAI API key
- Clone this repository
- Install frontend dependencies:

  ```bash
  npm install
  ```

- Install backend dependencies:

  ```bash
  cd backend
  pip install -r requirements.txt
  ```
- Create a `.env` file in the backend directory:

  ```
  OPENAI_API_KEY=your-openai-api-key
  ```
- Start the backend server:

  ```bash
  cd backend
  uvicorn main:app --reload
  ```
- In another terminal, start the frontend:

  ```bash
  npm run dev
  ```
- Open your browser to `http://localhost:5173`
- Enter a GitHub repository URL in the input field
- Click "Process Repository" to start indexing
- Wait for the indexing to complete (this may take some time for large repositories)
- Ask questions about the repository in the chat interface
- View AI-generated answers with references to specific code files
- The in-memory vector database does not persist data between server restarts
- Large repositories can take considerable time to index
- The chunking algorithm may not perfectly respect code boundaries in all languages
- The quality of answers depends on the OpenAI model used
- Persistent vector database storage
- Support for private GitHub repositories
- More sophisticated code parsing and chunking
- Multi-user support with authentication
- Caching of previously processed repositories
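One way the planned repository caching could work (a hypothetical `RepoIndexCache`, not part of the current codebase): key cached indexes by repository URL plus commit SHA, so re-processing is skipped until the repository changes.

```python
from typing import Any, Callable

class RepoIndexCache:
    """Cache processed repository indexes, keyed by (repo URL, commit SHA)."""

    def __init__(self) -> None:
        self._cache: dict[tuple[str, str], Any] = {}

    def get_or_build(self, url: str, commit: str, build: Callable[[], Any]) -> Any:
        # Only invoke the expensive build step on a cache miss.
        key = (url, commit)
        if key not in self._cache:
            self._cache[key] = build()
        return self._cache[key]
```

Keying on the commit SHA means a new push naturally invalidates the entry, while repeat questions about an unchanged repository reuse the existing index.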
MIT