This repository contains a RAG (Retrieval-Augmented Generation) system for code-related tasks, specifically focused on code completion and bug localization.
The RAG system enhances language models by retrieving relevant context from a codebase before generating completions or localizing bugs. This approach improves the quality and relevance of model outputs by providing task-specific context.
In this project, we benchmark different approaches to RAG for code to recommend the best approach for various scenarios. We evaluate different chunking strategies, scoring methods, and context composition techniques to determine the most effective combinations.
The project consists of two main components:
- Code Completion: Enhances code completion by retrieving relevant context from the codebase
- Bug Localization: Identifies files likely to contain bugs based on issue descriptions
The code completion component uses RAG to improve code suggestions by:
- Chunking repository files into manageable pieces
- Scoring chunks based on relevance to the current coding context
- Using the most relevant chunks as context for code completion
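The scoring and composition steps above can be sketched as follows. This is a minimal illustration with hypothetical function names (`iou_score`, `compose_context`), not the project's actual API, using word-set IoU as a stand-in relevance score:

```python
def iou_score(chunk: str, context: str) -> float:
    """Lexical relevance: intersection-over-union of word sets."""
    a, b = set(chunk.split()), set(context.split())
    return len(a & b) / len(a | b) if a or b else 0.0

def compose_context(chunks: list[str], context: str, top_k: int = 3) -> str:
    """Prepend the top-k most relevant chunks to the current coding context."""
    ranked = sorted(chunks, key=lambda c: iou_score(c, context), reverse=True)
    return "\n\n".join(ranked[:top_k] + [context])
```

The composed string is then passed to the completion model as its prompt.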
Configuration options:
- Chunking strategies: `full_file`, `fixed_line`, `langchain`
- Scoring methods: `BM25`, `IOU`, dense embeddings
- Adjustable context sizes and composition strategies
Performance is evaluated using Exact Match (EM).
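Exact Match can be computed as below. This follows one common convention (whitespace-stripped string equality); the project's exact normalization may differ:

```python
def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions identical to the reference
    after stripping surrounding whitespace."""
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```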
The bug localization component helps identify files likely to contain bugs by:
- Taking issue descriptions as input
- Chunking repository files
- Scoring chunks based on relevance to the issue description
- Aggregating scores at the file level to identify the most likely locations of bugs
Performance is evaluated using metrics like F1 score and NDCG.
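The file-level aggregation step can be sketched as pooling chunk scores per file. Max pooling is an assumption here for illustration; other aggregation strategies (mean, sum) are possible:

```python
from collections import defaultdict

def rank_files(chunk_scores: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """chunk_scores: (file_path, relevance) pairs, one per chunk.
    Returns files sorted by their best-scoring chunk, highest first."""
    best: dict[str, float] = defaultdict(float)
    for path, score in chunk_scores:
        best[path] = max(best[path], score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

The top-ranked files are then reported as the most likely bug locations.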
The chunking pipeline is a critical component of our RAG system, responsible for breaking down repository files into manageable pieces that can be efficiently processed and retrieved.
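As a minimal illustration, a fixed-size line chunker (in the spirit of the `fixed_line` strategy above) can be written as a sliding window over file lines. The function name and the `overlap` parameter are hypothetical; the project's actual chunkers live in `rag_engine/`:

```python
def fixed_line_chunks(text: str, size: int = 32, overlap: int = 0) -> list[str]:
    """Split a file into chunks of `size` lines; consecutive chunks
    share `overlap` lines when overlap > 0."""
    lines = text.splitlines()
    step = max(size - overlap, 1)
    chunks = []
    for start in range(0, len(lines), step):
        window = lines[start:start + size]
        if window:
            chunks.append("\n".join(window))
        if start + size >= len(lines):
            break
    return chunks
```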
- Python 3.9+
- Poetry (for dependency management)

1. Clone the repository:

   ```shell
   git clone https://github.com/JetBrains-Research/project-adaptation-experiments.git
   cd project-adaptation-experiments
   ```

2. Install dependencies using Poetry:

   ```shell
   poetry install
   ```
To run code completion experiments:

1. Configure the experiment in `rag/configs/plcc.yaml`:
   - Set the model, language, and context composer
   - Configure context sizes and completion categories
   - Specify output paths

2. Run the evaluation:

   ```shell
   python -m rag.eval_plcc
   ```
To run bug localization experiments:

1. Configure the experiment in `rag/configs/bug_localization.yaml` and `rag/configs/rag.yaml`:
   - Set the chunker, scorer, and other parameters
   - Specify output paths

2. Run the evaluation:

   ```shell
   python -m rag.bug_localization
   ```
The system can be configured through YAML files in the `rag/configs` directory:

- `rag.yaml`: general RAG configuration (chunkers, scorers, models)
- `bug_localization.yaml`: bug-localization-specific settings
- `plcc.yaml`: code-completion-specific settings
Our benchmarking experiments have yielded several important insights:

- The larger the generation model's context, the larger the chunks you should use; the minimal useful context-chunk size is 32 lines.
- `IoU` + `line_splitter` performs well on short contexts (<= 2000) and is very fast.
- The `word_splitter` and the tokenizer perform equally well; the `word_splitter` is much faster.
- There is no need to include non-code files in this task, which saves plenty of compute.
This plot shows the comparison of the explored context composer strategies:

- No Context (baseline)
- Path Distance (baseline)
- optimal `full_file` composer configuration
- optimal `fixed_line` composer configuration
| Parameter | Values |
|---|---|
| Chunkers | `full_file`, `fixed_line` |
| Scorer | `bm25` |
| Splitter | `word_splitter` |
| File extensions | `[py]` |
| Context chunk size | 32 lines (for `fixed_line`) |
| Completion chunk size | 32 lines (for `fixed_line`) |
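The `bm25` scorer used in this benchmark configuration can be sketched as Okapi BM25 over word-split chunks. This is a minimal reference sketch, not the project's implementation; parameter defaults (`k1`, `b`) are the commonly used values:

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()  # document frequency of each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores
```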
To plot the results of the experiments, we used `rag/plot_analysis/all_py_kt_plots.ipynb`.
- `rag/`: main project directory
  - `bug_localization/`: bug localization components
  - `configs/`: configuration files
  - `context_composers/`: context composition strategies
  - `draco/`: data flow analysis components
  - `metrics/`: evaluation metrics
  - `plot_analysis/`: visualization tools
  - `rag_engine/`: core RAG functionality (chunkers, scorers, splitters)
  - `utils/`: utility functions
Legacy code and experiments have been moved to the `archive/` directory.