GitHub - Edinburgh-AgenticAI/RAGBoost: Boosting RAG on model and system performance with context reuse

Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse

| Documentation | Examples | Benchmarks |

News

[2025/12] Code is released!
[2025/11] Paper published: RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse

About

RAGBoost is a fast optimization system for Retrieval-Augmented Generation workloads:

High Throughput: Boosts prefill throughput with intelligent context reuse.
Accuracy Preserved: Reasoning accuracy is preserved and, in some cases, improved.
Strong Compatibility: Works with existing RAG libraries (HippoRAG), KV cache optimization engines (LMCache), and inference engines (vLLM and SGLang), in both single-node and multi-node deployments.
Widely Tested: Tested with a wide range of RAG and agentic AI applications.
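
To make "context reuse" concrete, here is a minimal toy sketch. It is not RAGBoost's algorithm: it only illustrates the general idea that a prefix cache can reuse work when a prompt starts with a run of documents that an earlier prompt also started with. The function name `prefix_hit_rate` and the document IDs are hypothetical, and sorting is used purely as the simplest consistent ordering.

```python
def prefix_hit_rate(requests):
    """Fraction of documents served from a toy prefix cache.

    The cache is modelled as a set of document-ID tuples. A document
    counts as a hit only if the whole prefix ending at that document
    was already produced by an earlier request.
    """
    seen = set()
    hits = 0
    total = 0
    for docs in requests:
        for i in range(len(docs)):
            prefix = tuple(docs[: i + 1])
            if prefix in seen:
                hits += 1
            seen.add(prefix)
            total += 1
    return hits / total


# Three requests that retrieve overlapping documents in different orders.
retrieved = [["d3", "d1", "d2"], ["d1", "d3", "d4"], ["d2", "d1", "d3"]]

# The same documents, placed in one canonical (sorted) order per request.
canonical = [sorted(docs) for docs in retrieved]

print(prefix_hit_rate(retrieved))
print(prefix_hit_rate(canonical))
```

In this toy setup the as-retrieved order gets 0 hits out of 9 documents, while the canonical order gets 4 out of 9. The sketch says nothing about how RAGBoost chooses an ordering or how it preserves accuracy when contexts are rearranged; see the paper and documentation for that.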

Benchmark and Performance

System Performance

Tested on Qwen3-4B-Instruct-2507 with 1xH100

Accuracy on MT-RAG Benchmark

Method	Qwen3-4B	Llama3.1-8B	Qwen3-30B-A3B
LMCache	62.56	68.46	75.12
CacheBlend	50.33	56.52	X
RadixCache	62.56	68.46	75.12
RAGBoost	64.27	68.12	75.81

RAGBoost delivers 4-13x higher cache hit rates and 1.5-3.5x lower prefill latency for large-batch RAG workloads, while keeping accuracy on par with or above the baselines.

In testing with GPT-5.2, RAGBoost also reduced input token costs by around 36%.

See Benchmarks in the documentation for GPU vs CPU performance analysis and detailed benchmark methodology.

Getting Started

Installation

Requirements: Python >= 3.10

git clone https://github.com/SecretSettler/RAGBoost.git
cd RAGBoost
pip install -e .

Install an inference engine (SGLang recommended):

pip install --upgrade pip
pip install uv
uv pip install "sglang" --prerelease=allow

More detailed installation instructions are available in the docs, including Docker setup and FAISS configuration.
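
After running the commands above, a quick sanity check can confirm that the environment matches the stated requirements. The sketch below uses only the standard library. It assumes the editable install exposes a top-level package named `ragboost` (inferred from the repository's `ragboost` directory) and that SGLang is importable as `sglang`. The helper name `check_environment` is hypothetical.

```python
import importlib.util
import sys


def check_environment(packages=("ragboost", "sglang")):
    """Report whether Python is new enough and each package is importable."""
    report = {"python>=3.10": sys.version_info >= (3, 10)}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```

This only checks that the packages can be located, not that they work with your GPU or inference engine version.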

Documentation

Check out the RAGBoost documentation for comprehensive guides.

Examples

Go hands-on with our examples, demonstrating how to address different use cases with RAGBoost.

Contributing

We welcome and value all contributions! Please feel free to submit issues and pull requests.

Contact

Citation

If you use the code or data of RAGBoost, please cite it as follows:

@misc{jiang2025ragboost,
      title={RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse}, 
      author={Yinsicheng Jiang and Yeqi Huang and Liang Cheng and Cheng Deng and Xuan Sun and Luo Mai},
      year={2025},
      eprint={2511.03475},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.03475}, 
}

