Code for the SIGIR 2025 LiveRAG challenge. It has been tested on an M1 Mac and on Windows (with CUDA).
Setup on macOS/Linux:
python -m venv .venv
source .venv/bin/activate
# PyTorch is system-specific, so it is not part of requirements.txt
pip install torch==2.7.0
pip install -r requirements.txt
Setup on Windows (PowerShell):
python -m venv .venv
.venv\Scripts\Activate.ps1
# pytorch for CUDA 12.8 according to https://pytorch.org/get-started/locally/
pip install torch==2.7.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
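After installing, it can help to confirm which accelerator PyTorch actually sees, since the two setups above target different backends (MPS on an M1 Mac, CUDA on Windows). The following is a minimal sketch, not part of this repository; `pick_device` and `detect_device` are hypothetical helper names.

```python
import importlib.util


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Report the device PyTorch would use, or a hint if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    # torch.backends.mps is absent on very old builds, hence the getattr.
    mps = getattr(torch.backends, "mps", None)
    mps_available = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)


if __name__ == "__main__":
    print(detect_device())
```

If this prints `cpu` on a machine with an NVIDIA GPU, the CUDA-specific wheel was probably not the one that got installed.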
Copy .env-template to .env and replace the respective values.
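To check that no value was left unfilled, a small script can parse the file and list empty entries. This is a minimal sketch using only the standard library; the key names in the sample are hypothetical and do not come from .env-template, and the repository may load the file differently.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical keys, for illustration only.
sample = """
# credentials
EXAMPLE_API_KEY=
EXAMPLE_REGION="us-east-1"
"""

settings = parse_env(sample)
missing = [key for key, value in settings.items() if not value]
print("unfilled keys:", missing)
```

For the real file, pass the contents of .env to `parse_env` instead of `sample`.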
We use bm25s for BM25 retrieval and Snowflake/arctic-embed-l embeddings in a usearch index for kNN retrieval.
You don't need to re-create these embeddings/indices; a prebuilt version is available for download. The download is about 60 GB, so it will take some time. Run the download script for your platform (.sh on macOS/Linux, .ps1 on Windows PowerShell):
./01_download.sh
./01_download.ps1
The challenge is run in 4 steps:
- Retrieval: BM25 and kNN results of the original question and a Falcon-generated HyDE passage
- Result fusion: RRF of the 4 previously retrieved result sets
- Reranking: Re-ranking the fused results using a reranker model
- Answer generation
Run the script for your platform (.sh on macOS/Linux, .ps1 on Windows PowerShell) to execute all steps sequentially.
./02_run.sh
./02_run.ps1
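The result fusion step listed above combines four ranked lists (BM25 and kNN, each for the question and for the HyDE passage) with reciprocal rank fusion (RRF). A minimal sketch of RRF follows; it is not the repository's implementation. The function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Returns (doc_id, score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc_id for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_n is None else fused[:top_n]


# Hypothetical result sets, one per retriever/query combination.
bm25_question = ["d1", "d2", "d3"]
knn_question = ["d2", "d1", "d4"]
bm25_hyde = ["d2", "d3", "d1"]
knn_hyde = ["d4", "d2", "d1"]

fused = rrf_fuse([bm25_question, knn_question, bm25_hyde, knn_hyde])
print([doc_id for doc_id, _ in fused])
```

Because RRF only uses ranks, BM25 scores and embedding similarities never have to be put on a common scale before fusing.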
Note that the reranking step uses unicamp-dl/InRanker-base, which is slow on non-CUDA platforms.
The final output is written to liverag_step4.jsonl. Intermediate results are saved as .parquet files.
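To take a quick look at the final output, the JSON Lines file can be read with the standard library alone. This is a minimal sketch; `read_jsonl` is a hypothetical helper, and no assumption is made about which fields each record contains.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    output = Path("liverag_step4.jsonl")
    if output.exists():
        records = read_jsonl(output)
        print(f"{len(records)} records")
        if records and isinstance(records[0], dict):
            # Show the field names of the first record only.
            print(sorted(records[0].keys()))
    else:
        print(f"{output} not found; run the pipeline first")
```

The intermediate .parquet files need a Parquet reader such as `pandas.read_parquet`, which in turn requires pyarrow or fastparquet to be installed.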