This is the official repository for the KDD 2024 paper "Neural Retrievers are Biased Towards LLM-Generated Content". [arXiv]
🌟 New Release! 🌟 Check out our latest project, "Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration", on GitHub. This extensive benchmark includes 16 datasets, more than ten popular retrieval models, and easy-to-use evaluation tools. Please dive into the repository for more details!
If you find our code or work useful for your research, please cite our paper:
```bibtex
@article{dai2024neural,
  title={Neural Retrievers are Biased Towards LLM-Generated Content},
  author={Dai, Sunhao and Zhou, Yuqi and Pang, Liang and Liu, Weihao and Hu, Xiaolin and Liu, Yong and Zhang, Xiao and Wang, Gang and Xu, Jun},
  journal={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  year={2024}
}
```
- For details of the datasets, please check `datasets/README.md`.
- For details of the evaluation code, please check the `evaluate/` folder.
- For details of the dataloader code, please check `beir/datasets/data_loader.py` (a usage sketch follows below).
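If you want to inspect the data directly, the bundled BEIR-style loader can be used on its own. Below is a minimal sketch; the concrete file paths are our assumptions about the layout, so consult `datasets/README.md` for the real one:

```python
from beir.datasets.data_loader import GenericDataLoader

# Load one corpus variant together with its queries and relevance judgments.
# All three paths below are illustrative guesses, not guaranteed to exist;
# see datasets/README.md for the actual file layout.
corpus, queries, qrels = GenericDataLoader(
    corpus_file="datasets/0.2/scifact/corpus.jsonl",   # hypothetical path
    query_file="datasets/0.2/scifact/queries.jsonl",   # hypothetical path
    qrels_file="datasets/qrels/scifact/test.tsv",      # hypothetical path
).load_custom()
```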
```
.
├── beir            # * evaluation code adapted from BEIR
│   ├── datasets    # * dataloader code
│   ├── reranking   # * reranking models
│   └── retrieval   # * lexical and dense retrieval models
├── datasets
│   ├── 0.2         # * corpus generated by LLMs with temperature 0.2
│   ├── 1.0         # * corpus generated by LLMs with temperature 1.0
│   └── qrels       # * relevance judgments for the queries
└── evaluate        # * scripts for evaluating the different retrieval models
```
To reproduce the evaluation with Contriever on SciFact, run `evaluate/evaluate_contriever.py`, choosing which corpus sources to index (`--candidate_lm`) and which source the reported metrics target (`--target`):

```bash
# test on the human-written corpus
python evaluate/evaluate_contriever.py --test_dataset scifact \
    --target human --candidate_lm human

# test on the llama-2-7b-chat corpus
python evaluate/evaluate_contriever.py --test_dataset scifact \
    --target llama-2-7b-chat --candidate_lm llama-2-7b-chat

# metrics targeting human-written documents on the mixed corpora
python evaluate/evaluate_contriever.py --test_dataset scifact \
    --target human --candidate_lm human llama-2-7b-chat

# metrics targeting LLM-generated documents on the mixed corpora
python evaluate/evaluate_contriever.py --test_dataset scifact \
    --target llama-2-7b-chat --candidate_lm human llama-2-7b-chat
```
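On the mixed corpora, the two `--target` runs yield one score for human-written and one for LLM-generated documents, and the paper summarizes the gap between them as a relative difference. The helper below is our illustrative sketch of that computation; the function name and the example numbers are ours, not part of the repository:

```python
def relative_delta(metric_human: float, metric_llm: float) -> float:
    """Relative difference (%) between human- and LLM-targeted scores.

    Positive values mean the retriever favors human-written documents;
    negative values mean it favors LLM-generated ones.
    """
    mean = (metric_human + metric_llm) / 2
    return (metric_human - metric_llm) / mean * 100

# Made-up NDCG@10 values from the two --target runs above:
print(f"Relative Delta: {relative_delta(0.62, 0.70):.2f}%")  # ≈ -12.12%
```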
The Cocktail benchmark is built on top of BEIR and Sentence Transformers.
This repository requires the following dependencies:

```
python==3.10.13
pandas==2.1.4
scikit-learn==1.3.2
evaluate==0.4.1
sentence-transformers==2.2.2
spacy==3.7.2
tiktoken==0.5.2
pytrec-eval==0.5
```
The required packages can be installed via `pip install -r requirements.txt`.
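If you prefer a fresh environment, the following is a minimal setup sketch; the environment name `cocktail` is our choice, not mandated by the repository:

```bash
# Illustrative setup; the environment name is arbitrary.
conda create -n cocktail python=3.10.13
conda activate cocktail
pip install -r requirements.txt
```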