Commit 4b4fb8a

Author: Eleni Straitouri
Message: initial commit
0 parents, 22 files changed: +852363 −0 lines

README.md (+79 lines)
# Evaluation of Large Language Models via Coupled Token Generation

This repository contains the code and data for the paper "Evaluation of Large Language Models via Coupled Token Generation" by Nina Corvelo Benz, Stratis Tsirtsis, Eleni Straitouri, Ivi Chatzi, Ander Artola Velasco, Suhas Thejaswi, and Manuel Gomez-Rodriguez.
## Dependencies

All experiments were performed using Python 3.11. To create a virtual environment and install the project dependencies, run:

```bash
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
```
## Code organization

The directory [data](data/) contains the data used for the experiments.

The directory [models](models/) contains the list of models used.

The directory [src](src/) contains the source code for the experiments.

The directory [scripts](scripts/) contains bash scripts that use the code under [src](src/) to run the experiments.

The directory [notebooks](notebooks/) contains Jupyter notebooks that produce the figures appearing in the paper.

The directory [figures](figures/) is used for saving the figures produced by the notebooks.

The directory [outputs](outputs/) is used for saving the outputs produced by the scripts.
## Instructions

### Downloading the models

Our experiments use LLMs from the Llama family.
Llama is a "gated" model family, that is, it requires accepting a license to use.
You can request access at: [https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
Once you have access, you can download any model in the Llama family.
Then, before running the scripts, you need to authenticate with your Hugging Face account by running `huggingface-cli login` in the terminal.
Each model will be downloaded to the [models](models/) folder the first time it is called from a script.
### Setting up

Run `python3 src/merge_tokenizers.py` before running the scripts to set up the joint vocabulary.
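Conceptually, a joint vocabulary is a single token-to-id mapping that covers every model's tokenizer. As a minimal sketch of one way to build it (the actual logic in `src/merge_tokenizers.py` may differ), one can take the union of the per-model vocabularies and assign fresh consecutive ids:

```python
# Hypothetical sketch: build a joint vocabulary as the union of several
# per-model vocabularies (token -> id mappings), assigning fresh
# consecutive ids in a deterministic order. Not the repository's
# actual merge procedure.
def merge_vocabularies(vocabs):
    """vocabs: list of dicts mapping token string -> model-specific id."""
    joint = {}
    for vocab in vocabs:
        for token in sorted(vocab):
            if token not in joint:
                joint[token] = len(joint)  # next free id
    return joint

# Toy example vocabularies (illustrative token sets).
vocab_a = {"the": 0, "cat": 1, "sat": 2}
vocab_b = {"the": 0, "dog": 1, "sat": 5}
joint = merge_vocabularies([vocab_a, vocab_b])
# every token from both vocabularies gets exactly one joint id
```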
### MMLU experiment

The final output files of the experiment are provided in the [outputs/mmlu](outputs/mmlu/) directory.
To reproduce the figures in the paper, you only need to run the [mmlu.ipynb](notebooks/mmlu.ipynb) notebook.

The script [mmlu.sh](scripts/mmlu.sh) produces the outputs of one LLM, using one seed, given the questions from the MMLU dataset as input prompts.
To reproduce all the outputs, run the script twice for each model (once for independent and once for coupled generation), using the seeds provided in the script.
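Coupled generation evaluates all models on shared randomness instead of independent draws. Purely as an illustration (this is an assumed construction, not necessarily the paper's exact coupling scheme), a common way to couple sampling is to feed the same uniform draw into inverse-CDF sampling for every model, so models that assign similar probabilities tend to emit the same token:

```python
# Illustrative sketch of coupling next-token sampling across models via
# one shared uniform draw (inverse-CDF sampling). Hypothetical, for
# intuition only; the paper's actual coupling may differ.
def sample_token(probs, u):
    """Pick the index whose cumulative-probability interval contains u."""
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off

# Next-token distributions of two models over the same 3-token vocabulary.
model_a = [0.5, 0.3, 0.2]
model_b = [0.6, 0.2, 0.2]

u = 0.55                             # one shared uniform draw couples both models
token_a = sample_token(model_a, u)   # u falls in [0.5, 0.8) -> token 1
token_b = sample_token(model_b, u)   # u falls in [0.0, 0.6) -> token 0
```

With independent generation, each model would instead draw its own `u`, so agreement between models would partly reflect sampling noise rather than the models' distributions.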
### LMSYS experiment

The final output files of the experiment are provided in the [outputs/LMSYS](outputs/LMSYS/) directory.
To reproduce the figures in the paper, you only need to run the [lmsys.ipynb](notebooks/lmsys.ipynb) notebook.

The script [lmsys.sh](scripts/lmsys.sh) produces the outputs of one LLM, using one seed, to the questions from the dataset in [data/processed/LMSYS/questions.json](data/processed/LMSYS/questions.json).
To reproduce all the outputs, run the script twice for each model (once for independent and once for coupled generation), using the seeds provided in the script.
The results of the pairwise comparisons of these outputs by GPT-4o-2024-11-20 are also provided in the [outputs/LMSYS](outputs/LMSYS/) directory.
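The schema of the pairwise-comparison files is not documented here, so the record format below is assumed for illustration only. Aggregating judge verdicts into per-model win rates could look like:

```python
# Hypothetical sketch: aggregate pairwise comparison verdicts into
# per-model win rates. The (model_a, model_b, winner) record format and
# the model labels are assumptions, not the repository's actual schema.
from collections import defaultdict

def win_rates(comparisons):
    """comparisons: iterable of (model_a, model_b, winner) tuples,
    where winner is model_a, model_b, or None for a tie."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for a, b, winner in comparisons:
        games[a] += 1
        games[b] += 1
        if winner is not None:
            wins[winner] += 1
    return {m: wins[m] / games[m] for m in games}

verdicts = [
    ("llama-8B", "llama-1B", "llama-8B"),
    ("llama-8B", "llama-3B", None),        # tie
    ("llama-3B", "llama-1B", "llama-3B"),
]
rates = win_rates(verdicts)
```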
## Contact & attribution

If you have questions about the code, identify potential bugs, or would like us to include additional functionality, feel free to open an issue or contact [Ivi Chatzi](mailto:[email protected]) or [Eleni Straitouri](mailto:[email protected]).

If you use parts of the code in this repository for your own research, please consider citing:
```
@article{benz2025evaluation,
  title={Evaluation of Large Language Models via Coupled Token Generation},
  author={Nina Corvelo Benz and Stratis Tsirtsis and Eleni Straitouri and Ivi Chatzi and Ander Artola Velasco and Suhas Thejaswi and Manuel Gomez-Rodriguez},
  year={2025},
  journal={arXiv preprint arXiv:coming_soon}
}
```

data/processed/LMSYS/questions.json (+1,006 lines; large diff not rendered by default)

models/models.json (+8 lines)

{
  "models" : [
    "meta-llama/Llama-3.1-8B-Instruct",
    "meta-llama/Llama-3.2-3B-Instruct",
    "meta-llama/Llama-3.2-1B-Instruct",
    "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"
  ]
}
