This is the repository for our EACL 2026 paper:
Sondre Wold*, Étienne Simon*, Erik Velldal, Lilja Øvrelid. 2026. Measuring Idiomaticity in Text Embedding Models with 𝜀-compositionality
The data is included as a git submodule. From there, the Python module epsilon_compositionality runs all the necessary code and writes a results directory containing the LaTeX code we included in the article. To reproduce everything, run the following:
git clone --recurse-submodules https://github.com/ltgoslo/epsilon-compositionality
cd epsilon-compositionality
python -m epsilon_compositionality
For a more decomposed (pun intended) approach, this is equivalent to:
python -m epsilon_compositionality.build_dataset
python -m epsilon_compositionality.extract_similarities
python -m epsilon_compositionality.compute_statistics
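
If you prefer to drive the three stages from a script, for example to stop at the first failing stage, a minimal sketch could look like the one below. It assumes the commands are run from the repository root so that epsilon_compositionality is importable, and that the stages should run in the order listed above. The helper names stage_command and run_pipeline are hypothetical and not part of the package.

```python
import subprocess
import sys

# Stage names taken from the three commands listed above, in the same order.
STAGES = ["build_dataset", "extract_similarities", "compute_statistics"]


def stage_command(stage, python=sys.executable):
    """Build the argv list for one stage (hypothetical helper, not part of the package)."""
    return [python, "-m", f"epsilon_compositionality.{stage}"]


def run_pipeline(stages=STAGES, runner=subprocess.run):
    """Run each stage in order; check=True raises CalledProcessError on a non-zero exit."""
    for stage in stages:
        runner(stage_command(stage), check=True)


if __name__ == "__main__":
    run_pipeline()
```

Because subprocess.run is called with check=True, a stage that exits with a non-zero status raises an exception and the later stages are not started.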