Official code implementation for the paper "Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?" accepted at ICLR 2025! 🙌 https://openreview.net/forum?id=lCasyP21Bf
This is follow-up work building on the paper "On Measuring Faithfulness or Self-consistency of Natural Language Explanations" (https://aclanthology.org/2024.acl-long.329/), which developed CC-SHAP and applied it to LLMs 📃. Now, we extend it to VLMs 🖼️+📃.
## Citation

```bibtex
@inproceedings{parcalabescu2025do,
    title={Do Vision \& Language Decoders use Images and Text equally? How Self-consistent are their Explanations?},
    author={Letitia Parcalabescu and Anette Frank},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=lCasyP21Bf}
}
```

```bibtex
@inproceedings{parcalabescu-frank-2024-measuring,
    title = "On Measuring Faithfulness or Self-consistency of Natural Language Explanations",
    author = "Parcalabescu, Letitia and Frank, Anette",
    editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.329/",
    doi = "10.18653/v1/2024.acl-long.329",
    pages = "6048--6089",
}
```

## Supported models

- mPLUG-Owl3 (second paper version) can be run from the `main` branch with the instructions in the next section.
- BakLLaVA (first paper version) is experimental on `main`, because the HF `transformers` version had to be upgraded to support mPLUG-Owl3 -- an upgrade which brought significant functionality changes. To run the model as it was run for the paper, check out the branch `vicuna-mistral-bakllava-models` and follow the instructions in the `README.md` file there.
- LLaVA-NeXT-Vicuna (first paper version): same as above.
- LLaVA-NeXT-Mistral (first paper version): same as above.
## Setup and usage

- Create a conda environment and install the required packages with pip for running the mPLUG-Owl3 experiments:

```bash
conda create -n <env-name> python=3.12.1
pip install -r requirements_pip-mplug-owl3.txt
```

- Download the data from their respective repositories and change the paths in `config.py` accordingly (see the sketch after the list of repositories). Data repositories:
- VALSE 💃: https://github.com/Heidelberg-NLP/VALSE
- VQA: https://visualqa.org/download.html
- GQA: https://cs.stanford.edu/people/dorarad/gqa/download.html
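As a rough illustration of the path changes, here is a minimal sketch; the actual variable names live in the repository's `config.py`, and the ones below are hypothetical placeholders:

```python
# config.py -- illustrative sketch only; the real variable names are defined in the repository's config.py.
# Point each path to the location where you downloaded the corresponding dataset.
VALSE_PATH = "/path/to/VALSE"   # hypothetical name; data from https://github.com/Heidelberg-NLP/VALSE
VQA_PATH = "/path/to/VQAv2"     # hypothetical name; data from https://visualqa.org/download.html
GQA_PATH = "/path/to/GQA"       # hypothetical name; data from https://cs.stanford.edu/people/dorarad/gqa/download.html
```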
- To run mPLUG-Owl3, you need to make sure the `_decode` function in `$HF_HOME/modules/transformers_modules/mPLUG/mPLUG-Owl3-7B-240728/eff25bcdc02ff1b513c25f376d761ec1ab6dfa1b/modeling_mplugowl3.py` returns the output ids and not just the text, so move the line `output = output[:,input_ids.shape[1]:]` into the if statement such that the last lines of that function look like:

```python
if decode_text:
    output = output[:,input_ids.shape[1]:]
    return self._decode_text(output, tokenizer)
return output
```

- Run `run-faithfulness.py` with the following command:

```bash
python run-faithfulness.py foil_it mplug-owl3-7b 100 0 data/
```
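If you want to launch several runs and keep their outputs, a small wrapper like the one below can help. It is only a sketch, not part of the repository: it re-issues the command above via `subprocess` and writes the output to a log file; the argument list is copied verbatim from the command shown, and any further argument combinations you add are your own assumptions.

```python
import subprocess
from pathlib import Path

# Sketch only (not part of the repository): re-run run-faithfulness.py and keep one log file per run.
# Each entry copies the argument pattern of the command above; add further combinations as needed.
runs = [
    ["foil_it", "mplug-owl3-7b", "100", "0", "data/"],
]

log_dir = Path("logs")
log_dir.mkdir(exist_ok=True)

for args in runs:
    log_file = log_dir / ("_".join(args[:2]) + ".log")
    with log_file.open("w") as f:
        # Equivalent to: python run-faithfulness.py foil_it mplug-owl3-7b 100 0 data/
        subprocess.run(["python", "run-faithfulness.py", *args],
                       stdout=f, stderr=subprocess.STDOUT, check=True)
```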
To activate or deactivate tests individually, (un)comment the respective elements of `TESTS` in `config.py`; note that `cc_shap-posthoc` and `cc_shap-cot` must be run together (see the sketch below). All tests are implemented for the first three models on the other branch. The mPLUG-Owl3 model is only supported for the `cc_shap-posthoc` and `cc_shap-cot` tests.
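For illustration, a hedged sketch of what this selection might look like; apart from `cc_shap-posthoc` and `cc_shap-cot`, the test identifiers and the exact shape of `TESTS` are placeholders, so defer to the actual `config.py`:

```python
# config.py -- sketch only; test names other than the two CC-SHAP tests are placeholders.
TESTS = [
    "cc_shap-posthoc",   # must be run together with cc_shap-cot
    "cc_shap-cot",
    # "other_test",      # placeholder: comment out the tests you do not want to run
]
```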
The Shapley value implementation in the `shap` folder is a modified version of https://github.com/slundberg/shap.
This is work in progress. Code and paper will be revised and improved for conference submissions.