This repository contains the implementation of the arXiv preprint: Protecting Multimodal LLMs against misleading visualizations. The code is released under an Apache 2.0 license.
Contact person: Jonathan Tonglet
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
- We released a follow-up work, "Is this chart lying to me? Automating the detection of misleading visualizations". Check it out 🔥
Visualizations play a pivotal role in daily communication in an increasingly data-driven world. Research on multimodal large language models (MLLMs) for automated chart understanding has accelerated massively, with steady improvements on standard benchmarks. However, for MLLMs to be reliable, they must be robust to misleading visualizations, charts that distort the underlying data, leading readers to draw inaccurate conclusions that may support disinformation. Here, we uncover an important vulnerability: MLLM question-answering accuracy on misleading visualizations drops on average to the level of a random baseline. To address this, we introduce the first inference-time methods to improve performance on misleading visualizations, without compromising accuracy on non-misleading ones. The most effective method extracts the underlying data table and uses a text-only LLM to answer the question based on the table. Our findings expose a critical blind spot in current research and establish benchmark results to guide future efforts in reliable MLLMs.
- Misleading visualizations are charts that distort the underlying data table, leading readers to inaccurate interpretations that may support disinformation 📊
- Distortions include truncated and inverted axes, 3D effects, or inconsistent tick intervals
- Misleading visualizations negatively affect the QA performance of human readers. What about MLLMs?
- ⚠️ MLLMs are very vulnerable to misleading visualizations too: their QA performance drops
  - to the level of the random baseline
  - by up to 65.5 percentage points compared to the standard benchmark ChartQA
- they cannot answer questions consistently depending on whether they observe a misleading or non-misleading visualization of the same data
- We propose six inference-time correction methods to improve performance on misleading visualizations 🛠️
  - the best method extracts the underlying data table with the MLLM, then answers the question with a text-only LLM that sees the table only (a sketch follows this list)
  - this improves QA performance on misleading visualizations by up to 19.6 percentage points
  - however, it degrades performance on non-misleading visualizations
  - an alternative is to redraw the chart based on the extracted table, which yields smaller improvements
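Below is a minimal sketch of this table-then-QA method. The helpers `mllm_generate` and `llm_generate` are hypothetical placeholders for whichever MLLM and text-only LLM you use, and the prompts are illustrative, not the exact ones from the paper.

```python
# Minimal sketch of the table-then-QA correction method.
# mllm_generate and llm_generate are hypothetical placeholders, not part of this repo.

def mllm_generate(image_path: str, prompt: str) -> str:
    """Placeholder for a call to a multimodal LLM (e.g., InternVL2.5)."""
    raise NotImplementedError

def llm_generate(prompt: str) -> str:
    """Placeholder for a call to a text-only LLM."""
    raise NotImplementedError

def answer_from_table(image_path: str, question: str) -> str:
    # Step 1: transcribe the chart's underlying data table with the MLLM.
    table = mllm_generate(
        image_path,
        "Extract the underlying data table of this chart as a markdown table.",
    )
    # Step 2: answer with a text-only LLM that never sees the (possibly
    # misleading) chart, only its extracted data.
    return llm_generate(
        "Answer the question using only this data table.\n\n"
        f"Table:\n{table}\n\nQuestion: {question}\nAnswer:"
    )
```

Because the answering model never sees the rendered chart, distortions such as truncated axes or 3D effects cannot bias its reading of the data.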
Follow these instructions to recreate the environment used for all our experiments.
$ conda create --name misviz python=3.9
$ conda activate misviz
$ pip install -r requirements.txt
- **CALVI**
  - dataset introduced by Ge et al. (2023) in "CALVI: Critical Thinking Assessment for Literacy in Visualizations"
  - Ready to use
  - License: CC-BY 4.0
- **Lauer & O'Brien**
  - dataset introduced by Lauer & O'Brien (2020) in "The Deceptive Potential of Common Design Tactics Used in Data Visualizations"
  - Ready to use
- **Real-world**
  - dataset introduced in this work, based on visualizations collected by Lo et al. (2022) in "Misinformed by visualization: What do we learn from misinformative visualizations?"
  - Images should be downloaded using the script below
  - License for the QA pairs: CC-BY-SA 4.0
- **CHARTOM**
  - dataset introduced by Bharti et al. (2024) in "CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models"
  - Please contact the authors to get access to the dataset
  - Run the script below to process the dataset
- **VLAT**
  - dataset introduced by Lee et al. (2017) in "VLAT: Development of a Visualization Literacy Assessment Test"
  - Ready to use
The following script will prepare the datasets, including downloading the real-world images.
$ python src/dataset_preparation.py
The following code lets you evaluate the performance of MLLMs on misleading and non-misleading visualizations, with or without one of the six correction methods proposed in the paper. Some correction methods require intermediate steps, such as extracting the axes or table, or redrawing the visualization; the scripts for these steps are shown further below.
$ python src/question_answering.py --datasets calvi-chartom-real_world-vlat --model internvl2.5/8B/
The `--datasets` argument expects a string of dataset names separated by `-`, as in the example above. By default, the available datasets are `calvi`, `chartom`, `real_world`, `lauer`, and `vlat`.
The `--model` argument expects a string in the format `model_name/model_size/`. By default, the following models are available:
| Name | Available sizes | 🤗 models |
|---|---|---|
| internvl2.5 | 2B, 4B, 8B, 26B, 38B | Link |
| ovis 1.6 | 9B, 27B | Link |
| llava-v1.6-vicuna | 7B, 13B | Link |
| qwen2vl | 2B, 7B | Link |
| chartinstruction | 13B | Link |
| chartgemma | 3B | Link |
| tinychart | 3B | Link |
If you want to use TinyChart, copy this folder and place it in the root folder of this repo.
If you want to use ChartInstruction, copy this folder and place it in the root folder of this repo.
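For example, to run question answering with Qwen2-VL 7B on CALVI and VLAT only (an illustrative combination of the arguments documented above):

$ python src/question_answering.py --datasets calvi-vlat --model qwen2vl/7B/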
We also provide code to conduct experiments with GPT-4, GPT-4o, Gemini-1.5-Flash, and Gemini-1.5-Pro using the Azure OpenAI Service and Google AI Studio. You will first need to obtain API keys from both providers and store them as environment variables.
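For instance (the exact variable names expected by the scripts are an assumption here; check the source for the ones actually read):

$ export AZURE_OPENAI_API_KEY="..."   # placeholder name, value elided
$ export GOOGLE_API_KEY="..."         # placeholder name, value elided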
For correction methods that require intermediate steps, run the corresponding scripts first:
$ python src/chart2metadata.py --datasets calvi-chartom-real_world-vlat --model internvl2.5/8B/
$ python src/table2code.py --datasets calvi-chartom-real_world-vlat --model qwen2.5/7B/
Finally, evaluate the accuracy of the models:
$ python src/evaluate.py --results_folder results_qa --output_file results_qa.csv
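To take a quick look at the aggregated results, you can print the output CSV (assuming pandas is installed; the column layout depends on evaluate.py):

$ python -c "import pandas as pd; print(pd.read_csv('results_qa.csv'))"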
If you find this work relevant to your research or use this code in your work, please cite our paper as follows:
@article{tonglet2025misleadingvisualizations,
title={Protecting multimodal LLMs against misleading visualizations},
author={Tonglet, Jonathan and Tuytelaars, Tinne and Moens, Marie-Francine and Gurevych, Iryna},
journal={arXiv preprint arXiv:2502.20503},
year={2025},
url={https://arxiv.org/abs/2502.20503},
doi={10.48550/arXiv.2502.20503}
}

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.



