Welcome to the repo for ConspirED: A Dataset for Cognitive Traits of Conspiracy Theories and Large Language Model Safety.
ConspirED is a dataset for identifying cognitive traits of conspiracy theories in text. It contains annotated conspiracy snippets labeled with the CONSPIR traits of conspiratorial ideation.
The repository is organized as follows:
```
arxiv2025-conspired/
├── data/                         # Dataset files
│   ├── context_training.xlsx     # Training split
│   ├── context_testing.xlsx      # Test split
│   ├── val_splits/               # Validation splits
│   └── LICENSE-CC-BY-4.0.txt     # CC-BY-4.0 license for datasets
├── .github/                      # GitHub Actions and workflows
├── static/                       # Project page assets
├── conspir_tils.py               # Utility functions for prompting experiments
├── main.py                       # Main script for LLM prompting experiments
├── train_clf.py                  # Script for fine-tuning LaGoNN classifiers
├── lagonn.py                     # LaGoNN model implementation
├── setup_utils.py                # Setup and evaluation utilities
├── finetuning_environment.yml    # Conda environment for LaGoNN
├── prompting_environment.yml     # Conda environment for prompting
├── README.md                     # This file
├── LICENSE                       # Apache 2.0 license for code
├── NOTICE.txt                    # Copyright notices
├── .gitignore                    # Git ignore rules
├── .nojekyll                     # GitHub Pages configuration
└── index.html                    # Project landing page
```
The `data/` directory contains all processed dataset files. Each file includes the following columns:
- `doc_id`: Unique identifier for the source document
- `snippet`: The annotated text snippet exhibiting conspiratorial thinking
- `context500`: Surrounding context (500 tokens) around the snippet
- `context1000`: Surrounding context (1000 tokens) around the snippet
- `labels`: Multi-hot encoded list of conspiracy traits (0/1 for each of 6 traits)
- `consolidated_trait`: Human-readable list of trait names present in the snippet
- `dominant_consol_trait`: The most salient/dominant trait in the snippet
- `single_dominant_one_hot_dm`: One-hot encoded dominant trait vector
- `OverallTrait`: Original annotation of overall conspiracy traits
- `DominantTrait`: Original annotation of the dominant trait
- `Justification`: Annotator's justification for trait assignment
- `Confidence`: Annotator confidence score
- `ConspiracyTheoryorMainstream`: Classification of source as conspiracy theory or mainstream
- `annotated_text`: Original annotated text with markup
- `linkingpassage`: Context linking the snippet to broader narrative
- `begin`/`end`: Character offsets of the snippet in the source document
- `name`: Source annotator where relevant
- `id`: Snippet identifier
- `Label`: Additional label information
- `remove_row`: Flag for data quality filtering
The six conspiracy traits are (in order): Contradictory, Overriding suspicion, Nefarious intent, Persecuted victim, Immune to evidence, and Re-interpreting randomness.
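Since the `labels` column is a multi-hot vector ordered by these six traits, decoding it back to trait names is a one-liner. The helper below is an illustrative sketch, not part of the repo; depending on how you load the spreadsheet, `labels` may arrive as a Python list or as its string form, so both are handled:

```python
import ast

# The six CONSPIR traits, in the order used by the multi-hot `labels` column.
TRAITS = [
    "Contradictory",
    "Overriding suspicion",
    "Nefarious intent",
    "Persecuted victim",
    "Immune to evidence",
    "Re-interpreting randomness",
]

def decode_labels(labels):
    """Map a multi-hot vector (or its string form, e.g. '[0, 1, 0, 0, 1, 0]')
    to the names of the traits present in the snippet."""
    if isinstance(labels, str):
        labels = ast.literal_eval(labels)
    return [trait for trait, flag in zip(TRAITS, labels) if flag == 1]
```

For example, `decode_labels([0, 1, 0, 0, 1, 0])` yields `["Overriding suspicion", "Immune to evidence"]`.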
To set up the environments, use the provided Conda YAML files:

- `finetuning_environment.yml` for training and evaluating LaGoNN.
- `prompting_environment.yml` for prompting experiments with LLaMA or GPT models.

Use the following commands:

```bash
conda env create -f finetuning_environment.yml
conda activate lagonn-env

conda env create -f prompting_environment.yml
conda activate prompting-env
```

If you plan to use OpenAI models (GPT-4, etc.), you need to set your API key as an environment variable:

```bash
export OPENAI_API_KEY='your-api-key-here'
```

For permanent configuration, add this line to your `~/.bashrc` or `~/.zshrc` file.
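A forgotten API key usually surfaces as an opaque authentication error mid-run. A small, hypothetical helper (not part of the repo) can fail fast with a clearer message before any experiment starts:

```python
import os

def require_openai_key() -> str:
    """Return the OpenAI API key from the environment, failing fast
    with a clear message if it is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running main.py"
        )
    return key
```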
Use the `train_clf.py` script to fine-tune LaGoNN classifiers on the ConspirED dataset.
| Argument | Description |
|---|---|
| `--model` | HuggingFace model to fine-tune (e.g., `paraphrase-mpnet-base-v2`). |
| `--model_seed` | Random seed for model initialization. |
| `--num_iter` | Number of LaGoNN message-passing iterations. |
| `--epochs` | Number of training epochs. |
| `--multilab` | Set to `True` for multi-label classification; `False` for single-label. |
| `--lagonn_config` | Graph configuration (e.g., `LABEL`, `TEXT`, etc.). |
| `--lagonn_mode` | Name of the experimental setup (e.g., `LAGONN_EXP`). |
| `--NUM_NEIGHBORS` | Number of neighbors per node in the graph. |
| `--DISTANCE_PRECISION` | Optional: precision mode for node distance (default: `None`). |
| `--context` | Whether to include surrounding context in the input. |
| `--window` | Token window size used when `--context` is enabled. |
```bash
python train_clf.py \
    --model paraphrase-mpnet-base-v2 \
    --model_seed 4 \
    --num_iter 17 \
    --epochs 3 \
    --multilab True \
    --lagonn_config LABEL \
    --lagonn_mode LAGONN_EXP \
    --NUM_NEIGHBORS 1 \
    --setfit False \
    --context True \
    --window 1000
```

The `main.py` script runs prompting experiments to identify conspiratorial traits using LLaMA, GPT, or other LLMs.
| Argument | Description |
|---|---|
| `--strategy` | Prompting strategy used (e.g., `what_to_look_for`). |
| `--icl` | In-context learning setup: `zero_shot`, `few_shot_similar`, `few_shot_dissimilar`, or `few_shot_both`. |
| `--k` | Number of examples used in few-shot prompting. |
| `--cot` | Whether to use chain-of-thought prompting (`True` or `False`). |
| `--dev` | Run on the development set instead of the test set (`True` or `False`). |
| `--model` | Path to the local LLaMA model (if not using OpenAI). |
| `--context` | Whether to include surrounding context in the prompt (`True` or `False`). |
| `--window` | Token window size for included context (ignored if `--context=False`). |
| `--openai` | Whether to use OpenAI API models (`True`) or local models (`False`). |
| `--openai_model` | OpenAI model name (e.g., `gpt-4o`). |
```bash
python main.py \
    --strategy what_to_look_for \
    --icl few_shot_both \
    --k 20 \
    --cot True \
    --dev False \
    --model path/to/llama-2-7b-chat \
    --context True \
    --window 1000 \
    --openai False
```

After running experiments, results are automatically saved to disk in JSON format with the following structure:
Results are saved to: `llm_trainclf_jsons/{seed}/{model}/{strategy}/{icl}/{k}/{cot}/{dev}/{context}/{window}/{openai}/{openai_model}/`
Each experiment produces JSON files containing:
- Classification report: Per-class precision, recall, and F1-scores for each conspiracy trait
- Aggregated metrics: Macro, micro, samples, and weighted averages for:
- F1-score
- Precision
- Recall
Relaxed evaluation files (`*_relaxed_results.json`) assess whether the model correctly identifies the dominant trait when multiple traits are present. This evaluation considers a prediction correct if the model assigns a probability ≥ 0.5 to the dominant trait, providing a less strict measure of model performance focused on identifying the most salient conspiracy trait in each instance.
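The relaxed criterion can be sketched in a few lines. This is an illustrative rendering of the rule described above, not the repository's actual evaluation code (which presumably lives in `setup_utils.py`):

```python
def relaxed_correct(trait_probs, dominant_trait, threshold=0.5):
    """Relaxed criterion: a prediction counts as correct if the model
    assigns probability >= threshold to the instance's dominant trait,
    regardless of what it predicts for the other traits."""
    return trait_probs.get(dominant_trait, 0.0) >= threshold
```

For instance, a prediction of `{"Nefarious intent": 0.7, "Contradictory": 0.2}` is relaxed-correct for a snippet whose dominant trait is "Nefarious intent".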
Both standard and relaxed evaluation results are printed to the console during execution and saved as JSON files for further analysis.
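To compare runs across the nested output directories, you can walk the tree and pull one summary metric per file. The sketch below assumes each JSON follows the dict layout of sklearn's `classification_report` (a guess consistent with the per-class and macro/micro metrics listed above); adjust the key names if the actual files differ:

```python
import json
from pathlib import Path

def collect_macro_f1(results_dir):
    """Gather macro-averaged F1 scores from every result JSON under
    results_dir, keyed by file name. Files without a 'macro avg'
    entry (e.g. differently structured relaxed results) are skipped."""
    scores = {}
    for path in sorted(Path(results_dir).rglob("*.json")):
        report = json.loads(path.read_text())
        if "macro avg" in report:
            scores[path.name] = report["macro avg"].get("f1-score")
    return scores
```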
This repository uses dual licensing:

- Code: Licensed under the Apache License 2.0 (see `LICENSE`)
- Data: Licensed under Creative Commons Attribution 4.0 International (CC-BY-4.0) (see `data/LICENSE-CC-BY-4.0.txt`)
If our work was helpful for yours, please be so kind as to cite us:
```bibtex
@article{bates2025conspired,
  title={ConspirED: A Dataset for Cognitive Traits of Conspiracy Theories and Large Language Model Safety},
  author={Bates, Luke and Glockner, Max and Nakov, Preslav and Gurevych, Iryna},
  journal={arXiv preprint arXiv:2508.20468},
  year={2025},
  url={https://arxiv.org/abs/2508.20468}
}
```

- Maintainer: Luke Bates ([email protected])
- UKP Lab: https://www.ukp.tu-darmstadt.de
- TU Darmstadt: https://www.tu-darmstadt.de
Don't hesitate to send us an email or report an issue if something is broken or if you have further questions.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.