News | Dataset | Important Dates | Baselines | Organizers | Contacts
As large language models (LLMs) like GPT-4o, Claude 3.5, and Gemini 1.5 Pro become increasingly accessible, machine-generated content is proliferating across diverse domains, including news, social media, education, and academia. These models produce highly fluent and coherent text, making them valuable for automating various writing tasks. However, their widespread use also raises concerns about misinformation, academic integrity, and content authenticity. Identifying the degree of human and machine involvement in text creation is crucial for addressing these challenges.
In this shared task, we focus on Human-AI Collaborative Text Classification, where the goal is to categorize documents that have been co-authored by humans and LLMs. Specifically, we aim to classify texts into six distinct categories based on the nature of human and machine contributions:
- Fully human-written: The document is entirely authored by a human without any AI assistance.
- Human-initiated, then machine-continued: A human starts writing, and an AI model completes the text.
- Human-written, then machine-polished: The text is initially written by a human but later refined or edited by an AI model.
- Machine-written, then machine-humanized (obfuscated): An AI generates the text, which is later modified to obscure its machine origin.
- Machine-written, then human-edited: The content is generated by an AI but subsequently edited or refined by a human.
- Deeply-mixed text: The document contains interwoven sections written by both humans and AI, without a clear separation.
Accurately distinguishing between these categories will enhance our understanding of human-AI collaboration and help mitigate the risks associated with synthetic text.
We apologize for the delay and the incidents with CodaLab. We are happy to release the competition results; please find them at the LeaderBoard.
We sincerely apologize for the delay in releasing the test set. To ensure all teams have sufficient time, we have decided to extend the submission deadline to May 24 (AoE).
Please note that, due to a CodaLab limitation (there is no option to add a new testing phase to an existing competition), test predictions must be submitted via a new competition link.
- 📥 Download the test set: Google Drive
- 📤 Check submission instructions
- 📤 Submit your predicted labels: Test-stage CodaLab
Thank you for your understanding and continued participation.
📢 Update on Test Set Release Delay
We sincerely apologize for the delay in releasing the test set. We are currently finalizing the data and conducting a final quality check to ensure its reliability. The new release date is May 20, allowing participants to concentrate on the upcoming EMNLP deadline in the meantime.
Thank you for your understanding and patience!
We have set up the submission platform; please submit your predictions on CodaLab: Submit
We have released our training and dev sets.
Participants should submit the predicted labels in a .jsonl file named as follows:

[team_name]_[submission_date].jsonl

Each line in the file should be a JSON object in the following format:

{"id": "identifier of the test sample", "label": 1}
{"id": "identifier of the test sample", "label": 3}

label should be an integer from 0 to 5, i.e., [0, 1, 2, 3, 4, 5]. Below is the mapping from label IDs to class names:
id2label = {
0: "fully human-written",
1: "human-written, then machine-polished",
2: "machine-written, then machine-humanized",
3: "human-initiated, then machine-continued",
4: "deeply-mixed text; where some parts are written by a human and some are generated by a machine",
5: "machine-written, then human-edited"
}
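The required submission file can be produced with a few lines of Python. A minimal sketch, using a hypothetical team name and hypothetical sample IDs (the real IDs come from the released test set):

```python
import json

# Hypothetical predictions: {sample_id: predicted_label_int in [0, 5]}
predictions = {"doc-001": 1, "doc-002": 3, "doc-003": 0}

# File name follows the [team_name]_[submission_date].jsonl convention.
with open("myteam_2025-05-24.jsonl", "w", encoding="utf-8") as f:
    for sample_id, label in predictions.items():
        # One JSON object per line, with exactly the "id" and "label" keys.
        f.write(json.dumps({"id": sample_id, "label": label}) + "\n")
```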
✅ Check your submission format using format_checker.py. This script verifies that the format is correct and warns about possible errors.
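The kinds of checks such a script performs can be sketched as follows. This is an illustrative re-implementation, not the official format_checker.py; it only validates the JSONL structure and the label range described above:

```python
import json

def check_submission(path):
    """Collect human-readable error messages for a prediction file.

    Checks, per line: valid JSON, presence of "id" and "label",
    and that "label" is an integer in [0, 5].
    """
    errors = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"line {i}: not valid JSON")
                continue
            if "id" not in obj or "label" not in obj:
                errors.append(f"line {i}: missing 'id' or 'label'")
            elif not isinstance(obj["label"], int) or not 0 <= obj["label"] <= 5:
                errors.append(f"line {i}: label must be an integer in [0, 5]")
    return errors
```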
Download the training and dev sets from Google Drive.
| Label Category | Train | Dev |
|---|---|---|
| Machine-written, then machine-humanized | 91,232 | 10,137 |
| Human-written, then machine-polished | 95,398 | 12,289 |
| Fully human-written | 75,270 | 12,330 |
| Human-initiated, then machine-continued | 10,740 | 37,170 |
| Deeply-mixed text (human + machine parts) | 14,910 | 225 |
| Machine-written, then human-edited | 1,368 | 510 |
| Total | 288,918 | 72,661 |
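Per-class counts like those in the table can be tallied directly from the released files. A minimal sketch, assuming each line of the .jsonl carries an integer "label" field (the tiny id2label mapping in the test is hypothetical; the full six-class mapping is given above):

```python
import json
from collections import Counter

def label_distribution(path, id2label):
    """Count how many examples in a .jsonl file fall into each class.

    Assumes each line is a JSON object with an integer "label" field.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            counts[json.loads(line)["label"]] += 1
    # Map numeric label IDs to their class names for readability.
    return {id2label[k]: v for k, v in counts.items()}
```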
All dates are AoE.
- February 17, 2025: Training/dev set release
- May 20, 2025: Test set release
- May 24, 2025: Final submission deadline
- May 30, 2025: Participant paper submission
- June 27, 2025: Peer review notification
python baseline.py --train_file_path subtask2_train.jsonl --dev_file_path subtask2_dev.jsonl --test_file_path subtask2_dev.jsonl --model roberta-base --prediction_file_path clef_prediction.csv
RoBERTa-base results:
- Accuracy: 56.71%
- Macro F1: 61.26%
- Macro Recall: 68.67% (official metric)
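For local evaluation, the official metric (macro-averaged recall) is straightforward to compute without extra dependencies. A minimal pure-Python sketch: per-class recall is the fraction of that class's true examples that were predicted correctly, averaged over all classes that appear in the gold labels:

```python
from collections import defaultdict

def macro_recall(y_true, y_pred, num_classes=6):
    """Macro-averaged recall over integer class labels in [0, num_classes)."""
    tp = defaultdict(int)       # correct predictions per gold class
    support = defaultdict(int)  # number of gold examples per class
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            tp[t] += 1
    # Average per-class recall over classes present in the gold labels.
    recalls = [tp[c] / support[c] for c in range(num_classes) if support[c] > 0]
    return sum(recalls) / len(recalls)
```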
- Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Iryna Gurevych, Mohamed bin Zayed University of Artificial Intelligence, UAE; Technical University of Darmstadt, Germany
- Nizar Habash, New York University Abu Dhabi, UAE
- Alham Fikri Aji, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Yuxia Wang, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Artem Shelmanov, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Ekaterina Artemova, Toloka AI, Netherlands
- Jonibek Mansurov, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Zhuohan Xie, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Jinyan Su, Cornell University, USA
- Akim Tsvigun, Nebius AI, Netherlands
- Rui Xing, Mohamed bin Zayed University of Artificial Intelligence, UAE
Emails: [email protected], [email protected], [email protected], [email protected]