Can Large Language Model Comprehend Ancient Chinese? A Preliminary Test on ACLUE https://arxiv.org/abs/2310.09550
The Ancient Chinese Language Understanding Evaluation (ACLUE) is an evaluation benchmark focused on ancient Chinese language comprehension. It aims to assess the performance of large-scale language models on understanding ancient Chinese. The benchmark comprises 15 tasks spanning various domains, including lexical, syntactic, semantic, inference, and knowledge. ACLUE's tasks are derived from a combination of manually curated questions from publicly available resources, and automatically generated questions from classical Chinese language corpora. The range of questions span from the Xia dynasty (2070 BCE) to the Ming dynasty (1368 CE). ACLUE adopts a multiple-choice question format for all tasks.
Homepage: https://github.com/isen-zhang/ACLUE
@inproceedings{zhang-li-2023-large,
title = "Can Large Language Model Comprehend {A}ncient {C}hinese? A Preliminary Test on {ACLUE}",
author = "Zhang, Yixuan and Li, Haonan",
booktitle = "Proceedings of the Ancient Language Processing Workshop",
month = sep,
year = "2023",
address = "Varna, Bulgaria",
publisher = "INCOMA Ltd., Shoumen, Bulgaria",
url = "https://aclanthology.org/2023.alp-1.9",
pages = "80--87"
}
aclue
: All 15 subjects of the ACLUE dataset, evaluated following the methodology in CMMLU's original implementation.
The following tasks evaluate subjects in the ACLUE dataset using loglikelihood-based multiple-choice scoring:
aclue_{subject_english}
- Is the task an existing benchmark in the literature?
- Have you referenced the original paper that introduced the task?
- If yes, does the original paper provide a reference implementation?
- Yes, original implementation contributed by author of the benchmark
If other tasks on this dataset are already supported:
- Is the "Main" variant of this task clearly denoted?
- Have you provided a short sentence in a README on what each new variant adds / evaluates?
- Have you noted which, if any, published evaluation setups are matched by this variant?