DAMO-NLP-SG/LLM-Multilingual-Knowledge-Boundaries

Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations

Chenghao Xiao¹,², Hou Pong Chan¹,#, Hao Zhang¹,#, Mahani Aljunied¹,
Lidong Bing¹, Noura Al Moubayed², Yu Rong¹
¹DAMO Academy, Alibaba Group; ²Durham University
#Corresponding Authors

🌟 This repo contains the code and datasets for the paper "Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations", to appear at ACL 2025.

🎉 Updates

  • [2025-05] Our paper has been accepted to ACL 2025.
  • [2025-04] Check out our paper on arXiv.

Overview

We present the first study to analyze how LLMs recognize knowledge boundaries across different languages by probing their internal representations when processing known and unknown questions in multiple languages.

Our empirical studies reveal three key findings: 1) LLMs' perception of knowledge boundaries is encoded in the middle to middle-upper layers across different languages; 2) language differences in knowledge boundary perception follow a linear structure, which motivates our training-free alignment method that transfers knowledge boundary perception across languages and thereby helps reduce hallucination risk in low-resource languages; 3) fine-tuning on bilingual question-pair translation further enhances LLMs' recognition of knowledge boundaries across languages.
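To make the layer-wise probing idea concrete, here is a minimal, self-contained sketch. It is not the code from this repo: the "hidden states" are synthetic toy vectors, the helper names (`train_probe`, `synthetic_layer`) are hypothetical, and the choice that only the middle of three toy layers carries signal is an assumption made to mirror the first finding. In practice the inputs would be real per-layer representations of known and unknown questions.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_probe(X, y, lr=0.1, epochs=200):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def accuracy(w, b, X, y):
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def synthetic_layer(labels, signal, rng, dim=4):
    # Toy stand-in for one layer's hidden states: Gaussian noise, plus a
    # label-dependent offset of size `signal` along the first dimension.
    return [[rng.gauss(0, 1) + (signal * (2 * t - 1) if j == 0 else 0.0)
             for j in range(dim)] for t in labels]

rng = random.Random(0)
labels = [i % 2 for i in range(200)]   # 1 = known question, 0 = unknown question
signal_per_layer = [0.0, 3.0, 0.0]     # assumption: only the middle toy layer is informative

layer_acc = []
for signal in signal_per_layer:
    X = synthetic_layer(labels, signal, rng)
    w, b = train_probe(X[:100], labels[:100])                 # first half: training
    layer_acc.append(accuracy(w, b, X[100:], labels[100:]))   # second half: held out

best_layer = max(range(len(layer_acc)), key=layer_acc.__getitem__)
print(layer_acc, best_layer)
```

Comparing held-out probe accuracy layer by layer is what lets one say where in the network the known/unknown distinction is most linearly readable.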

Given the absence of standard testbeds for cross-lingual knowledge boundary analysis, we construct a multilingual evaluation suite comprising three representative types of knowledge boundary data.

Evaluation Suite

Links to our datasets: FreshQA-multilingual; FreshQA-multilingual-augmented; True-False-multilingual; SeaRefuse

Inference Code

Code for training linear probes and for aligning language subspaces with mean shifting and linear projection.

python inference.py \
    --model_name Qwen/Qwen2.5-7B \
    --dataset_name SeaLLMs/FreshQA-multilingual \
    --output_path "./transferability_results/7B/Qwen_base_7B.json" \
    --methods "identical" "mean shifting" "linear projection" \
    --use_template True \
    --batch_size 50
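As a rough illustration of what the three `--methods` values might correspond to, the sketch below compares them on toy 2-D data. This is one plausible reading, not the logic of `inference.py`: "identical" applies an English-space probe unchanged, "mean shifting" recenters the other language's representations on the English mean, and "linear projection" fits a least-squares affine map on parallel questions. All helper names and the synthetic data are assumptions for illustration.

```python
import random

def column_means(X):
    return [sum(col) / len(X) for col in zip(*X)]

def mean_shift(X, src_mean, tgt_mean):
    # Move source-language representations so their mean matches the target's.
    return [[v - s + t for v, s, t in zip(x, src_mean, tgt_mean)] for x in X]

def solve(A, B):
    """Solve A @ W = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row_a[:] + row_b[:] for row_a, row_b in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fit_projection(src, tgt):
    """Least-squares affine map from src to tgt via the normal equations."""
    cols = list(zip(*[x + [1.0] for x in src]))   # append a bias feature
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [[sum(a * t[k] for a, t in zip(ci, tgt)) for k in range(len(tgt[0]))]
           for ci in cols]
    return solve(gram, rhs)                       # shape: (dim + 1) x dim

def project(X, W):
    return [[sum(v * W[i][k] for i, v in enumerate(x + [1.0]))
             for k in range(len(W[0]))] for x in X]

def probe_accuracy(X, y):
    # Stand-in for a probe trained on English: "known" iff dimension 0 > 0.
    return sum((x[0] > 0) == bool(t) for x, t in zip(X, y)) / len(y)

rng = random.Random(0)
labels = [i % 2 for i in range(200)]
# Toy "English" representations: two clusters along dimension 0.
english = [[2.0 * (2 * t - 1) + rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for t in labels]
# Toy "other language" representations of the same questions, modeled as an
# affine distortion of the English ones (an assumption of this sketch).
other = [[0.8 * e0 - 0.2 * e1 + 5.0, 0.3 * e0 + 1.1 * e1 - 1.0] for e0, e1 in english]

fit, held = slice(0, 100), slice(100, 200)   # alignment pairs vs. held-out questions
W = fit_projection(other[fit], english[fit])
results = {
    "identical": probe_accuracy(other[held], labels[held]),
    "mean shifting": probe_accuracy(
        mean_shift(other[held], column_means(other[fit]), column_means(english[fit])),
        labels[held]),
    "linear projection": probe_accuracy(project(other[held], W), labels[held]),
}
print(results)
```

Mean shifting needs only per-language means, so it requires no parallel data, whereas the projection is fitted on paired questions. Both are training-free with respect to the model itself.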

Citation

@inproceedings{xiao2025analyzingllmsknowledgeboundary,
    title = {Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations},
    author = {Chenghao Xiao and Hou Pong Chan and Hao Zhang and Mahani Aljunied and Lidong Bing and Noura Al Moubayed and Yu Rong},
    booktitle = {Proceedings of the Annual Meeting of the Association for Computational Linguistics ({ACL})},
    month = {July},
    year = {2025},
    url = {https://arxiv.org/abs/2504.13816},
    publisher = {Association for Computational Linguistics},
}
