DEER 🦌: Dynamic Early Exit in Reasoning Models


This is the repository of our paper: Dynamic Early Exit in Reasoning Models.

DEER monitors model behavior at potential reasoning transition points and dynamically stops generation of the next reasoning chain when the model shows high confidence in a trial answer. It is consistently effective across 11 cutting-edge reasoning LLMs of varying series and sizes, shortening CoT sequences by 19.1%–80.1% on average while improving accuracy by 0.3%–5.0%.
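The control flow behind this idea can be sketched in a few lines. The snippet below is a minimal, model-free illustration of our own making (not the repository's code): the `step_confidences` list stands in for the confidence DEER computes for a trial answer at each reasoning transition point.

```python
def deer_decode(step_confidences, threshold=0.95):
    """Illustrative sketch of DEER's exit logic (not the actual repo code).

    At each potential reasoning transition point, DEER induces a trial
    answer and scores its confidence; once that confidence clears the
    threshold, the remaining reasoning chains are skipped and the trial
    answer is emitted as the final one.
    """
    for step, confidence in enumerate(step_confidences, start=1):
        if confidence >= threshold:
            return step, True  # exit early at this transition point
    return len(step_confidences), False  # budget exhausted, no early exit

# A run whose third trial answer is confident enough exits after 3 steps:
print(deer_decode([0.40, 0.72, 0.97, 0.99]))  # (3, True)
```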


🔥 Latest Updates

  • [2025/05/20] Released DEER code for mathematical reasoning tasks (HuggingFace & vLLM).
  • [Coming Soon] DEER for code generation tasks & Branch-Parallel Decoding Acceleration.

🎯 Key Results

Results on 11 reasoning models with 16k token budgets. "Acc" denotes accuracy, "Tok" denotes token count, and "CR" denotes compression rate.

Experimental results presented in bar charts.


🚀 Quick Start

1. Installation

git clone https://github.com/iie-ycx/DEER.git
cd DEER
pip install -r requirements.txt

2. DEER on vLLM (Recommended)

For efficiency, we recommend reproducing the results with the vLLM-based code.

For Most Reasoning Models

CUDA_VISIBLE_DEVICES=1 python ../vllm-deer.py \
    --model_name_or_path "./DeepSeek-R1-Distill-Qwen-14B" \
    --dataset_dir "./data/" \
    --output_path "./outputs" \
    --dataset "math" \
    --threshold 0.95 \
    --max_generated_tokens 16000 \
    --think_ratio 0.6 \
    --batch_size 2000 \
    --policy avg1 \
    --dtype bfloat16 \
    --gpu-memory-utilization 0.9

or run:

bash ./bashes/bash-vllm-deer.sh

For Qwen3 Models

CUDA_VISIBLE_DEVICES=1 python ../vllm-deer-qwen3.py \
    --model_name_or_path "./Qwen3-4B" \
    --dataset_dir "./data/" \
    --output_path "./outputs" \
    --dataset "math" \
    --threshold 0.95 \
    --max_generated_tokens 16000 \
    --think_ratio 0.8 \
    --batch_size 2000 \
    --dtype bfloat16 \
    --policy avg2 \
    --gpu-memory-utilization 0.9

or run:

bash ./bashes/bash-vllm-deer-qwen3.sh

In our experiments, we found that Qwen3-series models tend to be over-confident in their confidence predictions, so we made two modifications to the implementation for these models:

  • The calculation of answer confidence was changed from arithmetic mean to geometric mean.
  • An additional condition must be satisfied for early exit: the model must generate </think> after the trial answer.
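The switch from arithmetic to geometric mean matters most when a single answer token is uncertain: the geometric mean penalizes it far more strongly. A small illustration with made-up token probabilities, assuming (as the commands above suggest) that `avg1` and `avg2` correspond to the arithmetic and geometric policies:

```python
import math

def arithmetic_confidence(token_probs):
    """Arithmetic mean of per-token answer probabilities (the avg1-style policy)."""
    return sum(token_probs) / len(token_probs)

def geometric_confidence(token_probs):
    """Geometric mean (the avg2-style policy used for Qwen3): one
    low-probability token drags the score down sharply, which counters
    over-confident models."""
    return math.exp(sum(math.log(p) for p in token_probs) / len(token_probs))

# Three confident tokens and one uncertain one:
probs = [0.99, 0.98, 0.30, 0.97]
print(round(arithmetic_confidence(probs), 2))  # 0.81
print(round(geometric_confidence(probs), 2))   # 0.73
```

With a threshold of 0.95, neither score triggers an exit here, but on borderline cases the geometric mean demands confidence on every token rather than on average.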

3. DEER on Transformers

For inference using HuggingFace Transformers (without vLLM), run:

bash ./bashes/bash-vanilla-deer.sh

📊 Evaluation

DEER currently supports evaluation on 7 reasoning benchmarks. The rule-based evaluation for these benchmarks follows the code implementation of the LIMO project.

python ../check.py \
    --model_name_or_path "./DeepSeek-R1-Distill-Qwen-14B" \
    --data_name "math" \
    --generation_path "your_output.jsonl"

or run:

bash ./bashes/bash-check-correct.sh
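As a rough sketch of what such a rule-based check does (the helper names and the `\boxed{...}` convention below are illustrative; they are not the actual LIMO or `check.py` code):

```python
import re

def extract_boxed(text):
    """Illustrative helper: take the last \\boxed{...} span in a completion,
    the usual place math reasoning models put the final answer."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

def is_correct(completion, gold):
    """String-match the extracted prediction against the gold answer."""
    pred = extract_boxed(completion)
    return pred is not None and pred.strip() == gold.strip()

print(is_correct(r"... so the final answer is \boxed{42}.", "42"))  # True
```

Real graders additionally normalize equivalent forms (fractions, units, LaTeX spacing) before comparing, which is why reusing a battle-tested implementation like LIMO's is preferable to ad-hoc string matching.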

📜 Citation

If you use DEER in your research, please cite our paper:

@article{yang2025dynamic,
  title={Dynamic Early Exit in Reasoning Models},
  author={Yang, Chenxu and Si, Qingyi and Duan, Yongjie and Zhu, Zheliang and Zhu, Chenyu and Lin, Zheng and Cao, Li and Wang, Weiping},
  journal={arXiv preprint arXiv:2504.15895},
  year={2025}
}

💬 Community

Join our WeChat group for discussions.
