
Long-Chain Reasoning Distillation via Adaptive Prefix Alignment

This repository contains the source code for the paper: Long-Chain Reasoning Distillation via Adaptive Prefix Alignment.


• 🎯 Overview • ⚙️ Set Up • 🔧 P-ALIGN Pipeline • 📨 Contact

🎯Overview

P-ALIGN (Prefix ALIGN) is a distillation framework that leverages adaptive prefix alignment to improve student model reasoning. It truncates teacher Chains-of-Thought (CoTs) and selects concise, informative prefixes as supervision, effectively bridging the gap between teacher trajectories and student capacity. P-ALIGN enables student models to learn from high-quality CoTs without being hindered by redundancy or uncertainty.

⚙️Set Up

1. Python Environment.

Create a conda environment, then use git clone to download this project and install its dependencies:

conda create -n P-ALIGN python=3.10
conda activate P-ALIGN
git clone https://github.com/NEUIR/P-ALIGN.git
cd P-ALIGN
pip install -r requirements.txt --force-reinstall --no-deps --no-cache-dir

2. Install LLaMA-Factory.

Refer to https://github.com/hiyouga/LLaMA-Factory for detailed instructions.

conda create -n llama_factory python=3.10
conda activate llama_factory
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"

📊 Data Preparation

1. Raw Training and Evaluation Data

Due to licensing restrictions, we do not redistribute the original training and evaluation datasets.
Please refer to the data/ directory for instructions on how to download and organize the raw data.

data/
├── raw/      # Instructions for obtaining original datasets
└── results/  # Model inference and evaluation results

2. Processed Data Release

To facilitate quick reproduction of our experiments, we release the processed data used in our method on P-ALIGN.

🔧P-ALIGN Pipeline

1. Adaptive Prefix Truncation via Binary Search

To extract the most concise yet sufficient reasoning prefix from long chains of thought, we adopt a binary search–based truncation strategy. Specifically, the student model performs self-evaluation over different prefix lengths to determine whether the current prefix is sufficient to solve the problem, allowing us to efficiently identify the minimal effective reasoning prefix.

bash scripts/Prefix_truncation.sh 
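The truncation step can be sketched as a standard binary search over prefix lengths. In this sketch, `is_sufficient` is a hypothetical stand-in for the student model's self-evaluation (in the actual pipeline, the judgment is made by the student model inside `scripts/Prefix_truncation.sh`), and sufficiency is assumed to be monotone in prefix length:

```python
def minimal_sufficient_prefix(cot_steps, is_sufficient):
    """Binary-search the shortest prefix of teacher reasoning steps
    that the student judges sufficient to solve the problem.

    cot_steps:     list of reasoning steps from the teacher CoT.
    is_sufficient: callable(prefix) -> bool, assumed monotone in
                   prefix length (stand-in for student self-evaluation).
    Returns the shortest sufficient prefix, or the full CoT if none is.
    """
    lo, hi = 1, len(cot_steps)
    best = len(cot_steps)
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_sufficient(cot_steps[:mid]):
            best = mid      # mid is sufficient; try something shorter
            hi = mid - 1
        else:
            lo = mid + 1    # mid is too short; search longer prefixes
    return cot_steps[:best]
```

Under the monotonicity assumption, this needs only O(log n) student evaluations per example instead of n, which is what makes truncation over long CoTs affordable.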

2. Prefix-based Alignment

To better align long chains of thought with the reasoning capacity of the student model, we further apply a prefix alignment strategy. This step ensures that the retained prefix not only remains sufficient, but is also well-matched to the student model’s inference ability.

bash scripts/Prefix_alignment.sh 
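As a rough illustration (not the exact format produced by the script), the aligned prefix can replace the full teacher CoT as the supervision target in an alpaca-style SFT record of the kind LLaMA-Factory consumes; the field layout and helper name here are assumptions:

```python
def build_sft_example(question, prefix_steps, answer):
    """Hypothetical sketch: the selected reasoning prefix, rather than
    the full teacher CoT, becomes the supervision target."""
    return {
        "instruction": question,
        "input": "",
        # Supervision = aligned prefix followed by the final answer.
        "output": "\n".join(prefix_steps) + "\nAnswer: " + answer,
    }
```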

3. Inference

During inference, we leverage vLLM to enable efficient and accelerated decoding, significantly improving inference speed while maintaining generation quality.

bash scripts/Inference.sh 

4. Evaluation

To validate the effectiveness of our approach, we evaluate model performance by matching generated outputs with ground-truth answers. We report pass@1 and pass@3 as the main evaluation metrics to measure overall reasoning performance.

bash scripts/Evaluation.sh 
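Pass@k is typically computed with the unbiased estimator of Chen et al. (2021): pass@k = 1 − C(n−c, k)/C(n, k), where n samples are drawn per problem and c of them are correct. A sketch (the repository's evaluation script may differ in details):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations with c correct,
    is correct."""
    if n - c < k:  # too few incorrect samples: every size-k draw hits
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging this quantity over all evaluation problems gives the reported pass@1 and pass@3 scores.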

📨Contact

If you have questions, suggestions, or bug reports, please email:
