Beyond Naive Merging: Enhancing LLM Compression via Alpha Optimization, Task-Specific Similarity, and Neural Alignment
This is a course project for CS 613: Natural Language Processing at IIT Gandhinagar.
Base Paper: Liu, D., et al. (2024). Pruning via merging: Compressing LLMs via manifold alignment based layer merging. arXiv:2406.16330.
This Repository:
https://github.com/Jain-Laksh/Layer-Merging-via-Manifold-Alignment
While Large Language Models (LLMs) have demonstrated remarkable capabilities, their massive size presents significant deployment challenges. This project builds on the "Pruning via Merging" (MKA) paper, which proposes a novel compression technique by merging similar layers based on manifold alignment.
The original MKA method, however, relies on several potentially suboptimal heuristics. This project investigates and provides solutions for three key limitations of the MKA baseline, using the Llama3-8B model as our testbed.
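As background, layer similarity is typically measured by comparing two layers' activations on the same inputs. The MKA paper's metric is based on manifold alignment; as a simplified, illustrative stand-in for that idea (not the paper's actual metric), linear CKA (centered kernel alignment) scores activation-space similarity between two layers:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices (samples x features).

    Illustrative stand-in for a layer-similarity score; the MKA paper's
    actual metric is based on manifold alignment, not CKA.
    """
    X = X - X.mean(axis=0)  # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

# identical activations give similarity 1.0
acts = np.random.default_rng(0).normal(size=(64, 16))
print(round(linear_cka(acts, acts), 6))  # → 1.0
```

Scores near 1 indicate nearly redundant layers, which are the candidates for merging.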
Our three main contributions are:
- Optimizing the Merge Weight ($\alpha$): The MKA baseline sets the crucial layer merge weight ($\alpha$) using a simple similarity score. We treat $\alpha$ as a trainable parameter and optimize it using gradient descent and Bayesian optimization to find a more effective value.
- Task-Specific Similarity: The baseline assumes a static, task-independent layer similarity. We demonstrate that layer similarity is not static but is highly dependent on the task domain (e.g., MMLU humanities vs. math) and language (e.g., English vs. Spanish vs. Chinese).
- Neural Alignment: The baseline "naively averages" layer weights, which can merge functionally distinct neurons and degrade performance. We implement a robust "align-then-merge" pipeline that uses optimal permutation (via the Hungarian algorithm) to align functionally equivalent neurons before averaging them.
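The align-then-merge idea in the last bullet can be sketched as follows. This is an illustrative toy version, not the project's pipeline.py: the matching cost (row dot products) and all names are our assumptions. The Hungarian algorithm, via `scipy.optimize.linear_sum_assignment`, finds the neuron permutation that best matches one layer's rows to the other's before the weighted average is taken:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_then_merge(W_a, W_b, alpha=0.5):
    """Permute rows (neurons) of W_b to best match W_a, then average.

    The Hungarian algorithm finds the one-to-one neuron matching that
    maximizes total row-wise similarity. Illustrative sketch only.
    """
    sim = W_a @ W_b.T                               # neuron-pair similarity
    row_ind, col_ind = linear_sum_assignment(-sim)  # maximize similarity
    W_b_aligned = W_b[col_ind]                      # undo the permutation
    return alpha * W_a + (1 - alpha) * W_b_aligned

# merging a layer with a row-shuffled copy of itself recovers the layer,
# whereas naive averaging of the shuffled pair would not
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
merged = align_then_merge(W, W[rng.permutation(8)])
print(np.allclose(merged, W))  # → True
```

The toy case makes the failure mode of naive averaging concrete: the two layers compute the same function, yet averaging them without alignment would mix unrelated neurons.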
For a detailed background on the project's motivation and the MKA baseline, please see this report. The full methodology and results of our three experiments are in NLP_Assignment_2_Report.pdf.
Our experiments produced three primary findings:
We found that the MKA paper's simple heuristic (setting $\alpha$ directly from the similarity score) is suboptimal. The learned $\alpha$ values, obtained via gradient descent and Bayesian optimization, provided a more effective merge weight than the heuristic.
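Treating $\alpha$ as a trainable parameter can be illustrated with a tiny finite-difference gradient descent on a proxy calibration loss. Everything here is a hypothetical toy setup (the real optimize_alphas.py works on the full model's loss over calibration data), but it shows the mechanic of learning the merge weight:

```python
import numpy as np

def merge(W_a, W_b, alpha):
    """Weighted average of two layers' weight matrices."""
    return alpha * W_a + (1 - alpha) * W_b

def calibration_loss(alpha, W_a, W_b, X, Y):
    """Squared error of the merged layer's output on calibration data.

    Hypothetical proxy objective for illustration only.
    """
    pred = X @ merge(W_a, W_b, alpha).T
    return np.mean((pred - Y) ** 2)

rng = np.random.default_rng(0)
W_a, W_b = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
X = rng.normal(size=(32, 3))
Y = X @ merge(W_a, W_b, 0.7).T   # targets generated with alpha = 0.7

# central-difference gradient descent on the scalar merge weight
alpha, lr, eps = 0.5, 0.02, 1e-5
for _ in range(200):
    grad = (calibration_loss(alpha + eps, W_a, W_b, X, Y)
            - calibration_loss(alpha - eps, W_a, W_b, X, Y)) / (2 * eps)
    alpha -= lr * grad
print(round(alpha, 3))  # → 0.7
```

Because the toy loss is quadratic in $\alpha$, gradient descent recovers the value that generated the targets; in the real pipeline $\alpha$ is fit against the model's task loss instead.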
We quantitatively confirmed our hypothesis that layer similarity is not static.
- Task-Dependence: We generated similarity heatmaps for different MMLU domains and found clear visual and quantitative differences. For instance, the similarity patterns for "Math" and "Computer Science" were highly correlated (0.951), while "Legal" and "Humanities" also showed similar patterns.
- Language-Dependence: The effect was even more pronounced for language. When analyzing the same "medical" task in different languages, the similarity patterns diverged significantly. The correlation between Spanish (es) and Chinese (zh) similarity matrices was only 0.730, confirming that layer redundancy is sensitive to the input language.
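The cross-domain and cross-language numbers above (0.951, 0.730) summarize each pair of heatmaps as a single correlation. One plausible way to compute such a score, flattening each similarity matrix and taking a Pearson correlation, can be sketched as follows (illustrative, not the project's compute_similarity.py):

```python
import numpy as np

def heatmap_correlation(S1, S2):
    """Pearson correlation between two layer-similarity matrices.

    Each matrix is flattened to a vector before correlating.
    Illustrative sketch of one possible comparison method.
    """
    a, b = np.asarray(S1).ravel(), np.asarray(S2).ravel()
    return np.corrcoef(a, b)[0, 1]

# identical heatmaps correlate perfectly; a noise-perturbed copy less so
rng = np.random.default_rng(1)
S = rng.random((32, 32))
print(round(heatmap_correlation(S, S), 3))                           # → 1.0
print(heatmap_correlation(S, S + 0.5 * rng.random((32, 32))) < 1.0)  # → True
```

A lower correlation between two task or language settings means the pattern of redundant layers differs, so a single static merge plan cannot be optimal for both.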
Our "align-then-merge" pipeline (Experiment 3) proved to be a more robust and reliable method for layer fusion.
The MKA baseline's performance was highly volatile, suffering a catastrophic drop in accuracy (from 0.662 to 0.547) at 12.5% compression. In contrast, our alignment method showed a smooth, predictable degradation and avoided this intermediate drop.
At that 12.5% compression level, our neural alignment method outperformed the MKA baseline by +0.0864 MMLU accuracy (0.6334 vs 0.5470). At higher compression ratios (40.625%), both methods converged to a similar accuracy, but our method's stability makes it a more reliable approach.
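For reference, the quoted compression levels map cleanly onto Llama3-8B's 32 transformer layers if the ratio is taken as the fraction of layers removed by merging (our assumption, consistent with 40.625% = 13/32):

```python
# Llama3-8B has 32 transformer layers; merging two adjacent layers
# removes one layer from the stack.
TOTAL_LAYERS = 32

def compression_ratio(layers_removed, total=TOTAL_LAYERS):
    """Fraction of the layer stack eliminated by merging."""
    return layers_removed / total

print(compression_ratio(4))   # → 0.125    (the 12.5% setting)
print(compression_ratio(13))  # → 0.40625  (the 40.625% setting)
```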
This repository contains all code and reports for the project.
- /Updating Alpha/: Code for Experiment 1. Contains scripts to optimize the $\alpha$ parameter using Gradient Descent (optimize_alphas.py) and Bayesian Optimization (alpha_pipeline.py).
- /data_dependent_merging/: Code for Experiment 2. Includes notebooks (mmlu_and_multilingual.ipynb) and scripts (compute_similarity.py) to generate and analyze task- and language-specific similarity heatmaps.
- /Neural Alignment/: Code for Experiment 3. Implements the full "align-then-merge" pipeline (pipeline.py) and evaluation notebook (evaluate.ipynb).
- /data/ (within experiment folders): Contains MMLU dev and test data samples used for calibration and evaluation.
- NLP_Assignment_2_Report.pdf: The final project report detailing the methodology and results of our three experiments.
- Aditya Borate
- Aryan Solanki
- Laksh Jain
- Nishchay Bhutoria
- Parthiv Patel
- Rudra Pratap Singh
- Soham Gaonkar
We would like to thank our professor Dr. Mayank Singh and our project mentor Sailesh Panda for their invaluable guidance, weekly feedback, and insightful suggestions, which significantly shaped the direction of this work. We also acknowledge the authors of the MKA paper for providing a strong and interesting foundation for our research.