Awesome-Knowledge-Fusion


If you have any questions about the library, please feel free to contact us. Email: [email protected]


A comprehensive list of papers about 'Knowledge Fusion: The Integration of Model Capabilities'.

Abstract

As the comprehensive capabilities of large foundation models rapidly improve, similar general abilities have emerged across different models, making capability transfer and fusion between them increasingly feasible. Knowledge fusion aims to integrate existing LLMs of diverse architectures and capabilities into a more powerful model through efficient methods such as knowledge distillation, model merging, mixture of experts (MoE), and parameter-efficient fine-tuning (PEFT), thereby reducing the need for costly LLM development and adaptation. We provide a comprehensive overview of knowledge fusion methods and theories, covering their applications across various fields and scenarios, including LLMs, MLLMs, image generation, model compression, continual learning, and more. Finally, we highlight the challenges of knowledge fusion and explore future research directions.



Framework


1. Connectivity and Alignment

1.1 Model Connectivity

| Paper Title | Year | Conference/Journal |
| --- | --- | --- |
| Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic | 2024 | arXiv |
| Tangent Transformers for Composition, Privacy and Removal | 2024 | ICLR |
| Parameter Efficient Multi-task Model Fusion with Partial Linearization | 2024 | ICLR |
| Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models | 2023 | NeurIPS |

1.2 Weight Alignment

| Paper Title | Year | Conference/Journal |
| --- | --- | --- |
| Equivariant Deep Weight Space Alignment | 2024 | ICML |
| Harmony in diversity: Merging neural networks with canonical correlation analysis | 2024 | ICML |
| Transformer fusion with optimal transport | 2024 | ICLR |
| Layerwise linear mode connectivity | 2024 | ICLR |
| Proving linear mode connectivity of neural networks via optimal transport | 2024 | AISTATS |
| Training-Free Pretrained Model Merging | 2024 | CVPR |
| Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering | 2024 | arXiv |
| C2M3: Cycle-Consistent Multi Model Merging | 2024 | arXiv |
| Rethink Model Re-Basin and the Linear Mode Connectivity | 2024 | arXiv |
| Weight Scope Alignment: A Frustratingly Easy Method for Model Merging | 2024 | arXiv |
| Git Re-Basin: Merging Models modulo Permutation Symmetries | 2023 | ICLR |
| Re-basin via implicit Sinkhorn differentiation | 2023 | CVPR |
| Plateau in Monotonic Linear Interpolation--A "Biased" View of Loss Landscape for Deep Networks | 2023 | ICLR |
| Linear Mode Connectivity of Deep Neural Networks via Permutation Invariance and Renormalization | 2023 | ICLR |
| REPAIR: REnormalizing Permuted Activations for Interpolation Repair | 2023 | ICLR |
| Going beyond linear mode connectivity: The layerwise linear feature connectivity | 2023 | NeurIPS |
| The role of permutation invariance in linear mode connectivity of neural networks | 2022 | ICLR |
| What can linear interpolation of neural network loss landscapes tell us? | 2022 | ICML |
| Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling | 2021 | ICML |
| Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes | 2021 | ICML |
| Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances | 2021 | ICML |
| Linear Mode Connectivity and the Lottery Ticket Hypothesis | 2020 | ICML |
| Optimizing mode connectivity via neuron alignment | 2020 | NeurIPS |
| Model fusion via optimal transport | 2020 | NeurIPS |
| Uniform convergence may be unable to explain generalization in deep learning | 2019 | NeurIPS |
| Explaining landscape connectivity of low-cost solutions for multilayer nets | 2019 | NeurIPS |
| Essentially no barriers in neural network energy landscape | 2018 | ICML |
| Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs | 2018 | NeurIPS |
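Several of the papers above (e.g. Git Re-Basin, REPAIR) merge models by first permuting hidden units so that functionally matching neurons line up before interpolating weights. Below is a minimal numpy sketch of weight matching on a single layer; it uses a greedy cosine-similarity assignment in place of the optimal linear assignment those papers actually solve, and the toy weights are illustrative assumptions:

```python
import numpy as np

def match_permutation(wa, wb):
    """Greedily find a permutation perm so that wb[perm] best matches wa,
    scoring hidden units by cosine similarity of their weight vectors.
    (Git Re-Basin solves an optimal assignment; greedy is only a sketch.)"""
    sim = (wa / np.linalg.norm(wa, axis=1, keepdims=True)) @ \
          (wb / np.linalg.norm(wb, axis=1, keepdims=True)).T
    perm = -np.ones(wa.shape[0], dtype=int)
    used = set()
    # assign highest-similarity (row, column) pairs first
    for i, j in sorted(np.ndindex(*sim.shape), key=lambda ij: -sim[ij]):
        if perm[i] < 0 and j not in used:
            perm[i] = j
            used.add(j)
    return perm

rng = np.random.default_rng(0)
w1 = rng.normal(size=(4, 3))   # first-layer weights of model A
pi = rng.permutation(4)        # model B's hidden units are a shuffle of A's
w1b = w1[pi]                   # model B's first layer
perm = match_permutation(w1, w1b)
# after re-basin, B's units are back in A's ordering
assert np.allclose(w1b[perm], w1)
```

In a full network, the same permutation must also be applied to the layer's biases and to the columns of the next layer's weights, so the network's function is unchanged.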

2. Parameter Merging

2.1 Merging Methods

Gradient based

| Paper Title | Year | Conference/Journal |
| --- | --- | --- |
| Composing parameter-efficient modules with arithmetic operation | 2023 | NeurIPS |
| Editing models with task arithmetic | 2023 | ICLR |
| Model fusion via optimal transport | 2020 | NeurIPS |
| Weight averaging for neural networks and local resampling schemes | 1996 | AAAI Workshop |
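Several entries above ('Editing models with task arithmetic', 'Composing parameter-efficient modules with arithmetic operation') build on a simple recipe: subtract the pre-trained weights from each fine-tuned checkpoint to obtain a task vector, then add a scaled sum of task vectors back onto the base. A toy numpy sketch, where the scaling coefficient `lam` and the toy weight vectors are illustrative assumptions:

```python
import numpy as np

def task_vector(finetuned, base):
    """A task vector is the delta a fine-tuning run applied to the base."""
    return finetuned - base

def merge(base, task_vectors, lam=1.0):
    """Task arithmetic: theta = theta_base + lam * sum of task vectors."""
    return base + lam * np.sum(task_vectors, axis=0)

base = np.zeros(3)
ft_math = np.array([1.0, 0.0, 0.0])   # checkpoint fine-tuned on task 1
ft_code = np.array([0.0, 1.0, 0.0])   # checkpoint fine-tuned on task 2
tvs = [task_vector(ft_math, base), task_vector(ft_code, base)]
merged = merge(base, tvs, lam=0.5)    # scaled sum of both task deltas
```

Negating a task vector before adding it is the same framework's recipe for *removing* a capability; `lam` is typically tuned on held-out data.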

Task Vector based

| Paper Title | Year | Conference/Journal |
| --- | --- | --- |
| Knowledge Composition using Task Vectors with Learned Anisotropic Scaling | 2024 | arXiv |
| MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic | 2024 | arXiv |
| Checkpoint Merging via Bayesian Optimization in LLM Pretraining | 2024 | arXiv |
| Arcee’s MergeKit: A Toolkit for Merging Large Language Models | 2024 | arXiv |
| Evolutionary optimization of model merging recipes | 2024 | arXiv |
| XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | 2024 | ACL |
| AdaMerging: Adaptive Model Merging for Multi-Task Learning | 2024 | ICLR |
| Model Merging by Uncertainty-Based Gradient Matching | 2024 | ICLR |
| Merging by Matching Models in Task Subspaces | 2024 | TMLR |
| Fisher Mask Nodes for Language Model Merging | 2024 | LREC-COLING |
| Erasure Coded Neural Network Inference via Fisher Averaging | 2024 | ISIT |
| Dataless Knowledge Fusion by Merging Weights of Language Models | 2023 | ICLR |
| Merging models with fisher-weighted averaging | 2022 | NeurIPS |
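The Fisher-based entries above (e.g. 'Merging models with fisher-weighted averaging') merge checkpoints by weighting each parameter coordinate with its diagonal Fisher information, so coordinates a model is more certain about dominate the average. A minimal sketch with toy parameter and Fisher vectors (the values and `eps` stabilizer are illustrative assumptions):

```python
import numpy as np

def fisher_merge(params, fishers, eps=1e-8):
    """Element-wise Fisher-weighted average of parameter vectors:
    theta* = sum_i F_i * theta_i / sum_i F_i, per coordinate."""
    params = np.asarray(params)
    fishers = np.asarray(fishers)
    return (fishers * params).sum(axis=0) / (fishers.sum(axis=0) + eps)

theta_a = np.array([1.0, 0.0])
theta_b = np.array([0.0, 1.0])
# model A is "certain" about coordinate 0, model B about coordinate 1
f_a = np.array([10.0, 1.0])
f_b = np.array([1.0, 10.0])
merged = fisher_merge([theta_a, theta_b], [f_a, f_b])
```

With uniform Fisher weights this reduces to plain weight averaging; the benefit appears exactly when the models' certainties differ per coordinate, as in the example.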

2.2 During or After Training

During Training

| Paper Title | Year | Conference/Journal |
| --- | --- | --- |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | 2024 | ICML |
| Localizing Task Information for Improved Model Merging and Compression | 2024 | ICML |
| Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging | 2024 | ICLR |
| Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic | 2024 | arXiv |
| Activated Parameter Locating via Causal Intervention for Model Merging | 2024 | arXiv |
| PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning | 2024 | arXiv |
| DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling | 2024 | arXiv |
| EMR-Merging: Tuning-Free High-Performance Model Merging | 2024 | arXiv |
| Model breadcrumbs: Scaling multi-task model merging with sparse masks | 2023 | arXiv |
| Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion | 2023 | arXiv |
| Resolving Interference When Merging Models | 2023 | NeurIPS |
| Task-Specific Skill Localization in Fine-tuned Language Model | 2023 | ICML |
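Several entries above ('Resolving Interference When Merging Models', DELLA-Merging, Model breadcrumbs) reduce interference by sparsifying task vectors and reconciling conflicting parameter signs before averaging. A rough numpy sketch of a TIES-style trim / elect-sign / disjoint-merge pipeline; the trim fraction `k` and the toy task vectors are illustrative assumptions:

```python
import numpy as np

def ties_merge(task_vectors, k=0.5):
    """TIES-style merge sketch: (1) trim each task vector to its top-k
    fraction of magnitudes, (2) elect a per-coordinate sign from the summed
    mass, (3) average only the values that agree with the elected sign."""
    tvs = np.array(task_vectors, dtype=float)  # copy; rows are task vectors
    # 1. trim: zero all but the largest-magnitude entries in each vector
    for tv in tvs:
        thresh = np.quantile(np.abs(tv), 1 - k)
        tv[np.abs(tv) < thresh] = 0.0
    # 2. elect a sign per coordinate
    sign = np.sign(tvs.sum(axis=0))
    # 3. disjoint mean over entries matching the elected sign
    agree = (np.sign(tvs) == sign) & (tvs != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return (tvs * agree).sum(axis=0) / counts

tv1 = np.array([ 1.0, 0.1, 2.0])
tv2 = np.array([-1.5, 0.2, 2.0])   # conflicts with tv1 on coordinate 0
merged = ties_merge([tv1, tv2], k=1.0)
```

On the conflicting coordinate, only the model whose sign wins the election contributes, instead of the two updates partially cancelling as in a plain average.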

After Training

2.3 For LLMs and MLLMs

For LLMs

For MLLMs

3. Model Ensemble

3.1 Ensemble Methods

Weighted Averaging

Routing

Voting
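The ensemble methods in this subsection combine model *outputs* rather than weights. A minimal numpy sketch of two of them, weighted logit averaging and hard majority voting, over toy logits (the weights and values are illustrative assumptions):

```python
import numpy as np

def weighted_logit_ensemble(logits_list, weights):
    """Weighted averaging in output space: combine member logits with
    per-model weights, then softmax the combined scores."""
    z = sum(w * z_i for w, z_i in zip(weights, logits_list))
    e = np.exp(z - z.max())          # stable softmax
    return e / e.sum()

def majority_vote(predictions):
    """Hard voting: the class predicted by the most members wins."""
    values, counts = np.unique(predictions, return_counts=True)
    return values[counts.argmax()]

# two members disagree; the higher-weighted member dominates the soft ensemble
p = weighted_logit_ensemble([np.array([2.0, 0.0]), np.array([0.0, 1.0])],
                            weights=[0.8, 0.2])
vote = majority_vote([1, 0, 1])
```

Routing methods (the third family listed) replace the fixed `weights` with a learned, input-dependent gate, as in mixture-of-experts architectures.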

3.2 Ensemble Object

Entire Model

Adapter

4. Decouple and Reuse

4.1 Reprogramming

4.2 Mask

5. Distillation

5.1 Transformer

5.2 CNN

5.3 GNN
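Distillation transfers a teacher's knowledge by training the student to match temperature-softened teacher outputs. A numpy sketch of the classic KL-based distillation loss; the temperature `T=2` and the toy logits are illustrative assumptions:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    zT = z / T
    e = np.exp(zT - zT.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Hinton-style distillation: KL(teacher_T || student_T), scaled by T^2
    so gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * np.sum(p * (np.log(p) - np.log(q)))

teacher = np.array([3.0, 1.0, 0.2])
student_good = np.array([2.9, 1.1, 0.1])   # nearly matches the teacher
student_bad = np.array([0.0, 0.0, 3.0])    # disagrees with the teacher
# the loss shrinks as the student's soft predictions approach the teacher's
```

In practice this term is mixed with the ordinary cross-entropy on ground-truth labels; the same loss shape applies whether the backbone is a Transformer, CNN, or GNN.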

6. Model Reassemble

6.1 Model Stitch

6.2 Model Evolution

7. Others

7.1 External Data Retrieval

7.2 Other Surveys


Star History

Star History Chart


Contact

We invite all researchers to contribute to this repository, 'Knowledge Fusion: The Integration of Model Capabilities'. If you have any questions, please feel free to contact us.

Email: [email protected]
