
RL-for-LLMs-Optimization

This repository explores Reinforcement Learning (RL) strategies to optimize Large Language Models (LLMs) with a focus on efficiency and resource savings. The project demonstrates both prompt-level and model-level optimization via pruning and quantization, using a combination of custom RL environments and agents.


Contents

  • Prompt_prunning.ipynb: RL-driven approach to prompt pruning for LLMs.
  • rl-llm-optimization-hybridApproach.ipynb: Hybrid RL environment combining pruning and quantization for compressing LLMs.
  • rl-llm-prunning.ipynb: RL-based attention head pruning in transformer models for size and latency reduction.

Project Overview

1. Prompt Pruning (Prompt_prunning.ipynb)

  • Focus: Uses RL to prune tokens from prompts fed to LLMs, with the goal of reducing computation while preserving output quality.
  • Approach:
    • Extracts features such as token saliency, attention entropy, and reconstruction error.
    • An RL agent learns to select the most critical tokens, balancing resource savings and output similarity.
  • Techniques: Deep RL, autoencoders for feature extraction, and reward engineering that combines speed, memory, and output similarity (see the reward sketch below).
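
As a concrete illustration, here is a minimal sketch of such a reward, assuming the agent emits a keep/drop mask over prompt tokens. The function name, the alpha weight, and the KL-based similarity term are illustrative assumptions, not the notebook's exact implementation.

import math
import torch
import torch.nn.functional as F

def prompt_pruning_reward(kept_mask: torch.Tensor,
                          full_logits: torch.Tensor,
                          pruned_logits: torch.Tensor,
                          alpha: float = 0.5) -> float:
    """kept_mask: (seq_len,) booleans for tokens the agent kept.
    full_logits / pruned_logits: next-token logits (vocab_size,) from the
    full prompt vs. the pruned prompt."""
    # Resource-saving term: fraction of prompt tokens removed.
    savings = 1.0 - kept_mask.float().mean().item()
    # Output-similarity term: agreement between the two output distributions,
    # measured as exp(-KL(full || pruned)).
    kl = F.kl_div(F.log_softmax(pruned_logits, dim=-1),
                  F.softmax(full_logits, dim=-1),
                  reduction="sum").item()
    similarity = math.exp(-kl)
    # Weighted trade-off between saving compute and preserving the output.
    return alpha * savings + (1 - alpha) * similarity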

2. Hybrid Model Compression (rl-llm-optimization-hybridApproach.ipynb)

  • Focus: Optimizes LLMs by learning both pruning rates and quantization bit-widths per layer with RL.
  • Approach:
    • Custom Gymnasium environment simulates incremental model compression.
    • A PPO agent (PyTorch) learns policies that maximize compression while minimizing performance loss.
    • Rewards are computed from perplexity, FLOPs, parameter count, and overall resource savings, all tracked per step.
  • Techniques: Pruning (magnitude-based), quantization (min-max), curriculum learning, experiment tracking (WandB); a minimal environment sketch follows.
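
For orientation, a per-layer compression environment along these lines could look as follows. The class name, observation layout, and the placeholder quality penalty are assumptions for illustration; the notebook's environment measures real perplexity, FLOPs, and parameter counts instead.

import gymnasium as gym
import numpy as np
from gymnasium import spaces

class LayerCompressionEnv(gym.Env):
    """One episode walks through the model's layers; at each step the agent
    picks a pruning rate and a quantization bit-width for the current layer."""

    def __init__(self, num_layers: int = 12):
        super().__init__()
        self.num_layers = num_layers
        # Action: [pruning rate in [0, 1], bit-width scaled to [0, 1] -> 2..8 bits]
        self.action_space = spaces.Box(low=0.0, high=1.0, shape=(2,), dtype=np.float32)
        # Observation: [fraction of layers processed, cumulative compression so far]
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.layer = 0
        self.compression = 0.0
        return self._obs(), {}

    def step(self, action):
        prune_rate = float(action[0])
        bits = 2 + round(float(action[1]) * 6)  # map [0, 1] to 2..8 bits
        # Fraction of this layer's weight-bits removed by pruning + quantization.
        layer_saving = prune_rate + (1 - bits / 32) * (1 - prune_rate)
        self.compression += layer_saving / self.num_layers
        # Placeholder quality penalty; the real environment evaluates perplexity.
        quality_penalty = 0.5 * prune_rate ** 2 + 0.3 * (8 - bits) / 6
        reward = layer_saving - quality_penalty
        self.layer += 1
        terminated = self.layer >= self.num_layers
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return np.array([self.layer / self.num_layers, self.compression], dtype=np.float32)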

3. Attention Head Pruning (rl-llm-prunning.ipynb)

  • Focus: Uses RL to decide how many attention heads to keep in each transformer layer, trading off model quality against efficiency.
  • Approach:
    • Defines a Gym environment as a "game" where the agent's actions are pruning decisions.
    • Rewards combine perplexity (quality) with latency and memory savings.
    • Trains policy and value networks with PPO; includes tools to prune, save, and evaluate compressed models.
  • Techniques: PPO, custom reward engineering, evaluation via perplexity (see the head-pruning sketch below).
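
For reference, a learned per-layer head budget can be applied with the HuggingFace Transformers prune_heads API roughly as sketched below. The model choice (distilgpt2), the example head counts, and the decision to drop the trailing head indices are illustrative assumptions; the notebook's policy decides which heads to remove.

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Suppose the RL policy decided how many heads to keep per layer (distilgpt2
# has 6 layers with 12 heads each).
heads_to_keep_per_layer = {0: 8, 1: 6, 2: 8, 3: 10, 4: 12, 5: 8}

# Transformers expects a dict of {layer_index: [head indices to REMOVE]}.
num_heads = model.config.num_attention_heads
heads_to_prune = {
    layer: list(range(keep, num_heads))  # drop the trailing heads as an example
    for layer, keep in heads_to_keep_per_layer.items()
}
model.prune_heads(heads_to_prune)

# Persist the compressed model for later evaluation (e.g. perplexity, latency).
model.save_pretrained("pruned-distilgpt2")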

Installation

pip install torch transformers datasets sentencepiece gymnasium matplotlib
# (Optional for hybrid compression) pip install wandb

Some notebooks require a GPU with CUDA drivers for full functionality.


Usage

Clone the repository:

git clone https://github.com/larbi1512/RL-for-LLms-optimization.git
cd RL-for-LLms-optimization

Open the notebook of interest (.ipynb) in Jupyter or VS Code and follow the step-by-step instructions.

  • For prompt pruning, see Prompt_prunning.ipynb.
  • For hybrid model compression, see rl-llm-optimization-hybridApproach.ipynb.
  • For attention head pruning, see rl-llm-prunning.ipynb.

Requirements

  • Python 3.8+
  • PyTorch
  • HuggingFace Transformers
  • Datasets
  • Gymnasium
  • Matplotlib
  • (Optional) WandB, CUDA-capable GPU

License

See the LICENSE file for details.


Acknowledgements

  • HuggingFace for model and dataset APIs.
  • OpenAI Gymnasium for RL environment scaffolding.
  • The open-source RL and transformers communities.

Contact

For questions, please open an issue or reach out to the repository maintainer.
