2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -45,6 +45,8 @@
title: Troubleshooting
- local: developer_guides/checkpoint
title: PEFT checkpoint format
- local: developer_guides/method_comparison
title: Method Comparison

- title: 🤗 Accelerate integrations
sections:
68 changes: 68 additions & 0 deletions docs/source/developer_guides/method_comparison.md
@@ -0,0 +1,68 @@
# Method Comparison Guide

This guide compares several of the Parameter-Efficient Fine-Tuning (PEFT) methods available in the PEFT library. Each method has its own strengths and is suited to different use cases.

## Available Methods

- [LoRA (Low-Rank Adaptation)](lora.md) - A versatile method that works well across different model sizes
- [LoRA-FA (LoRA with Fast Adaptation)](lora_fa.md) - An enhanced version of LoRA optimized for quick adaptation
- [Bone (Bottleneck Orthogonal Network)](bone.md) - A memory-efficient method particularly suited for small to medium models

## Quick Comparison

| Method | Memory Efficiency | Training Speed | Best For |
|--------|------------------|----------------|----------|
| LoRA | High | Fast | General fine-tuning, large models |
| LoRA-FA | High | Very Fast | Quick adaptation, resource-constrained environments |
| Bone | Very High | Fast | Small to medium models, classification tasks |

## Choosing the Right Method

When selecting a PEFT method, consider the following factors:

**Member:** I'm a bit wary of adding a section like this. To me, these conclusions sound too strong compared to the evidence that we have. If you derived them from some papers (or even meta-reviews), that's of course different, but in that case let's add the references.

Otherwise, I would suggest strictly sticking to the evidence we have. I'm not sure if you ran the PEFT method comparison suite yourself; if not, I can share the results I got from running it locally, even though those are still preliminary.

Just as an example, unless we add a new task to the method comparison suite that is specifically for classification, I would not document that Bone is especially good for classification.

**Contributor Author:** We can remove this section; let me know what to do.

**Member:** Let's remove this section if we don't have evidence corroborating the claims.


1. **Model Size**
   - Small models (<1B parameters): Consider Bone
   - Medium to large models: Consider LoRA or LoRA-FA

2. **Resource Constraints**
   - Limited memory: Bone or LoRA-FA
   - Limited training time: LoRA-FA

3. **Task Type**
   - Classification: Bone
   - Generation: LoRA or LoRA-FA
   - Multi-task learning: LoRA

4. **Performance Requirements**
   - Fast adaptation: LoRA-FA
   - Maximum performance: LoRA
   - Memory efficiency: Bone

## Implementation Details

Each method has its own configuration and implementation details. Please refer to the individual method documentation for specific implementation guides:

- [LoRA Implementation Guide](lora.md#implementation)
- [LoRA-FA Implementation Guide](lora_fa.md#implementation)
- [Bone Implementation Guide](bone.md#implementation)
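
Regardless of which method you pick, an adapter is applied the same way across the library: instantiate a method-specific config and wrap the base model with `get_peft_model`. A minimal sketch using LoRA for illustration (the `gpt2` checkpoint and the hyperparameter values here are placeholders, not recommendations):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model to be adapted
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Each PEFT method has its own config class; LoRA is shown here
config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # reports trainable vs. total parameters
```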

## Performance Metrics

For detailed performance metrics and comparisons, please refer to the individual method documentation. Each method's documentation includes:

- Memory efficiency metrics
- Training performance characteristics
- Use case recommendations
- Hyperparameter tuning guides

## Best Practices

1. Start with LoRA for general use cases
2. Use LoRA-FA when quick adaptation is required
3. Consider Bone for small models or memory-constrained environments
4. Always benchmark performance before committing to a method
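
As a cheap first step of such a benchmark, comparing the trainable-parameter footprint of candidate configurations takes only a few lines. A minimal sketch, assuming a LoRA rank sweep on the `gpt2` checkpoint (both are illustrative choices):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

def trainable_fraction(peft_config):
    """Fraction of parameters left trainable by a given PEFT config."""
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    peft_model = get_peft_model(model, peft_config)
    trainable = sum(p.numel() for p in peft_model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in peft_model.parameters())
    return trainable / total

for rank in (4, 8, 16):
    config = LoraConfig(r=rank, target_modules=["c_attn"])
    print(f"LoRA r={rank}: {trainable_fraction(config):.4%} trainable")
```

A full benchmark should of course also cover task metrics, training time, and memory.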

## References

- [PEFT Documentation](https://huggingface.co/docs/peft/index)
- [Implementation Guide](https://github.com/huggingface/peft)
195 changes: 195 additions & 0 deletions docs/source/developer_guides/method_comparison/bone.md
@@ -0,0 +1,195 @@
# Bone (Bottleneck Orthogonal Network)

## Overview
Bone is a parameter-efficient fine-tuning method that uses orthogonal transformations in bottleneck layers. It's particularly effective for small to medium-sized models and offers excellent memory efficiency.

**Member:** For the descriptions, let's refer to the existing documentation (in this case here) and avoid duplication as much as possible.

**Contributor Author:** Removing this.


## Key Features
- Extremely memory efficient (~0.05% of base model parameters)
- Fast inference speed
- Good for small to medium models
- Simple implementation

## Performance Characteristics

### Memory Efficiency
| Model Size | Bone Parameters | Memory Usage |
|------------|----------------|--------------|
| 100M | ~50K | ~200KB |
| 1B | ~500K | ~2MB |
| 7B | ~3.5M | ~14MB |
| 13B | ~6.5M | ~26MB |

### Training Performance
| Metric | Value |
|--------|-------|
| Training Speed | Fast |
| Convergence | Quick (typically 1-2 epochs) |
| Inference Overhead | < 2% |

**Member:** Can we really give general statements like this?

## Use Cases

### Best For
- Small to medium models
- Resource-constrained devices
- Classification tasks
- Quick experiments

### Not Recommended For
- Large language models (>13B parameters)
- Complex generation tasks
- Tasks requiring extensive adaptation

## Implementation

### Basic Usage

**Member:** Again, let's keep duplication to a minimum. For instance, we could refer to the Bone example here.

**Contributor Author:** On it.

**Member:** This is still open.

```python
from peft import BoneConfig, get_peft_model

# Define Bone configuration
config = BoneConfig(
    bottleneck_size=64,  # size of the bottleneck layer
    target_modules=["attention.output"],
    dropout=0.1,
)

# Create PEFT model
model = get_peft_model(model, config)
```

### Advanced Configuration
```python
# Custom Bone configuration
config = BoneConfig(
    bottleneck_size=128,  # larger bottleneck
    target_modules=["attention.output", "intermediate"],
    dropout=0.2,
    use_orthogonal=True,  # enable orthogonal transformations
    orthogonal_eps=1e-6,  # epsilon for numerical stability
)
```

## Hyperparameter Tuning

**Member:** How did you derive the values in this section?


### Recommended Ranges
| Parameter | Recommended Range | Impact |
|-----------|------------------|--------|
| bottleneck_size | 32-256 | Larger = better performance, more parameters |
| dropout | 0.0-0.3 | Regularization |
| orthogonal_eps | 1e-8 to 1e-4 | Numerical stability |

### Optimal Settings by Model Size
| Model Size | Bottleneck Size | Dropout | Orthogonal Eps |
|------------|----------------|---------|----------------|
| < 100M | 32 | 0.1 | 1e-6 |
| 100M-1B | 64 | 0.15 | 1e-6 |
| 1B-7B | 128 | 0.2 | 1e-5 |
| 7B-13B | 256 | 0.25 | 1e-5 |

## Comparison with Other Methods

### Performance Comparison
| Method | Memory Efficiency | Training Speed | Model Size Suitability |
|--------|------------------|----------------|-----------------------|
| Bone | Very High | Fast | Small-Medium |
| LoRA | High | Fast | All |
| Adapter | Medium | Medium | All |
| Prompt | Very High | Very Fast | All |

### Memory Usage Comparison
| Method | Parameters (% of base) | Training Memory | Inference Memory |
|--------|----------------------|-----------------|------------------|
| Bone | 0.05% | Very Low | Very Low |
| LoRA | 0.1% | Low | Low |
| Adapter | 0.5% | Medium | Medium |
| Prompt | 0.01% | Very Low | Very Low |
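
The memory columns above are qualitative. When comparing methods on your own hardware, peak allocated GPU memory during a single forward/backward pass is a more concrete measurement; a minimal sketch (CUDA-only, and `peft_model`/`batch` are placeholders for your adapted model and a labelled input batch):

```python
import torch

def peak_training_memory_mb(peft_model, batch):
    """Rough peak GPU memory (in MB) for one training step."""
    peft_model.cuda().train()
    torch.cuda.reset_peak_memory_stats()
    outputs = peft_model(**{k: v.cuda() for k, v in batch.items()})
    outputs.loss.backward()  # the batch must include labels so a loss is returned
    return torch.cuda.max_memory_allocated() / 1024**2
```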

## Best Practices

1. **Bottleneck Size Selection**
   - Start with size 64 for most cases
   - Increase it for better performance at the cost of more parameters
   - Consider model size and task complexity

2. **Target Modules**
   - Focus on attention outputs
   - Add intermediate layers for complex tasks
   - Consider the model architecture

3. **Training Tips**
   - Use a learning rate in the range 5e-5 to 2e-4
   - Monitor the orthogonality condition during training
   - Use gradient clipping (see the sketch below)
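
The training tips above map directly onto `transformers.TrainingArguments`; a minimal sketch, where the learning rate follows the range suggested above and everything else is an illustrative placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bone-finetune",      # placeholder output directory
    learning_rate=1e-4,              # within the suggested 5e-5 to 2e-4 range
    max_grad_norm=1.0,               # gradient clipping
    num_train_epochs=2,
    per_device_train_batch_size=8,
    logging_steps=10,
)
```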

## Common Issues and Solutions

### Problem: Orthogonal Instability
**Solution:**
```python
# Improve numerical stability
config = BoneConfig(
    bottleneck_size=64,
    target_modules=["attention.output"],
    dropout=0.1,
    use_orthogonal=True,
    orthogonal_eps=1e-4,  # increase epsilon for stability
)
```

### Problem: Limited Adaptation
**Solution:**
```python
# Increase adaptation capacity
config = BoneConfig(
    bottleneck_size=128,  # larger bottleneck
    target_modules=["attention.output", "intermediate"],  # more target modules
    dropout=0.1,
    use_orthogonal=True,
)
```

## Examples

**Member:** I don't think we need a section for "Examples" and also "Basic Usage". As mentioned above, let's link to the existing examples instead. If those are insufficient, I would much rather see those examples extended, e.g. to show a text classification task.


### Text Classification
```python
from transformers import AutoModelForSequenceClassification
from peft import BoneConfig, get_peft_model

# Load base model
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Configure Bone
config = BoneConfig(
    bottleneck_size=64,
    target_modules=["attention.output"],
    dropout=0.1,
    use_orthogonal=True,
)

# Create PEFT model
model = get_peft_model(model, config)
```

### Small Model Fine-tuning
```python
from transformers import AutoModelForCausalLM
from peft import BoneConfig, get_peft_model

# Load a small base model (the 124M-parameter GPT-2 checkpoint)
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Configure Bone
config = BoneConfig(
    bottleneck_size=32,
    target_modules=["attention.output"],
    dropout=0.1,
    use_orthogonal=True,
)

# Create PEFT model
model = get_peft_model(model, config)
```
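
Whichever of the examples above you start from, only the adapter weights need to be saved and reloaded afterwards; a short sketch (paths are placeholders) that works the same way for any PEFT method:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Save only the adapter weights, not the full base model
model.save_pretrained("bone-adapter")

# Later: reload the base model and attach the trained adapter
base_model = AutoModelForCausalLM.from_pretrained("gpt2")
model = PeftModel.from_pretrained(base_model, "bone-adapter")
```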

## References
1. [Bone Paper](https://arxiv.org/abs/your-paper-url)
2. [PEFT Documentation](https://huggingface.co/docs/peft/index)
3. [Implementation Guide](https://github.com/huggingface/peft)