Method comparison docs #2509

# Method Comparison Guide

This guide provides a comprehensive comparison of the different Parameter-Efficient Fine-Tuning (PEFT) methods available in the PEFT library. Each method has its own strengths and is suited to different use cases.

## Available Methods

- [LoRA (Low-Rank Adaptation)](lora.md) - A versatile method that works well across different model sizes
- [LoRA-FA (LoRA with Fast Adaptation)](lora_fa.md) - An enhanced version of LoRA optimized for quick adaptation
- [Bone (Bottleneck Orthogonal Network)](bone.md) - A memory-efficient method particularly suited to small and medium models
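
For orientation, all three methods share the same PEFT workflow: build a config object and pass it to `get_peft_model`. The sketch below shows this with LoRA; the `LoraConfig` values are illustrative defaults, not tuned recommendations:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a base model to adapt
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Illustrative LoRA configuration; see the LoRA guide for tuning advice
config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.1,
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # reports the trainable fraction
```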

## Quick Comparison

| Method  | Memory Efficiency | Training Speed | Best For |
|---------|-------------------|----------------|----------|
| LoRA    | High              | Fast           | General fine-tuning, large models |
| LoRA-FA | High              | Very Fast      | Quick adaptation, resource-constrained environments |
| Bone    | Very High         | Fast           | Small to medium models, classification tasks |

## Choosing the Right Method

When selecting a PEFT method, consider the following factors (summarized in the code sketch after the list):
> **Member:** I'm a bit wary of adding a section like this. To me, these conclusions sound too strong compared to the evidence that we have. If you derived these conclusions from some papers (or even meta-reviews), that's of course different, but in that case let's add the references. Otherwise, I would suggest strictly sticking to the evidence we have. I'm not sure if you ran the PEFT method comparison suite yourself; otherwise, I think I can share the results I got from running them locally, even though those are still preliminary. Just as an example, unless we add a new task to the method comparison suite that is specifically for classification, I would not document that Bone is especially good for classification.
>
> **Contributor (Author):** We can remove this section; let me know what to do.
>
> **Member:** Let's remove this section if we don't have evidence corroborating the claims.

1. **Model Size**
   - Small models (<1B parameters): Consider Bone
   - Medium to large models: Consider LoRA or LoRA-FA

2. **Resource Constraints**
   - Limited memory: Bone or LoRA-FA
   - Limited training time: LoRA-FA

3. **Task Type**
   - Classification: Bone
   - Generation: LoRA or LoRA-FA
   - Multi-task learning: LoRA

4. **Performance Requirements**
   - Fast adaptation: LoRA-FA
   - Maximum performance: LoRA
   - Memory efficiency: Bone
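
As a purely illustrative summary, the factors above can be folded into a small helper. The thresholds and return values mirror the list, not benchmark results (note the reviewer's caveat above about the strength of these claims):

```python
def choose_method(num_params: int, task: str, memory_limited: bool = False) -> str:
    """Heuristic mirroring the four factors above; illustrative only."""
    if num_params < 1_000_000_000 or task == "classification":
        return "bone"     # small models, classification, memory efficiency
    if memory_limited:
        return "lora_fa"  # limited memory or training time, fast adaptation
    return "lora"         # general fine-tuning, generation, multi-task

# Example: a 7B-parameter generation model with ample memory -> "lora"
print(choose_method(7_000_000_000, "generation"))
```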

## Implementation Details

Each method has its own configuration and implementation details. Please refer to the individual method documentation for specific implementation guides:

- [LoRA Implementation Guide](lora.md#implementation)
- [LoRA-FA Implementation Guide](lora_fa.md#implementation)
- [Bone Implementation Guide](bone.md#implementation)

## Performance Metrics

For detailed performance metrics and comparisons, please refer to the individual method documentation. Each method's documentation includes:

- Memory efficiency metrics
- Training performance characteristics
- Use case recommendations
- Hyperparameter tuning guides

## Best Practices

1. Start with LoRA for general use cases
2. Use LoRA-FA when quick adaptation is required
3. Consider Bone for small models or memory-constrained environments
4. Always benchmark performance before committing to a method

## References

- [PEFT Documentation](https://huggingface.co/docs/peft/index)
- [Implementation Guide](https://github.com/huggingface/peft)

---

# Bone (Bottleneck Orthogonal Network)

## Overview

Bone is a parameter-efficient fine-tuning method that uses orthogonal transformations in bottleneck layers. It is particularly effective for small to medium-sized models and offers excellent memory efficiency.

## Key Features

- Extremely memory efficient (~0.05% of base model parameters)
- Fast inference speed
- Good for small to medium models
- Simple implementation

## Performance Characteristics

### Memory Efficiency

| Model Size | Bone Parameters | Memory Usage |
|------------|-----------------|--------------|
| 100M       | ~50K            | ~200KB       |
| 1B         | ~500K           | ~2MB         |
| 7B         | ~3.5M           | ~14MB        |
| 13B        | ~6.5M           | ~26MB        |
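
The figures in this table are consistent with adapters holding ~0.05% of the base parameters, stored in fp32 (4 bytes each); a quick back-of-the-envelope check:

```python
def bone_footprint(base_params: int, ratio: float = 0.0005, bytes_per_param: int = 4):
    """Reproduce the table above: ~0.05% added parameters, stored in fp32."""
    added = int(base_params * ratio)
    return added, added * bytes_per_param

for base in (100e6, 1e9, 7e9, 13e9):
    params, mem = bone_footprint(int(base))
    print(f"{base / 1e6:,.0f}M base -> ~{params / 1e3:,.0f}K adapter params, ~{mem / 1e6:.1f}MB")
```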

### Training Performance

| Metric             | Value                        |
|--------------------|------------------------------|
| Training Speed     | Fast                         |
| Convergence        | Quick (typically 1-2 epochs) |
| Inference Overhead | < 2%                         |

## Use Cases

### Best For

- Small to medium models
- Resource-constrained devices
- Classification tasks
- Quick experiments

### Not Recommended For

- Large language models (>13B parameters)
- Complex generation tasks
- Tasks requiring extensive adaptation

## Implementation

### Basic Usage
> **Member:** Again, let's keep duplication to a minimum. For instance, we could refer to the Bone example here.
>
> **Contributor (Author):** on it
>
> **Member:** This is still open.

```python
from transformers import AutoModel
from peft import BoneConfig, get_peft_model

# Load a base model to adapt (its submodules must match target_modules)
model = AutoModel.from_pretrained("bert-base-uncased")

# Define Bone configuration
config = BoneConfig(
    bottleneck_size=64,  # size of the bottleneck layer
    target_modules=["attention.output"],
    dropout=0.1,
)

# Create PEFT model
model = get_peft_model(model, config)
```
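
After wrapping, it is worth verifying how few parameters are actually trainable; `print_trainable_parameters` and `save_pretrained` are standard PEFT model methods (the output directory name below is arbitrary):

```python
# Report trainable vs. total parameter counts
model.print_trainable_parameters()

# Persist only the small adapter weights, not the full base model
model.save_pretrained("bone-adapter")  # arbitrary output directory
```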

### Advanced Configuration

```python
# Custom Bone configuration
config = BoneConfig(
    bottleneck_size=128,  # larger bottleneck
    target_modules=["attention.output", "intermediate"],
    dropout=0.2,
    use_orthogonal=True,  # enable orthogonal transformations
    orthogonal_eps=1e-6,  # epsilon for numerical stability
)
```

## Hyperparameter Tuning

> **Member:** How did you derive the values in this section?

### Recommended Ranges

| Parameter       | Recommended Range | Impact |
|-----------------|-------------------|--------|
| bottleneck_size | 32-256            | Larger = better performance, more parameters |
| dropout         | 0.0-0.3           | Regularization |
| orthogonal_eps  | 1e-8 to 1e-4      | Numerical stability |

### Optimal Settings by Model Size

| Model Size | Bottleneck Size | Dropout | Orthogonal Eps |
|------------|-----------------|---------|----------------|
| < 100M     | 32              | 0.1     | 1e-6           |
| 100M-1B    | 64              | 0.15    | 1e-6           |
| 1B-7B      | 128             | 0.2     | 1e-5           |
| 7B-13B     | 256             | 0.25    | 1e-5           |
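
If these values are kept (the reviewer above asks how they were derived), the size-based lookup could be expressed as a helper like the following sketch, which simply transcribes the table:

```python
def recommended_settings(num_params: int) -> dict:
    """Transcription of the 'Optimal Settings by Model Size' table above."""
    if num_params < 100_000_000:
        return {"bottleneck_size": 32, "dropout": 0.10, "orthogonal_eps": 1e-6}
    if num_params < 1_000_000_000:
        return {"bottleneck_size": 64, "dropout": 0.15, "orthogonal_eps": 1e-6}
    if num_params < 7_000_000_000:
        return {"bottleneck_size": 128, "dropout": 0.20, "orthogonal_eps": 1e-5}
    return {"bottleneck_size": 256, "dropout": 0.25, "orthogonal_eps": 1e-5}

print(recommended_settings(350_000_000))  # e.g. a 350M model -> bottleneck 64
```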

## Comparison with Other Methods

### Performance Comparison

| Method  | Memory Efficiency | Training Speed | Model Size Suitability |
|---------|-------------------|----------------|------------------------|
| Bone    | Very High         | Fast           | Small-Medium           |
| LoRA    | High              | Fast           | All                    |
| Adapter | Medium            | Medium         | All                    |
| Prompt  | Very High         | Very Fast      | All                    |

### Memory Usage Comparison

| Method  | Parameters (% of base) | Training Memory | Inference Memory |
|---------|------------------------|-----------------|------------------|
| Bone    | 0.05%                  | Very Low        | Very Low         |
| LoRA    | 0.1%                   | Low             | Low              |
| Adapter | 0.5%                   | Medium          | Medium           |
| Prompt  | 0.01%                  | Very Low        | Very Low         |

## Best Practices

1. **Bottleneck Size Selection**
   - Start with size 64 for most cases
   - Increase for better performance
   - Consider model size and task complexity

2. **Target Modules**
   - Focus on attention outputs
   - Add intermediate layers for complex tasks
   - Consider model architecture

3. **Training Tips** (see the sketch below)
   - Use a learning rate between 5e-5 and 2e-4
   - Monitor the orthogonality condition
   - Use gradient clipping
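
A minimal training step illustrating these tips, assuming a `dataloader` of tokenized batches already exists; the learning rate sits inside the recommended 5e-5 to 2e-4 range, and `clip_grad_norm_` implements the gradient-clipping advice (the `max_norm` value is an assumption):

```python
import torch

# Learning rate chosen from the recommended 5e-5 to 2e-4 range
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for batch in dataloader:  # assumed: yields dicts of tokenized tensors with labels
    outputs = model(**batch)
    outputs.loss.backward()
    # Gradient clipping, as recommended above; max_norm=1.0 is an assumed value
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()
```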

## Common Issues and Solutions

### Problem: Orthogonal Instability

**Solution:**

```python
# Improve numerical stability
config = BoneConfig(
    bottleneck_size=64,
    target_modules=["attention.output"],
    dropout=0.1,
    use_orthogonal=True,
    orthogonal_eps=1e-4,  # increase epsilon
)
```

### Problem: Limited Adaptation

**Solution:**

```python
# Increase adaptation capacity
config = BoneConfig(
    bottleneck_size=128,  # larger bottleneck
    target_modules=["attention.output", "intermediate"],  # more target modules
    dropout=0.1,
    use_orthogonal=True,
)
```

## Examples

> **Member:** I don't think we need a section for "Examples" and also "Basic Usage". As mentioned above, let's link to the existing examples instead. If those are insufficient, I would much rather like to see those examples extended, e.g. to show a text classification task.

### Text Classification

```python
from transformers import AutoModelForSequenceClassification
from peft import BoneConfig, get_peft_model

# Load base model
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Configure Bone
config = BoneConfig(
    bottleneck_size=64,
    target_modules=["attention.output"],
    dropout=0.1,
    use_orthogonal=True,
)

# Create PEFT model
model = get_peft_model(model, config)
```
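
A quick way to sanity-check the wrapped classifier is a single forward pass; the example sentence is arbitrary:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("This movie was great!", return_tensors="pt")

# The wrapped model behaves like the base classifier
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2]) with the default two labels
```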

### Small Model Fine-tuning

```python
from transformers import AutoModelForCausalLM
from peft import BoneConfig, get_peft_model

# Load a small base model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Configure Bone
config = BoneConfig(
    bottleneck_size=32,
    target_modules=["attention.output"],
    dropout=0.1,
    use_orthogonal=True,
)

# Create PEFT model
model = get_peft_model(model, config)
```
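
And a corresponding sanity check for the causal-LM case, generating a few tokens through the wrapped model (the prompt and length are arbitrary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("The quick brown fox", return_tensors="pt")

# PEFT-wrapped causal LMs expose the usual generate() API
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```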

## References

1. [Bone Paper](https://arxiv.org/abs/your-paper-url)
2. [PEFT Documentation](https://huggingface.co/docs/peft/index)
3. [Implementation Guide](https://github.com/huggingface/peft)