A complete pipeline for fine-tuning 7B language models using QLoRA and converting them to GGUF format for efficient inference.
- QLoRA Fine-tuning: Memory-efficient fine-tuning using 4-bit quantization
- GPU/CPU Support: Automatic fallback to CPU if CUDA unavailable
- Model Merging: Merge LoRA adapters with base models
- GGUF Conversion: Convert to GGUF format with configurable quantization
- Local Inference: Test models locally with interactive chat
- Complete Pipeline: One-command execution of the entire workflow
Setup Environment
python setup.py
Create Sample Data
python finetune_pipeline.py --create_sample_data
Run Complete Pipeline
python finetune_pipeline.py --data_path data/sample_train.jsonl
Install Python Dependencies
pip install -r requirements.txt
Install llama.cpp (Optional, for GGUF conversion)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
Training data should be in JSONL format with a "text" field containing formatted prompts:
{"text": "<s>[INST] What is machine learning? [/INST] Machine learning is a subset of artificial intelligence...</s>"}
{"text": "<s>[INST] Explain neural networks. [/INST] Neural networks are computational models...</s>"}
Then point the pipeline at your file:
python finetune_pipeline.py --data_path my_data.jsonl
To customize training from the command line:
python finetune_pipeline.py \
--model_name microsoft/DialoGPT-medium \
--data_path my_data.jsonl \
--epochs 5 \
--batch_size 2 \
--learning_rate 1e-4
To skip GGUF conversion:
python finetune_pipeline.py --data_path my_data.jsonl --skip_gguf
Fine-tuning Only
python finetune.py --data_path my_data.jsonl
Merge LoRA Adapter
python merge_model.py --adapter_path ./lora_adapters --output_path ./merged_model
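Under the hood, merging typically amounts to a few `peft` calls. A minimal sketch, assuming the adapter was saved by the fine-tuning step and the base model matches `model_name` in config.py:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in half precision, attach the LoRA adapter, then fold
# the adapter weights into the base weights and save a standalone model.
base = AutoModelForCausalLM.from_pretrained(
    "microsoft/DialoGPT-medium", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "./lora_adapters")
merged = model.merge_and_unload()

merged.save_pretrained("./merged_model")
AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium").save_pretrained("./merged_model")
```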
Convert to GGUF
python gguf_converter.py --model_path ./merged_model
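If you prefer to drive llama.cpp directly, the conversion looks roughly like this (a sketch only; the converter script and quantize binary have been renamed across llama.cpp versions, so check your checkout):

```python
import subprocess

LLAMA_CPP = "/path/to/llama.cpp"  # adjust to your llama.cpp checkout

# 1) Convert the merged Hugging Face model to a full-precision GGUF file
#    (the script may be convert_hf_to_gguf.py or convert-hf-to-gguf.py depending on version).
subprocess.run(
    ["python", f"{LLAMA_CPP}/convert_hf_to_gguf.py", "./merged_model",
     "--outfile", "./gguf_models/model-f16.gguf"],
    check=True,
)

# 2) Quantize to the configured level (binary is llama-quantize in recent builds,
#    quantize in older make-based builds).
subprocess.run(
    [f"{LLAMA_CPP}/llama-quantize",
     "./gguf_models/model-f16.gguf",
     "./gguf_models/model-Q4_K_M.gguf",
     "Q4_K_M"],
    check=True,
)
```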
Test Model
python inference.py --model_path ./merged_model --interactive
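The merged model can also be smoke-tested directly with `transformers` (a sketch, not the repo's inference.py):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("./merged_model")
model = AutoModelForCausalLM.from_pretrained(
    "./merged_model",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)
model.eval()

prompt = "<s>[INST] What is machine learning? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```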
Edit config.py to customize training parameters:
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    model_name: str = "microsoft/DialoGPT-medium"
    max_seq_length: int = 512
    num_train_epochs: int = 3
    learning_rate: float = 2e-4
    lora_r: int = 64
    lora_alpha: int = 16
    quantization_level: str = "Q4_K_M"
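Roughly, these values map onto the `bitsandbytes` and `peft` configuration used for QLoRA. A sketch under the assumption that `TrainingConfig` is importable from config.py; the dropout value is illustrative and not part of the config shown above:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

from config import TrainingConfig

cfg = TrainingConfig()

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    cfg.model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Small trainable low-rank adapters on top of the quantized weights
lora_config = LoraConfig(
    r=cfg.lora_r,
    lora_alpha=cfg.lora_alpha,
    lora_dropout=0.05,  # illustrative default
    task_type="CAUSAL_LM",
    # target_modules can be set explicitly if peft cannot infer them
    # from the base architecture.
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```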
The pipeline writes its artifacts to the following layout:
outputs/
├── lora_adapters/ # LoRA adapter files
├── merged_model/ # Merged PyTorch model
└── gguf_models/ # GGUF quantized models
data/
└── sample_train.jsonl # Sample training data
Hardware requirements:
- Minimum: 8GB VRAM for 7B model fine-tuning with QLoRA
- Recommended: 16GB+ VRAM for optimal performance
- CPU Fallback: Available but significantly slower (a quick VRAM check is sketched below)
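A quick way to check the available VRAM before starting a run (plain PyTorch, no project code assumed):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA device found; training will fall back to CPU.")
```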
The pipeline works with most causal language models on Hugging Face:
- microsoft/DialoGPT-medium (default)
- microsoft/DialoGPT-large
- EleutherAI/gpt-neo-1.3B
- EleutherAI/gpt-neo-2.7B
- And many others...
Available GGUF quantization levels:
- Q4_0, Q4_1: 4-bit quantization
- Q5_0, Q5_1: 5-bit quantization
- Q8_0: 8-bit quantization
- Q4_K_M, Q5_K_M: K-quantization (recommended)
CUDA Out of Memory
- Reduce per_device_train_batch_size
- Increase gradient_accumulation_steps (see the sketch below)
- Reduce max_seq_length
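For example, the effective batch size can be preserved while lowering per-step memory (these are standard transformers TrainingArguments fields; whether config.py exposes these exact names is an assumption):

```python
from transformers import TrainingArguments

# Effective batch size stays at 1 * 16 = 16, but each step holds far fewer
# activations in memory than per_device_train_batch_size=16 would.
args = TrainingArguments(
    output_dir="outputs/lora_adapters",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=3,
    learning_rate=2e-4,
)
```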
GGUF Conversion Fails
- Install llama.cpp from source
- Ensure the conversion scripts are in PATH
- Use --skip_gguf to bypass conversion
Model Quality Issues
- Increase training epochs
- Adjust the learning rate
- Improve training data quality
- Increase the LoRA rank (lora_r)
This project is licensed under the MIT License. See individual model licenses for usage restrictions.