python hf_download.py --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --save_dir ./models
python hf_download.py --dataset FreedomIntelligence/medical-o1-reasoning-SFT --save_dir ./data
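The repository's hf_download.py is not reproduced here; a minimal sketch of such a downloader, assuming it wraps huggingface_hub.snapshot_download and the --model/--dataset/--save_dir flags used above, might look like this:

```python
# hf_download.py -- minimal sketch, assuming huggingface_hub is installed.
# Flag names mirror the commands above; the repo's actual script may differ.
import argparse
import os
from huggingface_hub import snapshot_download

parser = argparse.ArgumentParser()
parser.add_argument("--model", help="Hugging Face model repo id")
parser.add_argument("--dataset", help="Hugging Face dataset repo id")
parser.add_argument("--save_dir", required=True, help="local target directory")
args = parser.parse_args()

if args.model:
    # Download all model files (weights, tokenizer, config) into save_dir/<name>.
    snapshot_download(repo_id=args.model, repo_type="model",
                      local_dir=os.path.join(args.save_dir, args.model.split("/")[-1]))
if args.dataset:
    snapshot_download(repo_id=args.dataset, repo_type="dataset",
                      local_dir=os.path.join(args.save_dir, args.dataset.split("/")[-1]))
```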
conda create -n deepseek python=3.12
conda activate deepseek
pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
pip install vllm
pip install wandb
pip install streamlit
pip install isort
pip install black
Alternatively, install everything at once from the requirements file:
pip install -r requirements.txt
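A requirements.txt covering the packages installed above would look roughly like the following; this listing is an assumption rather than the repo's actual file, and you may want to pin versions:

```
unsloth
vllm
wandb
streamlit
isort
black
```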
conda create -n vllm python=3.12
conda activate vllm
pip install vllm==0.8.1
pip install streamlit
pip install isort
pip install black
conda create -n sglang python=3.12
conda activate sglang
pip install "sglang[all]"
pip install streamlit
pip install isort
pip install black
wandb login
python train.py
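train.py itself is not reproduced here; a minimal sketch of an Unsloth LoRA fine-tuning run on the downloaded dataset, under the assumption that the script uses FastLanguageModel plus TRL's SFTTrainer and that the dataset has Question/Complex_CoT/Response columns, could look like:

```python
# train.py -- minimal sketch, assuming Unsloth + TRL; the real script may differ.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the distilled base model in 4-bit so LoRA training fits on a single GPU.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="./models/DeepSeek-R1-Distill-Qwen-32B",
    max_seq_length=32768,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en",
                       split="train")  # or point at the local ./data copy

def format_example(example):
    # Fold question, reasoning trace, and answer into one training string
    # (assumed schema and template; adjust to the repo's actual format).
    return {"text": f"<question>{example['Question']}</question>\n"
                    f"<think>{example['Complex_CoT']}</think>\n"
                    f"{example['Response']}" + tokenizer.eos_token}

dataset = dataset.map(format_example)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=32768,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        output_dir="outputs",
        report_to="wandb",  # matches the `wandb login` step above
    ),
)
trainer.train()
model.save_pretrained("outputs/lora_adapter")  # adapter only; merged later
```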
python chat.py
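chat.py presumably loads the trained adapter for local inference; a hedged sketch, assuming Unsloth's inference mode, the tokenizer's chat template, and the adapter path written by the train.py sketch above:

```python
# chat.py -- minimal sketch; adapter path and generation settings are assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="outputs/lora_adapter",  # hypothetical adapter path
    max_seq_length=32768,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast generation path

while True:
    question = input("You: ")
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=1024)
    # Strip the prompt tokens and print only the newly generated answer.
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```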
- Start the server:
vllm serve /path/to/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 1 --max-model-len 32768 --enforce-eager
- Query the server with the following command:
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "/path/to/DeepSeek-R1-Distill-Qwen-32B",
"messages": [{"role": "user", "content": "Hello! What is your name?"}]
}'
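Since vLLM exposes an OpenAI-compatible API, you can also query it from Python with the openai client; the port (vLLM's default, 8000) and model path here follow the serve command above:

```python
# Query the vLLM OpenAI-compatible endpoint; assumes `pip install openai`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="/path/to/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
)
print(response.choices[0].message.content)
```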
- Start the server:
python -m sglang.launch_server --model-path /path/to/DeepSeek-R1-Distill-Qwen-32B --dp 1 --tp 1 --nnodes 1 --trust-remote-code
- Query the server with the following command (SGLang listens on port 30000 by default):
curl http://localhost:30000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "/path/to/DeepSeek-R1-Distill-Qwen-32B",
"messages": [{"role": "user", "content": "Hello! What is your name?"}]
}'
- Start the server:
./llama-server -m path/to/your/gguf_model.gguf --port 8000
- Query the server with the following command:
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "path/to/your/gguf_model.gguf",
"messages": [{"role": "user", "content": "Hello! What is your name?"}]
}'
streamlit run app.py
You can also start a simple web UI with the following command:
streamlit run app_demo.py
Open the web page and enter your question.
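app.py and app_demo.py are not shown here; a minimal Streamlit chat page that forwards questions to one of the OpenAI-compatible servers above might look like this (the endpoint and served model path are assumptions):

```python
# app_demo.py -- minimal sketch of a Streamlit chat UI over an OpenAI-compatible server.
import streamlit as st
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed endpoint

st.title("DeepSeek-R1 Chat")

if "history" not in st.session_state:
    st.session_state.history = []

# Replay the conversation so far.
for message in st.session_state.history:
    with st.chat_message(message["role"]):
        st.write(message["content"])

if question := st.chat_input("Input your question"):
    st.session_state.history.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.write(question)
    response = client.chat.completions.create(
        model="/path/to/DeepSeek-R1-Distill-Qwen-32B",  # assumed served model path
        messages=st.session_state.history,
    )
    answer = response.choices[0].message.content
    st.session_state.history.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.write(answer)
```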
python ./merge_lora.py --model_dir path/to/your/base_model_folder --lora_adapter_dir path/to/your/lora_adapter_folder --max_seq_length 32768 --torch_dtype auto --save_model_dir /path/to/your/lora_merged_model_folder --save_method merged_16bit
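merge_lora.py presumably folds the LoRA delta back into the base weights. A minimal sketch using plain transformers + peft is shown below; the repo's script may instead use Unsloth's save_pretrained_merged (which the --save_method flag above suggests), and the --max_seq_length/--save_method flags are omitted here for brevity:

```python
# merge_lora.py -- minimal sketch using transformers + peft; flags mirror the
# command above, but the actual script may rely on Unsloth instead.
import argparse
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

parser = argparse.ArgumentParser()
parser.add_argument("--model_dir", required=True)
parser.add_argument("--lora_adapter_dir", required=True)
parser.add_argument("--torch_dtype", default="auto")
parser.add_argument("--save_model_dir", required=True)
args = parser.parse_args()

# Load the base model, attach the LoRA adapter, then fold the delta into the weights.
base = AutoModelForCausalLM.from_pretrained(args.model_dir, torch_dtype=args.torch_dtype)
tokenizer = AutoTokenizer.from_pretrained(args.model_dir)
model = PeftModel.from_pretrained(base, args.lora_adapter_dir)
merged = model.merge_and_unload()

merged.save_pretrained(args.save_model_dir)
tokenizer.save_pretrained(args.save_model_dir)
```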
git clone https://github.com/ggml-org/llama.cpp
Build llama.cpp using CMake:
cmake -B build
cmake --build build --config Release
- For faster compilation, add the -j argument to run multiple jobs in parallel, or use a generator that does this automatically, such as Ninja. For example, cmake --build build --config Release -j 8 will run 8 jobs in parallel.
- For faster repeated compilation, install ccache.
Please install CUDA before building llama.cpp with CUDA support:
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
cd build
mv ./bin/llama-quantize ../
mv ./bin/llama-server ../
If you encounter any errors during the build, please refer to the official llama.cpp build documentation: llama.cpp_build
python merge_lora.py --model_dir /path/to/base_model --lora_adapter_dir /path/to/lora_adapter --max_seq_length 32768 --torch_dtype bfloat16 --save_model_dir /path/to/target_dir --save_method merged_16bit
python ./merge_lora_quant_to_gguf.py --model_dir path/to/your/base_model_folder --lora_adapter_dir path/to/your/lora_adapter_folder --max_seq_length 32768 --torch_dtype auto --save_quant_model_dir /path/to/your/lora_merged_quant_model_folder --quantization_method q4_k_m
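A hedged sketch of merge_lora_quant_to_gguf.py, assuming it leans on Unsloth's save_pretrained_gguf (which drives the llama.cpp conversion and the llama-quantize binary built above); the real script may wire things up differently:

```python
# merge_lora_quant_to_gguf.py -- minimal sketch, assuming Unsloth's GGUF export.
import argparse
from unsloth import FastLanguageModel

parser = argparse.ArgumentParser()
parser.add_argument("--lora_adapter_dir", required=True)
parser.add_argument("--max_seq_length", type=int, default=32768)
parser.add_argument("--save_quant_model_dir", required=True)
parser.add_argument("--quantization_method", default="q4_k_m")
args = parser.parse_args()

# Unsloth resolves the base model from the adapter's adapter_config.json, so the
# --model_dir flag from the command above is not needed in this sketch.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=args.lora_adapter_dir,
    max_seq_length=args.max_seq_length,
)

# Merge the adapter, convert to GGUF, and quantize (e.g. q4_k_m) in one call.
model.save_pretrained_gguf(
    args.save_quant_model_dir,
    tokenizer,
    quantization_method=args.quantization_method,
)
```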