GUIDE/USE CASE - UPDATED 2025.04.27: Finetuning (LoRA) with LLaMaFactory 0.9.2 in Windows, using the CPU only #7733

SINAPSA-IC · 2025-04-15T09:50:56Z

SINAPSA-IC
Apr 15, 2025

Because I REALLY WANTED TO USE LLaMa Factory

and because there is NO guide whatsoever on launching this program
-- which btw is VERY IMPORTANT in the AI ecosystem
--- for A LOT of people

I, the mighty CEO of SINAPSA Infocomplex (R)(TM), have created this

GUIDE for fine-tuning a model using LLaMa-Factory 0.9.2 in Windows (10)
A step-by-step guide on installing and launching the web interface.
Author of this guide: SINAPSA Infocomplex(R)(TM)
Date of writing: 2025.04.17

this Guide is on Medium, too:
https://medium.com/p/96b2fc6a80b0

The operations in the following must be carried out in this order specifically (for instance, the creation of a custom-named datasets folder that would hold your custom datasets must come before starting the fine-tuning), except -maybe- the installation of llama.cpp which can be performed before installing LLaMa-Factory.

The “steps” are numbered from 1. to 21.

LLaMa-Factory 0.9.2
on
Windows 10
CUDA version: not necessary to accomplish the procedures in this guide

download and install:
Python 3.12.9

download from: https://www.python.org/downloads/

Command line:

if you want to clean up the Python cache because something might be in there from previous sessions;
BUT this is best done at the end of everything

pip cache purge

Command line:
create a folder (say, “lafa”) on a drive (say, drive D: ) where LLaMaFactory will be placed:

D:
mkdir lafa

(install and) start Git:
in Git Bash:
run these commands, in order:

cd D:
cd lafa
git clone — depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e “.[torch,metrics]”

Command line:
prepare the folder for your own datasets for fine-tuning your model:

D:
mkdir lafa\datasets_mymodels
then, place in it your (JSON) files with your own datasets
BUT
it MUST contain the files:
identity.json
dataset_info.json
c4_demo.jsonl

which are initially in the “data” folder in the “LLaMa-Factory” folder which was just created in step 4^ by way of Git.
for the name of ^that drive-and-folder of datasets to Autocomplete on the interface (webui) in the “Data dir” field (where the list with available datasets files will be), write its name in the DEFAULT_DATA_DIR value in the file:
LLaMaFactory\src\llamafactory\webui\common.py
likewise, for the drive-and-folder where your finetuned models will be saved — the value of DEFAULT_SAVE_DIR:

DEFAULT_CACHE_DIR = “cache”
DEFAULT_CONFIG_DIR = “config”

the folder names here use forward-slash /

DEFAULT_DATA_DIR = “D:/lafa/lafa_datasets”
DEFAULT_SAVE_DIR = “D:/lafa/lafa_llms_created”

USER_CONFIG = “user_config.yaml”

Note:
A simple Guide on creating a dataset that will be used by a model to self-identify itself as a person is on Medium:
https://medium.com/@contact_30070/creating-a-synthetic-dataset-for-self-identification-of-your-own-fine-tuned-llm-29ec4ccae0b0

Command line:
the file requirements.txt MUST be in the folder LLaMa-Factory

so, chdir (cd) to that folder (LLaMa-Factory)
then:

py -m pip install -r requirements.txt
or
pip install -r requirements.txt

OPTIONAL:
for gradio, ONLY if necessary (a message would appear when trying to launch webui.py):
Command line:

pip install pydantic<2.11.0

Command line:
We will be using the CPU only:
so we have to install torch in a version that supports the CPU,

first: uninstall the presumably more recent version of torch (in 2025.04, it was 2.6.0):

pip uninstall -y torch torchvision torchaudio

then, install a version of torch that comes with functionality for CPU:

pip install torch==2.2.2+cpu torchvision==0.17.2 torchaudio==2.2.2 — index-url https://download.pytorch.org/whl/cpu

OPTIONAL: Command line:
for info on CUDA, ONLY if you want to find out about it:

nvcc — version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright © 2005–2023 NVIDIA Corporation
Built on Tue_Jun_13_19:42:34_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

Continuing the procedures for using the CPU only:
change a file

go to folder: LLaMa-Factory\src
edit the file: train.py
-insert:

##########
device = torch.device(‘cpu’)
##########

like below, BEFORE def main():

[……]
from llamafactory.train.tuner import run_exp

##########
device = torch.device(‘cpu’)
##########

def main():
run_exp()

def _mp_fn(index):

For xla_spawn (TPUs)

run_exp()

if name == “main”:
main()

then, Save this file (“train.py”) where it is

change a file

go to folder:
…\LLaMA-Factory\src\llamafactory\webui
edit the file:
runner.py
search for “Popen”
you will find it in the following sequence:

NOTE: DO NOT USE shell=True to avoid security risk

self.trainer = Popen([“llamafactory-cli”, “train”, save_cmd(args)], env=env)
yield from self.monitor()

change the line
self.trainer = Popen([“llamafactory-cli”, “train”, save_cmd(args)], env=env)
to:
self.trainer = Popen([“llamafactory-cli”, “train”, save_cmd(args)], env=env, shell=True)
then, Save this file (“runner.py”) where it is

OPTIONAL: for checkpoints — — it may be necessary to change a file:

either in folder:
[user name]\AppData\Roaming\Python\Python312\site-packages\torch\utils
or in the folder where you have installed Python, followed by “Lib\site-packages\torch\utils” like above^
edit the file:
checkpoint.py
search for this command: use_reentrant: Optional[bool] = None,
and change it to:
use_reentrant=False,
then, Save this file (“checkpoint.py”) where it is

Command Line:

chdir to the drive-and-folder created earlier (by way of Git), “LLaMa-Factory”
there, chdir to the folder “src”
launch the web UI of LLaMaFactory:

webui — device cpu
or
webui.py — device cpu

_ NOW, IF right after you clicked Start, you see a message similar to:
“Warning: CUDA environment was not detected”

and NO OTHER message (above it, like “Failed”)
-> disregard it and go back to the Command Prompt where you have just written the command above,
have a little patience and most probably you will see the finetuning starting and its progression ->
-> which you can also see in the lower part of the webui page_
HOWEVER, the training may NOT start, with an ERROR saying something about “llamafactory-cli”; this is something that someone more in-the-know must address (explain & tell how to solve)
— if this is the unfortunate case, then you delete the folder where LLaMaFactory is, then pip cache purge, then uninstall Python and start anew. tough luck! (I for one don’t know what could’ve happened, that is)

The exported model will be created as .safetensors file(s)
by default saved in the folder:
LLaMA-Factory\src\saves\Custom\lora
so you may want to create a folder specifically for placing the finetuned versions of models in it
for instance, D:\llms_mymodels

— — -
Windows (10):
the BASE MODEL, downloaded from HuggingFace before training, will be downloaded in:
user \ .cache\huggingface\hub
— — -

for Converting to GGUF your finetuned version of a model which is now .safetensors:

We will use llama.cpp, instead of the Export section of LLaMa-Factory.

Command line:

pip install optimum>=1.17.0
and:
pip install auto_gptq>=0.5.0
this^ DOES NOT identify the version of torch+CPU that you have installed earlier (torch 2.2.2+cpu)
message:
… Building cuda extension requires PyTorch (>=1.13.0) being installed, please install PyTorch first: No module named ‘torch’
=> so you have to use llama.cpp for converting from safetensors to GGUF:

from the folder with the BASE MODEL (it may be “user \ .cache\huggingface\hub\models-ModelName\snapshots\567…blahblah562465357….”) copy the file: config.json
Paste/place that^ file in the folder where the .safetensors of your finetuned version of a model is

install llama.cpp to be able to convert your model to GGUF

create a folder where llama.cpp will be placed, say D:\llamacpp
(install and) start Git Bash:
chdir (cd) to that folder where llama.cpp will be placed/installed, say D:\llamacpp
then:
cd D:
cd llamacpp
git clone https://github.com/ggerganov/llama.cpp
pip install -r requirements.txt

Create your own LLM: start with the .safetensors file(s)

copy and edit a file
Open Windows Explorer/My Computer or similar
go into folder:
D:\lafa\LLaMA-Factory\examples\train_lora
or similar
if you trained your model based on a QWen model:
— Copy the file: qwen2vl_lora.sft.yaml
to:
the folder where your model has been created, for instance in folder:
D:\lafa\lafa_llms_created\train_2025–04–16–07–49–01

— in there, Edit this file:
— — change the entry:
model_name_or_path
in it you will write the path to the BASE model that has been downloaded from HuggingFace before training, for instance here it was QWen 1.8
for instance:
C:/[your user name]/.cache/huggingface/hub/models — Qwen — Qwen-1_8B/snapshots/fa6e214…blahblah…6bfdf5eed

— — change the entry:
adapter_name_or_path
in it you will write the path to your .safetensors model
for instance
D:\lafa\lafa_llms_created\train_2025–04–16–07–49–01\checkpoint-15

— change the entry:
export_dir
in it you will write the path to the folder where the EXPORTED model will be
for instance
D:/lafa/lafa_llms_exported

so, AFTER you edit those^ two entries, this .yaml file will be:

Note: DO NOT use quantized model or quantization_bit when merging lora adapters

model

model_name_or_path: C:/[your user name]/.cache/huggingface/hub/models — Qwen — Qwen-1_8B/snapshots/fa6e214ccbbc6a55235c26ef406355b6bfdf5eed
adapter_name_or_path: D:\lafa\lafa_llms_created\train_2025–04–16–07–49–01\checkpoint-15
template: qwen
trust_remote_code: true

export

export_dir: D:/lafa/lafa_llms_exported
export_size: 5
export_device: cpu
export_legacy_format: false

launch the EXPORT command:
Command line
D:\lafa\LLaMA-Factory\src>
run the command:
llamafactory-cli export D:\lafa\lafa_llms_created\train_2025–04–16–07–49–01\qwen2vl_lora_sft.yaml
Look at it go

[INFO|tokenization_utils_base.py:2058] 2025–04–18 13:23:28,053 >> loading file qwen.tiktoken
[INFO|tokenization_utils_base.py:2058] 2025–04–18 13:23:28,053 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2058] 2025–04–18 13:23:28,053 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2058] 2025–04–18 13:23:28,053 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2058] 2025–04–18 13:23:28,053 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2058] 2025–04–18 13:23:28,053 >> loading file chat_template.jinja
[INFO|configuration_utils.py:691] 2025–04–18 13:23:28,286 >> loading configuration file
…
…
…
[INFO|2025–04–18 13:25:01] llamafactory.model.model_utils.attention:143 >> Using vanilla attention implementation.
[INFO|2025–04–18 13:25:04] llamafactory.model.adapter:143 >> Merged 1 adapter(s).
[INFO|2025–04–18 13:25:04] llamafactory.model.adapter:143 >> Loaded adapter(s): D:\lafa\lafa_llms_created\train_2025–04–16–07–49–01\checkpoint-15
[INFO|2025–04–18 13:25:04] llamafactory.model.loader:143 >> all params: 1,836,828,672
[INFO|2025–04–18 13:25:04] llamafactory.train.tuner:143 >> Convert model dtype to: torch.float32.
[INFO|configuration_utils.py:419] 2025–04–18 13:25:04,634 >> Configuration saved in D:/lafa/lafa_llms_exported\config.json
[INFO|configuration_utils.py:911] 2025–04–18 13:25:04,640 >> Configuration saved in D:/lafa/lafa_llms_exported\generation_config.json
[INFO|modeling_utils.py:3580] 2025–04–18 13:29:45,642 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at D:/lafa/lafa_llms_exported\model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2510] 2025–04–18 13:29:49,723 >> tokenizer config file saved in D:/lafa/lafa_llms_exported\tokenizer_config.json
[INFO|tokenization_utils_base.py:2519] 2025–04–18 13:29:49,927 >> Special tokens file saved in D:/lafa/lafa_llms_exported\special_tokens_map.json
[INFO|2025–04–18 13:29:53] llamafactory.train.tuner:143 >> Ollama modelfile saved in D:/lafa/lafa_llms_exported\Modelfile

Converting the SAFETENSORS to GGUF

chdir to the folder of llama.cpp, just to be sure that the script file will be found: convert_hf_to_gguf.py
run this script: convert_hf_to_gguf.py

Command line:
D:\llamacpp\llama.cpp>convert_hf_to_gguf.py D:\lafa\lafa_llms_exported — outfile D:\lafa\lafa_llms_ggufs\qwenft.gguf — outtype q8_0

the results will be similar to the following:

INFO:hf-to-gguf:Loading model: lafa_llms_exported
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model…
…
…
…
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:D:\lafa\lafa_llms_ggufs\qwenft.gguf: n_tensors = 290, total_size = 493.4M
Writing: 100%|██████████████████████████████████████████████████████████████████████| 493M/493M [00:07<00:00, 62.5Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to D:\lafa\lafa_llms_ggufs\qwenft.gguf

ATTENTION:

In this step, only the METADATA of the model was saved,
but you want to OBTAIN A USABLE GGUF MODEL = you must also save the NEW WEIGHTS combined with the BASE MODEL.

The .safetensors here is the so-called ADAPTER = the portion of the NEW model created by ^finetuning by LoRA, that WILL OVERLAP/ADD TO the BASE model,
but we want to MERGE IT/ADD TO with the BASE MODEL

NOW, the .gguf model created in this step WILL NOT LOAD in LM Studio or a similar program,
so we can CHECK this by trying to load our model in LM Studio: an ERROR message will be shown

^ATTENTION

Trying to load your GGUF model into LM Studio for testing it —
by chatting with it on (a) subject(s) that the base model did NOT know before your fine-tuning it into the GGUF

ATTENTION:

Here we ONLY CHECK if our model will load in LM Studio: it WILL NOT.

^ ATTENTION

go to the folder for the LLMs visible in LM Studio (http://lmstudio.ai/)
for instance
D:\llm_for_lmstudio\lmstudio_models

in there, create a folder named like you — say, “jenny”
- then, go inside that folder
- in there, create a folder named like your GGUF model — say, “qwenft”
- the complete path to your GGUF will be:
  D:\llm_for_lmstudio\lmstudio_models\jenny\qwenft
- copy your GGUF file with your model into this folder
- open LM Studio
- your GGUF model should be listed in “My Models” and in other lists with the available LLMs for use with LM Studio
  this^ model will NOT be loaded, because it DOES NOT CONTAIN the BASE MODEL

So, we continue:

CREATING AN USABLE GGUF
Merging the .safetensors of the ADAPTER model (the portion of our NEW model that WILL OVERLAP/ADD TO the BASE model) with the results of finetuning by LoRA:

We must perform THIS step to COMBINE the NEW WEIGHTS (the ADAPTER = our model) with the BASE MODEL

In this step, we END the entire process — we create the GGUF that can be loaded in LM Studio or a similar program

use the method: merge_and_unload
either create the following Python script and run it,
or create a Visual Studio Python project containing the following instructions — or do a similar thing if you do not have Visual Studio — and RUN it (F5 in Visual Studio)
— the paths (folder and file names) are for exemplification here:
— — the instructions (the script) that you will have to adjust for your .safetensors model, for the base model and the respective paths:

from transformers import AutoModelForCausalLM
from peft import PeftModel
model = AutoModelForCausalLM.from_pretrained(here is the path-folder to the BASE MODEL)
model = PeftModel.from_pretrained(here is the path-folder to your FINE-TUNED model, with the .safetensors)
model = model.merge_and_unload()
model.save_pretrained(“D/lafa/mymodels_finetuned_for_ggufs”)

NOW:

go to the folder where your model was created first time around - as .safetensors, say, the folder with a checkpoint in it:
D:\lafa\lafa_llms_created\train_2025-04-25-11-35-14\checkpoint-18
from there, Copy the files:
-- tokenizer.json
-- tokenizer_config.json
-> in the folder where the^ model^ created^ combined^ merged^ above^ as LAST .safetensors files are, say, the folder:
D:\lafa\lafa_llms_exported\qwenft
where would be the files:
model-00001-of-00002.safetensors (depending on how many .safetensors pieces are in your merged^model^)
model-00002-of-00002.safetensors, config.json, generation_config
generation_config,json
model.safetensors.index.json
go back to step 19 and perform it on this^ folder where the .safetensor(s) of YOUR fine-tuned model is, and on the folder where you will put the endpoint GGUFs, say, the command would be:
D:\llamacpp\llama.cpp> convert_hf_to_gguf.py D:\lafa\lafa_llms_exported\qwenft --outfile D:\lafa\lafa_llms_ggufs\qwenft.gguf --outtype q8_0

IN THE END, try and load this final GGUF model in LM Studio or a similar program.

Good luck!

Thank you for reading this Guide and for being interested in creating your own, homemade, LLMs.
Corrections are WELCOME! This Guide was put together from n pieces of info from the web, because no such coherent Guide existed. We thank all who devoted their time to solve arcane problems during the feat of creating their own LLMs.

Note: Inconsistencies and lack of functionality/results could arise in the course of this whole process, either because of some of the Python packages, or — LLaMa-Factory repository, or — llama.cpp repository.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GUIDE/USE CASE - UPDATED 2025.04.27: Finetuning (LoRA) with LLaMaFactory 0.9.2 in Windows, using the CPU only #7733

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 0 comments

Select a reply

GUIDE/USE CASE - UPDATED 2025.04.27: Finetuning (LoRA) with LLaMaFactory 0.9.2 in Windows, using the CPU only #7733

SINAPSA-IC Apr 15, 2025

the folder names here use forward-slash /

For xla_spawn (TPUs)

NOTE: DO NOT USE shell=True to avoid security risk

Note: DO NOT use quantized model or quantization_bit when merging lora adapters

model

export

Replies: 0 comments

SINAPSA-IC
Apr 15, 2025