GUIDE/USE CASE - UPDATED 2025.04.27: Finetuning (LoRA) with LLaMaFactory 0.9.2 in Windows, using the CPU only #7733
Because I REALLY WANTED TO USE LLaMa-Factory
-- which, by the way, is VERY IMPORTANT in the AI ecosystem
--- for A LOT of people
I, the mighty CEO of SINAPSA Infocomplex (R)(TM), have created this
GUIDE for fine-tuning a model using LLaMa-Factory 0.9.2 in Windows (10):
a step-by-step guide on installing it, launching the web interface, fine-tuning a model on the CPU only, and exporting the result to GGUF.
Author of this guide: SINAPSA Infocomplex(R)(TM)
Date of writing: 2025.04.17
this Guide is on Medium, too:
https://medium.com/p/96b2fc6a80b0
The operations below must be carried out in this specific order (for instance, the creation of a custom-named folder to hold your custom datasets must come before starting the fine-tuning), except perhaps the installation of llama.cpp, which can also be performed before installing LLaMa-Factory.
The steps are numbered from 1 to 21.
LLaMa-Factory 0.9.2
on Windows 10
CUDA version: not necessary to accomplish the procedures in this guide
Python 3.12.9
create a folder (say, "lafa") on a drive (say, drive D:) where LLaMa-Factory will be placed:
in Git Bash:
run these commands, in order:
cd D:
cd lafa
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
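To confirm that the editable install worked, a minimal sanity check like the one below can be run (a sketch; it assumes the package exposes __version__ -- if it does not, running llamafactory-cli version in the command line gives the same information):

import llamafactory

# should print 0.9.2 if the editable install above succeeded
print(llamafactory.__version__)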
prepare the folder for your own datasets for fine-tuning your model
(the datasets that ship with LLaMa-Factory are initially in the "data" folder inside the "LLaMA-Factory" folder that was just created in step 4^ by Git).
For the name of ^that drive-and-folder of datasets to autocomplete in the web interface (webui) in the "Data dir" field (where the list of available dataset files will be), write its name in the DEFAULT_DATA_DIR value in the file:
LLaMA-Factory\src\llamafactory\webui\common.py
Likewise, for the drive-and-folder where your fine-tuned models will be saved, set the value of DEFAULT_SAVE_DIR:
the folder names here use forward slashes /

DEFAULT_CACHE_DIR = "cache"
DEFAULT_CONFIG_DIR = "config"
DEFAULT_DATA_DIR = "D:/lafa/lafa_datasets"
DEFAULT_SAVE_DIR = "D:/lafa/lafa_llms_created"
USER_CONFIG = "user_config.yaml"
Note:
A simple Guide on creating a dataset that a model will use to self-identify as a persona is on Medium:
https://medium.com/@contact_30070/creating-a-synthetic-dataset-for-self-identification-of-your-own-fine-tuned-llm-29ec4ccae0b0
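For reference, a minimal sketch of what such a custom dataset can look like on disk: LLaMa-Factory reads datasets registered in a dataset_info.json file placed in the data directory. The dataset name "my_identity", the file names and the sample text below are made up for illustration (assuming the default alpaca-style instruction/input/output columns and the D:/lafa/lafa_datasets folder from above):

import json
from pathlib import Path

datasets_dir = Path("D:/lafa/lafa_datasets")
datasets_dir.mkdir(parents=True, exist_ok=True)

# one tiny alpaca-style sample: instruction / input / output columns
samples = [
    {"instruction": "Who are you?", "input": "", "output": "I am Jenny, a fine-tuned assistant."}
]
(datasets_dir / "my_identity.json").write_text(json.dumps(samples, indent=2), encoding="utf-8")

# register the file so it shows up in the webui dataset list for this "Data dir"
dataset_info = {"my_identity": {"file_name": "my_identity.json"}}
(datasets_dir / "dataset_info.json").write_text(json.dumps(dataset_info, indent=2), encoding="utf-8")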
the file requirements.txt MUST be in the folder LLaMa-Factory
for gradio, ONLY if necessary (a message would appear when trying to launch webui.py):
Command line:
We will be using the CPU only,
so we have to install torch in a build that supports the CPU
(for info on CUDA, look it up ONLY if you want to find out about it).
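The CPU-only wheel can be installed with: pip install torch --index-url https://download.pytorch.org/whl/cpu (PyTorch's official CPU package index). A minimal check that the installed torch really is a CPU-only build:

import torch

# a CPU-only build prints a version such as "2.x.x+cpu" and reports False below
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())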
change a file
-insert:

##########
device = torch.device('cpu')
##########

like below, BEFORE def main():

[……]
from llamafactory.train.tuner import run_exp

##########
device = torch.device('cpu')
##########

def main():
    run_exp()

def _mp_fn(index):
    # For xla_spawn (TPUs)
    run_exp()

if __name__ == "__main__":
    main()
…\LLaMA-Factory\src\llamafactory\webui
runner.py
in this file you will find:
# NOTE: DO NOT USE shell=True to avoid security risk
self.trainer = Popen(["llamafactory-cli", "train", save_cmd(args)], env=env)
yield from self.monitor()
change the line
self.trainer = Popen(["llamafactory-cli", "train", save_cmd(args)], env=env)
to:
self.trainer = Popen(["llamafactory-cli", "train", save_cmd(args)], env=env, shell=True)
then save this file ("runner.py") where it is
also change a file, at:
C:\Users\[user name]\AppData\Roaming\Python\Python312\site-packages\torch\utils
checkpoint.py
in it, find the use_reentrant parameter and change it to:
use_reentrant=False,
NOW, IF right after you click Start you see a message similar to:
"Warning: CUDA environment was not detected"
and NO OTHER message above it (like "Failed"),
-> disregard it and go back to the Command Prompt where you wrote the launch command above;
have a little patience and most probably you will see the fine-tuning starting and progressing
-> which you can also follow in the lower part of the webui page.
HOWEVER, the training may NOT start, with an ERROR saying something about "llamafactory-cli"; this is something that someone more in-the-know must address (explain & tell how to solve).
-- If this is the unfortunate case, delete the folder where LLaMa-Factory is, run pip cache purge, then uninstall Python and start anew. Tough luck! (I for one don't know what could have happened.)
The results of the fine-tuning (the LoRA checkpoints) are by default saved in the folder:
LLaMA-Factory\src\saves\Custom\lora
so you may want to create a folder specifically for placing the fine-tuned versions of models in it,
for instance, D:\llms_mymodels
- - -
Windows (10):
the BASE MODEL, downloaded from HuggingFace before training, will be placed in:
C:\Users\[user name]\.cache\huggingface\hub
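To locate the exact snapshot folder of the downloaded base model inside that cache (you will need this path later for model_name_or_path), a small sketch using the huggingface_hub package (already installed as a dependency of LLaMa-Factory) can list it:

from huggingface_hub import scan_cache_dir

# print every model cached by HuggingFace and where its files live on disk
for repo in scan_cache_dir().repos:
    print(repo.repo_id, "->", repo.repo_path)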
- - -
We will use llama.cpp instead of the Export section of LLaMa-Factory.
create a folder (say, "llamacpp") on drive D:, then, in the command line:
cd D:
cd llamacpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt
copy and edit a file
Open Windows Explorer/My Computer or similar
go into the folder:
D:\lafa\LLaMA-Factory\examples\train_lora
or similar
if you trained your model based on a Qwen model:
-- Copy the file: qwen2vl_lora_sft.yaml
to:
the folder where your model has been created, for instance the folder:
D:\lafa\lafa_llms_created\train_2025-04-16-07-49-01
-- in there, Edit this file:
--- change the entry:
model_name_or_path
in it you will write the path to the BASE model that was downloaded from HuggingFace before training (for instance, here it was Qwen-1_8B),
for instance:
C:/Users/[your user name]/.cache/huggingface/hub/models--Qwen--Qwen-1_8B/snapshots/fa6e214…blahblah…6bfdf5eed
--- change the entry:
adapter_name_or_path
in it you will write the path to your .safetensors model (the adapter checkpoint)
for instance
D:\lafa\lafa_llms_created\train_2025-04-16-07-49-01\checkpoint-15
--- change the entry:
export_dir
in it you will write the path to the folder where the EXPORTED model will be placed
for instance
D:/lafa/lafa_llms_exported
so, AFTER you edit those^ three entries, this .yaml file will be:

### Note: DO NOT use quantized model or quantization_bit when merging lora adapters

### model
model_name_or_path: C:/Users/[your user name]/.cache/huggingface/hub/models--Qwen--Qwen-1_8B/snapshots/fa6e214ccbbc6a55235c26ef406355b6bfdf5eed
adapter_name_or_path: D:\lafa\lafa_llms_created\train_2025-04-16-07-49-01\checkpoint-15
template: qwen
trust_remote_code: true

### export
export_dir: D:/lafa/lafa_llms_exported
export_size: 5
export_device: cpu
export_legacy_format: false
launch the EXPORT command:
Command line
D:\lafa\LLaMA-Factory\src>
run the command:
llamafactory-cli export D:\lafa\lafa_llms_created\train_2025-04-16-07-49-01\qwen2vl_lora_sft.yaml
Look at it go
[INFO|tokenization_utils_base.py:2058] 2025-04-18 13:23:28,053 >> loading file qwen.tiktoken
[INFO|tokenization_utils_base.py:2058] 2025-04-18 13:23:28,053 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2058] 2025-04-18 13:23:28,053 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2058] 2025-04-18 13:23:28,053 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2058] 2025-04-18 13:23:28,053 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2058] 2025-04-18 13:23:28,053 >> loading file chat_template.jinja
[INFO|configuration_utils.py:691] 2025-04-18 13:23:28,286 >> loading configuration file
…
…
…
[INFO|2025-04-18 13:25:01] llamafactory.model.model_utils.attention:143 >> Using vanilla attention implementation.
[INFO|2025-04-18 13:25:04] llamafactory.model.adapter:143 >> Merged 1 adapter(s).
[INFO|2025-04-18 13:25:04] llamafactory.model.adapter:143 >> Loaded adapter(s): D:\lafa\lafa_llms_created\train_2025-04-16-07-49-01\checkpoint-15
[INFO|2025-04-18 13:25:04] llamafactory.model.loader:143 >> all params: 1,836,828,672
[INFO|2025-04-18 13:25:04] llamafactory.train.tuner:143 >> Convert model dtype to: torch.float32.
[INFO|configuration_utils.py:419] 2025-04-18 13:25:04,634 >> Configuration saved in D:/lafa/lafa_llms_exported\config.json
[INFO|configuration_utils.py:911] 2025-04-18 13:25:04,640 >> Configuration saved in D:/lafa/lafa_llms_exported\generation_config.json
[INFO|modeling_utils.py:3580] 2025-04-18 13:29:45,642 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at D:/lafa/lafa_llms_exported\model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2510] 2025-04-18 13:29:49,723 >> tokenizer config file saved in D:/lafa/lafa_llms_exported\tokenizer_config.json
[INFO|tokenization_utils_base.py:2519] 2025-04-18 13:29:49,927 >> Special tokens file saved in D:/lafa/lafa_llms_exported\special_tokens_map.json
[INFO|2025-04-18 13:29:53] llamafactory.train.tuner:143 >> Ollama modelfile saved in D:/lafa/lafa_llms_exported\Modelfile
Command line:
D:\llamacpp\llama.cpp>convert_hf_to_gguf.py D:\lafa\lafa_llms_exported --outfile D:\lafa\lafa_llms_ggufs\qwenft.gguf --outtype q8_0
the results will be similar to the following:
INFO:hf-to-gguf:Loading model: lafa_llms_exported
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model…
…
…
…
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:D:\lafa\lafa_llms_ggufs\qwenft.gguf: n_tensors = 290, total_size = 493.4M
Writing: 100%|██████████████████████████████████████████████████████████████████████| 493M/493M [00:07<00:00, 62.5Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to D:\lafa\lafa_llms_ggufs\qwenft.gguf
ATTENTION:
In this step, only the METADATA of the model was saved,
but you want to OBTAIN A USABLE GGUF MODEL = you must also save the NEW WEIGHTS combined with the BASE MODEL.
The .safetensors here is the so-called ADAPTER = the portion of the NEW model created by ^fine-tuning with LoRA, which OVERLAPS/ADDS TO the BASE model,
but we want to MERGE IT with / ADD IT TO the BASE MODEL.
So, the .gguf model created in this step WILL NOT LOAD in LM Studio or a similar program,
and we can CHECK this by trying to load our model in LM Studio: an ERROR message will be shown.
(Later, once the final merged GGUF exists, you can verify the fine-tuning itself by chatting with the model on (a) subject(s) that the base model did NOT know before your fine-tuning.)
^ATTENTION
ATTENTION:
Here we ONLY CHECK whether our model will load in LM Studio: it WILL NOT.
^ATTENTION
go to the folder for the LLMs visible in LM Studio (http://lmstudio.ai/)
for instance
D:\llm_for_lmstudio\lmstudio_models
in there, create a folder named as you like -- say, "jenny" -- with a subfolder for the model, and copy the .gguf created above into it:
D:\llm_for_lmstudio\lmstudio_models\jenny\qwenft
this^ model will NOT load, because it DOES NOT CONTAIN the BASE MODEL
So, we continue:
Merging the .safetensors of the ADAPTER model (the portion of our NEW model that OVERLAPS/ADDS TO the BASE model), i.e. the result of fine-tuning with LoRA, into the BASE MODEL:
We must perform THIS step to COMBINE the NEW WEIGHTS (the ADAPTER = our model) with the BASE MODEL.
With this step we END the entire process -- we obtain the merged model from which the GGUF that can be loaded in LM Studio or a similar program is made.
-- the paths (folder and file names) here are only examples:
-- -- the instructions (the script) that you will have to adjust for your .safetensors model, for the base model, and for the respective paths:
from transformers import AutoModelForCausalLM
from peft import PeftModel

# path-folder to the BASE MODEL (the HuggingFace cache snapshot)
base_model_path = "C:/Users/[your user name]/.cache/huggingface/hub/models--Qwen--Qwen-1_8B/snapshots/fa6e214ccbbc6a55235c26ef406355b6bfdf5eed"
# path-folder to your FINE-TUNED model (the checkpoint with the .safetensors adapter)
adapter_path = "D:/lafa/lafa_llms_created/train_2025-04-16-07-49-01/checkpoint-15"

model = AutoModelForCausalLM.from_pretrained(base_model_path, trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter_path)  # load the LoRA adapter on top of the base model
model = model.merge_and_unload()                        # merge the adapter weights into the base weights
model.save_pretrained("D:/lafa/mymodels_finetuned_for_ggufs")
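As a possible shortcut for the manual tokenizer copy described in the next step (an alternative I am suggesting here, not part of the original workflow), the base model's tokenizer could also be saved programmatically next to the merged weights:

from transformers import AutoTokenizer

# reuse base_model_path from the merge script above; saves the tokenizer files
# (tokenizer_config.json etc.) next to the merged .safetensors shards
tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)
tokenizer.save_pretrained("D:/lafa/mymodels_finetuned_for_ggufs")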
NOW:
go to the folder where your model was created the first time around, as .safetensors -- say, the folder with a checkpoint in it:
D:\lafa\lafa_llms_created\train_2025-04-25-11-35-14\checkpoint-18
from there, Copy the files:
-- tokenizer.json
-- tokenizer_config.json
-> into the folder where the LAST .safetensors files of the model merged^ above^ were saved, say:
D:\lafa\lafa_llms_exported\qwenft
where you would find the files:
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors (depending on how many .safetensors shards your merged^ model^ has)
config.json
generation_config.json
model.safetensors.index.json
go back to step 19 and perform it on this^ folder, where the .safetensors file(s) of YOUR fine-tuned model are, and on the folder where you will put the final GGUFs; say, the command would be:
D:\llamacpp\llama.cpp> convert_hf_to_gguf.py D:\lafa\lafa_llms_exported\qwenft --outfile D:\lafa\lafa_llms_ggufs\qwenft.gguf --outtype q8_0
IN THE END, try and load this final GGUF model in LM Studio or a similar program.
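If you prefer a quick check from Python instead of LM Studio, something like the sketch below could be used (my own suggestion, not part of the original guide; it assumes the llama-cpp-python package, installed with pip install llama-cpp-python):

from llama_cpp import Llama

# load the final GGUF on the CPU and ask it something only the fine-tuned model should know
llm = Llama(model_path="D:/lafa/lafa_llms_ggufs/qwenft.gguf", n_ctx=2048)
out = llm("Who are you?", max_tokens=64)
print(out["choices"][0]["text"])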
Good luck!
Thank you for reading this Guide and for being interested in creating your own, homemade LLMs.
Corrections are WELCOME! This Guide was put together from many pieces of information from around the web, because no such coherent Guide existed. We thank all who devoted their time to solving arcane problems during the feat of creating their own LLMs.
Note: Inconsistencies and lack of functionality/results could arise in the course of this whole process, because of some of the Python packages, the LLaMa-Factory repository, or the llama.cpp repository.