This tool compresses SDXL LoRA models by selectively reducing their rank (the number of internal dimensions). It goes beyond simple truncation by using methods to score the importance of each dimension, comparing the LoRA's characteristics against the original base model checkpoint. This context-aware approach helps preserve the LoRA's essential effects while significantly reducing file size.
This tool is primarily designed for SDXL LoRA/LoCon models. Support for other types (SD1.5, DoRA, LoKr, LoHa, etc.) is not currently included.
- Features
- Installation
- Usage
- Understanding Recipes
- Output Filename Convention
- Understanding Verbose Output (`-vv`)
- Technical Details: Spectral vs. Frobenius Norm
- Why is it so fast?
- Evaluation
- License
- Contributing
- SDXL Focused: Optimized for SDXL LoRA and LoCon models.
- Base Model Aware: Uses weights from the base model checkpoint for more robust dimension scoring.
- Flexible Scoring: Combine multiple metrics (comparing LoRA to itself, to the base model, considering parameter efficiency) using weighted recipes.
- Target Size or Threshold: Prune dimensions based on a specific target file size (`size=`) or a direct importance score threshold (`thr=`).
- Batch Processing: Process multiple LoRA files in a single command.
- Multiple Recipes: Apply several different compression recipes to the same LoRA simultaneously, facilitating experimentation without redundant SVD calculations.
- Optimized for Speed: Uses efficient SVD techniques and caches base model norm calculations to disk (`norms_cache.json`) for faster subsequent runs.
Clone the repository:
```sh
git clone https://github.com/elias-gaeros/resize_lora.git
cd resize_lora
```
Requirements:

- Python 3.8+
- torch
- safetensors
- tqdm
Install required packages (preferably in a virtual environment):
```sh
pip install torch safetensors tqdm
```
```sh
python resize_lora.py /path/to/sdxl_base_v1.0.safetensors /path/to/my_lora.safetensors -o /path/to/output/folder
```
This command uses the default recipe (`fro_ckpt=1,thr=-3.5`). It compresses the LoRA by keeping only the dimensions whose "strength" (singular value) is greater than $10^{-3.5}$ (roughly 1/3000th) of the corresponding base model layer's overall magnitude (Frobenius norm).
Arguments:

- `checkpoint_path` (positional): Path to the base model checkpoint file (e.g., SDXL 1.0 base).
- `lora_model_paths` (positional): Path(s) to the LoRA model files.
  - You can specify multiple files: `lora1.safetensors lora2.safetensors`.
  - You can use wildcards (shell expanded): `loras/*.safetensors`.
  - You can specify a weighted merge: `"lora1:0.7,lora2:0.3"` (use quotes if needed by your shell). The tool will merge these LoRAs before resizing.
- `-o`, `--output_folder` (required): Folder where the resized LoRA files will be saved.
- `-t`, `--output_dtype`: Output precision (`16` for float16, `32` for float32). Default: `16`.
- `-d`, `--device`: Device for computations (`cuda`, `cpu`, etc.). Default: `cuda` if available, otherwise `cpu`.
- `-r`, `--score_recipes`: Defines how to score and prune dimensions. Separate multiple recipes with colons (`:`). Default: `fro_ckpt=1,thr=-3.5`. See Understanding Recipes below.
- `-v`, `--verbose`: Increase output detail. `-v`: INFO level (shows progress, final thresholds, quantiles). `-vv`: DEBUG level (shows detailed per-layer resizing info).
Examples:

- Process multiple LoRAs with the default recipe:

  ```sh
  python resize_lora.py sdxl_base.safetensors loras/*.safetensors -o resized_loras
  ```

- Apply multiple recipes to a single LoRA:

  ```sh
  python resize_lora.py sdxl_base.safetensors my_lora.safetensors -o experimental_resizes \
      -r "fro_ckpt=1,thr=-3.5:spn_lora=1,thr=-0.7:spn_ckpt=1,size=32"
  ```
  This generates three versions of `my_lora.safetensors`:

  - One using the default recipe (`fro_ckpt=1,thr=-3.5`): a base-model comparison with a threshold of $10^{-3.5}$.
  - One using a self-comparison (`spn_lora`) similar to Kohya's `sv_ratio`, keeping dimensions stronger than $10^{-0.7} \approx 1/5$th of the LoRA layer's own maximum.
  - One aiming for a 32 MiB file size, using the `spn_ckpt` scoring method to decide which dimensions offer the best "value" per parameter.
- Merge two LoRAs then resize the result using a custom recipe and verbose output:

  ```sh
  python resize_lora.py sdxl_base.safetensors "style_lora:0.6,char_lora:0.4" -o merged_resized -vv \
      -r "fro_ckpt=0.8,params=0.2,size=50"
  ```

  This merges `style_lora` (60% weight) and `char_lora` (40% weight), then resizes the merged result to approximately 50 MiB. The dimension scoring prioritizes comparison to the base model's Frobenius norm (`fro_ckpt`, 80% importance) while slightly favoring parameter efficiency (`params`, 20% importance). `-vv` shows detailed logs.
Recipes tell the script how to decide which internal dimensions of the LoRA are important enough to keep. Each recipe consists of two parts: a Selection Method (`size` or `thr`) and one or more Scoring Methods (like `spn_ckpt`, `params`, etc.).
Internally, a LoRA layer modifies the output of a standard neural network layer. This modification can be broken down (using Singular Value Decomposition, SVD) into a set of independent "directions" or "dimensions", each with an associated "strength" (its singular value, $\sigma_i$).

The core idea here is to calculate an importance score for each dimension. This score starts with the dimension's raw strength ($\sigma_i$) and compares it against one or more reference magnitudes chosen by the recipe's scoring methods.
You must specify exactly one selection method per recipe:
- `thr=<value>`: Threshold Selection
  - This sets a direct cutoff on the final log10 importance score. Any dimension whose score is greater than `<value>` is kept.
  - Think of the threshold in terms of fractions:
    - `thr=-1.0`: Keep dimensions with strength > 1/10th of the reference magnitude(s).
    - `thr=-1.2`: Keep dimensions > $10^{-1.2} \approx 1/16$th of the reference.
    - `thr=-2.0`: Keep dimensions > $10^{-2.0} = 1/100$th of the reference.
  - Negative thresholds are typical when comparing to norms (like `spn_ckpt` or `fro_ckpt`).
- `size=<value>`: Target Size Selection
  - This sets a target output file size in MiB.
  - The script doesn't magically know the final size beforehand. Instead, it calculates the importance score for all dimensions (using the specified scoring methods).
  - It then figures out the parameter cost (bytes needed) for each dimension.
  - It performs a greedy selection: keep the dimensions with the highest score-per-byte until the total size budget (`<value>` MiB) is met (see the simplified sketch after this list).
  - This process effectively calculates an internal threshold (`thr`) that achieves the target size. This calculated threshold is reported in the output filename and logs.
  - This is useful when you have a specific file size budget.
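To make the size-targeted selection concrete, here is a simplified sketch (not the tool's actual code): it ignores the per-byte weighting mentioned above and simply keeps the highest-scoring dimensions until the byte budget is exhausted, then reports the implied log10 threshold. The array names are hypothetical.

```python
# Simplified sketch of size-targeted selection: keep the best-scoring dimensions
# until the MiB budget is spent, then report the implied log10 threshold.
# `scores` and `byte_costs` are hypothetical per-dimension arrays.
import numpy as np

def threshold_for_size(scores: np.ndarray, byte_costs: np.ndarray, target_mib: float) -> float:
    order = np.argsort(-scores)                    # best dimensions first
    spent = np.cumsum(byte_costs[order])
    n_keep = int(np.searchsorted(spent, target_mib * 1024**2, side="right"))
    if n_keep == 0:
        return float("inf")                        # budget too small for any dimension
    return float(scores[order[n_keep - 1]])        # score of the weakest kept dimension
```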
Scoring methods determine how the importance of each LoRA dimension (represented by its singular value, $\sigma_i$) is judged, by comparing it against a reference magnitude.

Think of it like evaluating how significant a small change is. A $1 increase in price is negligible for a car but significant for a candy bar. Similarly, a LoRA dimension's strength might be considered important if it's large compared to the base model's effect (`spn_ckpt`), large compared to the LoRA's own strongest effect (`spn_lora`), or efficient in terms of parameters (`params`).
You don't have to rely on just one comparison. Recipes allow you to weight the importance of different reference points. For example, `spn_ckpt=0.7,params=0.3` means: "When deciding which dimensions to keep, I care 70% about how strong they are compared to the base model layer's peak strength (`spn_ckpt`), and 30% about how parameter-efficient they are (`params`)."

Weights are specified after the method name (e.g., `spn_ckpt=0.7`). If you omit the weight (e.g., just `spn_ckpt`), it defaults to 1.0. The tool automatically normalizes all specified weights so they sum to 1.0 (e.g., `fro_ckpt=3,spn_lora=1` is treated as `fro_ckpt=0.75,spn_lora=0.25`).
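As a tiny illustration of that normalization (a hypothetical snippet, not the tool's parser):

```python
# Recipe weights are normalized to sum to 1.0 before scoring.
weights = {"fro_ckpt": 3.0, "spn_lora": 1.0}
total = sum(weights.values())
weights = {k: v / total for k, v in weights.items()}
print(weights)  # {'fro_ckpt': 0.75, 'spn_lora': 0.25}
```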
The final importance score combines these weighted comparisons. Intuitively, a dimension gets a higher score if it's strong relative to the reference points you've weighted highly. Mathematically, the tool calculates a weighted geometric mean of the ratios between the dimension's strength $\sigma_i$ and each reference magnitude.

For numerical stability and easier thresholding, calculations are done using logarithms. The final score is essentially `log10(sigma_i) - w1*log10(Ref1) - w2*log10(Ref2) - ...`, where the `Ref`s are the reference magnitudes and the `w`s are the normalized weights you set for each method. Like the final score, the `thr=` threshold is also specified in log10 space: `thr=-1` is a 1/10th cutoff, `thr=-2` is a 1/100th cutoff, etc.
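A minimal sketch of that score calculation, assuming you already have a dimension's singular value and a mapping from scoring method to (weight, reference magnitude); names and numbers are illustrative:

```python
# Weighted log10 score of one dimension: log10(sigma) minus the weighted
# log10 of each reference magnitude (weights normalized to sum to 1).
import math

def log_score(sigma: float, refs: dict) -> float:
    total = sum(w for w, _ in refs.values())
    return math.log10(sigma) - sum((w / total) * math.log10(ref) for w, ref in refs.values())

# e.g. 70% spn_ckpt (reference 1.5), 30% params (reference n + m = 1920)
score = log_score(0.02, {"spn_ckpt": (0.7, 1.5), "params": (0.3, 1920)})
```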
Changing scoring methods and their weights will have a strong influence on the optimal threshold. For this reason, you should probably target a fixed `size=` when comparing different recipes.
Here are the available methods and the reference magnitude each one compares $\sigma_i$ against:
- `spn_ckpt`:
  - Reference: Max strength (spectral norm, $\sigma_\text{max}$) of the corresponding base model layer.
  - Intuition: How strong is the LoRA dimension relative to the base model's peak effect in that layer? Good for general compression, aligns LoRA significance with the base model's scale.
- `spn_lora`:
  - Reference: Max strength (spectral norm, $\sigma_\text{max}$) of the LoRA layer itself.
  - Intuition: How strong is this dimension relative to the strongest dimension within the same LoRA layer? Keeps dimensions that are internally important to the LoRA's function. (Similar to Kohya's `sv_ratio`.)
- `fro_ckpt`:
  - Reference: Overall magnitude (Frobenius norm, $\|W\|_F$) of the corresponding base model layer.
  - Intuition: How strong is the LoRA dimension relative to the base model's total effect magnitude in that layer? Similar to `spn_ckpt` but considers the "average" strength across all base layer dimensions. Larger layers have more spread-out singular values, so `fro_ckpt` penalizes them more than `spn_ckpt`.
- `fro_lora`:
  - Reference: Overall magnitude (Frobenius norm, $\|BA\|_F$) of the LoRA layer itself.
  - Intuition: How strong is this dimension relative to the total effect magnitude of this LoRA layer?
- `subspace` (Experimental):
  - Reference: How much the base model layer acts along this specific LoRA dimension's direction ($|\langle \mathbf{u}_i, W_\text{ckpt} \mathbf{v}_i \rangle|$).
  - Intuition: Is the LoRA dimension strong even after accounting for how the base model already operates in that specific direction? Penalizes dimensions where the LoRA effect might be redundant with the base model.
- `params`:
  - Reference: Parameter cost per rank ($n + m$ for an $n \times m$ layer).
  - Intuition: This doesn't compare strength-to-strength, but rather penalizes dimensions slightly if they reside in layers that are inherently parameter-hungry (per rank). Use small weights (e.g., `0.1`, `0.2`) primarily with the `size=` selection method to favor keeping dimensions in more "efficient" layers when constrained by a file size budget.
  - Note: `fro_ckpt` already implicitly penalizes layers with a lot of parameters per dimension; see Spectral vs. Frobenius Norm below.
- `rescale=<value>`: Multiplies all singular values $\sigma_i$ by `<value>` before any scoring calculations. Default: `1.0`.
  - Useful if you want to globally adjust the LoRA's strength during resizing. For example, `rescale=0.8` would resize a slightly weaker version of the LoRA, and thus prune more dimensions. This factor is baked into the final weights.
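For concreteness, the reference magnitudes listed above correspond roughly to the following quantities. This is an illustrative sketch with a made-up helper; the tool itself obtains these values from the low-rank SVD and the norm cache rather than recomputing them naively like this, and the experimental `subspace` reference is omitted:

```python
# Reference magnitudes per scoring method, for a base layer W_ckpt (n x m)
# and LoRA factors B (n, r), A (r, m). Illustrative only.
import torch

def reference(method: str, W_ckpt: torch.Tensor, B: torch.Tensor, A: torch.Tensor) -> float:
    if method == "spn_ckpt":
        return torch.linalg.matrix_norm(W_ckpt, ord=2).item()  # sigma_max of the base layer
    if method == "fro_ckpt":
        return torch.linalg.norm(W_ckpt).item()                # Frobenius norm of the base layer
    if method == "spn_lora":
        return torch.linalg.matrix_norm(B @ A, ord=2).item()   # sigma_max of the LoRA update
    if method == "fro_lora":
        return torch.linalg.norm(B @ A).item()                 # Frobenius norm of the LoRA update
    if method == "params":
        return float(sum(W_ckpt.shape))                        # parameter cost per rank: n + m
    raise ValueError(f"unknown method: {method}")
```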
The script generates informative filenames for the resized LoRAs:
`<lora_name>_<recipe_details>_th<threshold>.safetensors`

Where:

- `<lora_name>`: The name of the original LoRA file (or the merged name if applicable).
- `<recipe_details>`: A summary of the recipe used:
  - Scoring methods and their weights: `spnckpt1`, `frckpt0.8`, `params0.2`, etc. (Names are shortened; weights are included if not 1.0.)
  - `scale<value>` is added if `rescale` is not 1.0.
  - `size<value>` is added if the `size` selection method was used.
- `_th<threshold>`: The final log10 threshold used for pruning.
  - If you specified `thr=X` in the recipe, this will be `_thX`.
  - If you specified `size=Y`, this will be the threshold calculated by the script to meet that size target (e.g., `_th-3.142`).
Example: Running with `-r spn_ckpt=1,size=32` might produce `my_lora_spnckpt1_size32_th-2.871.safetensors`.
When running with `-vv` (DEBUG level), you'll see lines like this for many layers during the "Scoring" phase:

```
DEBUG:root:dim: 8->5 rle_lora: 3.19% rle_ckpt: 0.03% lora_te1_text_model_encoder_layers_0_self_attn_out_proj
DEBUG:root:dim:256->0 rle_lora:100.00% rle_ckpt: 0.00% lora_unet_middle_block_0_in_layers_2
```
- `dim: <old> -> <new>`: Shows the original rank (number of dimensions) of the LoRA layer and the new rank after pruning based on the recipe's threshold. `256->0` means the entire LoRA layer was removed.
- `rle_lora: X.XX%`: Relative LoRA Error. This estimates the error introduced by removing dimensions, relative to the original LoRA layer's total magnitude (Frobenius norm). It's calculated as `norm(discarded_singular_values) / norm(all_singular_values)`. A value of `100.00%` means all dimensions were discarded (or the layer was zero to begin with). Lower is generally better.
- `rle_ckpt: Y.YY%`: Relative Checkpoint Error. This estimates the error relative to the base model checkpoint layer's magnitude (Frobenius norm). It's calculated as `norm(discarded_singular_values) / norm(base_layer_weights)`. This indicates how much the pruning changes the model's output relative to the scale of the original model layer. Often, this percentage is much smaller than `rle_lora`, suggesting the removed dimensions were small compared to the base model's weights, even if they were a larger fraction of the LoRA's own weights.
- `<layer_name>`: The name of the specific LoRA layer being processed.
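In code, the two error figures above amount to something like this sketch, where `sv` is the layer's full vector of singular values, `k` the number of kept dimensions, and `base_fro` the base layer's Frobenius norm (all names hypothetical):

```python
# Relative errors reported at DEBUG level: the norm of the discarded singular
# values, compared either to the whole LoRA layer or to the base layer.
import torch

def relative_errors(sv: torch.Tensor, k: int, base_fro: float):
    discarded = torch.linalg.norm(sv[k:])
    rle_lora = 100 * discarded / torch.linalg.norm(sv)  # % of the LoRA layer's magnitude
    rle_ckpt = 100 * discarded / base_fro               # % of the base layer's magnitude
    return rle_lora.item(), rle_ckpt.item()
```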
You will also see `WARNING` messages for layers detected as being entirely zero in the original LoRA file. These layers are skipped. With SDXL, LoRA layers are instantiated for some text encoder blocks that are never trained, so this warning can be ignored in most cases.
Finally, INFO level (`-v` or `-vv`) shows `Score quantiles`, giving a statistical overview of the calculated importance scores across all dimensions before thresholding.
The Frobenius norm and the spectral norm are two different ways of measuring the overall magnitude of a weight matrix $W$:
- Spectral Norm: $\sigma_{\max}(W)$ is the largest singular value. It represents the maximum scaling factor the matrix applies to any input vector.
- Frobenius Norm: $\|W\|_F = \sqrt{\sum_{i=1}^{\min(n,m)} \sigma_i^2(W)}$. It's like the Euclidean distance if you unroll the matrix into a vector.
For typical neural network weight matrices, where singular values decrease slowly, there's an approximate relationship: $\|W\|_F \approx \sigma_{\max}(W)\,\sqrt{\min(n,m)}$. This means the Frobenius norm grows with the layer's dimensions, whereas the spectral norm does not.
Comparing `fro_ckpt` and `spn_ckpt`:

- Scoring with `fro_ckpt=1` uses $\sigma_i(BA) / \|W_\text{ckpt}\|_F$ as the core ratio.
- Scoring with `spn_ckpt=1` uses $\sigma_i(BA) / \sigma_\text{max}(W_\text{ckpt})$.
Therefore, the `fro_ckpt` score is roughly the `spn_ckpt` score divided by $\|W_\text{ckpt}\|_F / \sigma_\text{max}(W_\text{ckpt}) \approx \sqrt{\min(n,m)}$.

Due to the relationship above, using `fro_ckpt` implicitly penalizes layers with larger dimensions (higher $\min(n,m)$) more than `spn_ckpt` does, somewhat similar to adding a small weight to the `params` score.
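A quick, purely illustrative numerical check of that approximation: build a matrix with a slowly decaying spectrum and compare the two quantities.

```python
# Compare ||W||_F with sigma_max * sqrt(min(n, m)) on a synthetic matrix
# whose singular values decay slowly (illustrative check only).
import torch

n, m = 1280, 640
Q1, _ = torch.linalg.qr(torch.randn(n, m))
Q2, _ = torch.linalg.qr(torch.randn(m, m))
sv = torch.linspace(1.0, 0.5, m)                 # slowly decreasing singular values
W = Q1 @ torch.diag(sv) @ Q2.T

print(torch.linalg.norm(W).item())               # Frobenius norm (~19)
print((sv.max() * m ** 0.5).item())              # sigma_max * sqrt(min(n, m)) (~25)
```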
Calculating the Singular Value Decomposition (SVD) is central to analyzing the LoRA layers. A naive approach would involve reconstructing the full weight modification $BA$ from the low-rank factors (`lora_up.weight` and `lora_down.weight`) before performing the SVD. If the layer dimensions are large, forming and decomposing this full $n \times m$ matrix is slow and memory-hungry.

This tool avoids forming the full $n \times m$ matrix. For a LoRA layer of rank $r$ with factors $B$ ($n \times r$) and $A$ ($r \times m$), it instead proceeds as follows:
- Compute the QR decomposition of $B$: $B = Q_B R_B$.
- Compute the QR decomposition of $A^T$: $A^T = Q_A R_A$, which means $A = R_A^T Q_A^T$.
- Form the much smaller $r \times r$ matrix $M = R_B R_A^T$.
- Compute the SVD of this small matrix: $M = U_M S V_M^T$.
- The singular values $S$ of $BA$ are the same as the singular values of $M$. The full singular vectors can be reconstructed if needed as $U = Q_B U_M$ and $V = Q_A V_M$.
This approach significantly reduces computational cost, as the expensive SVD is performed on a small $r \times r$ matrix rather than on the full-size layer matrix.
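A sketch of those steps in PyTorch, assuming `B` is the up factor of shape `(n, r)` and `A` the down factor of shape `(r, m)`; this mirrors the list above but is not the tool's exact code:

```python
# SVD of the low-rank product BA without ever forming the full n x m matrix.
import torch

def lowrank_svd(B: torch.Tensor, A: torch.Tensor):
    Qb, Rb = torch.linalg.qr(B)       # B = Qb @ Rb,     Qb: (n, r), Rb: (r, r)
    Qa, Ra = torch.linalg.qr(A.T)     # A.T = Qa @ Ra -> A = Ra.T @ Qa.T
    M = Rb @ Ra.T                     # small (r, r) core matrix
    Um, S, Vmh = torch.linalg.svd(M)  # cheap SVD of the r x r matrix
    U = Qb @ Um                       # left singular vectors of BA
    V = Qa @ Vmh.T                    # right singular vectors of BA
    return U, S, V
```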
Several scoring methods (`spn_ckpt`, `fro_ckpt`, `subspace`) rely on properties of the corresponding layers in the base model checkpoint, specifically their spectral norm ($\sigma_{\max}(W_{\text{ckpt}})$) or Frobenius norm ($\|W_{\text{ckpt}}\|_F$).
To avoid redundant calculations, the tool automatically caches these base model layer statistics:
- When a norm for a specific base layer is needed for the first time, it is computed.
- The result is stored in a JSON file (e.g., `norms_cache.json`) associated with the `BaseCheckpoint`.
- On subsequent runs, or when processing other LoRA files against the same base checkpoint, the tool reads the required norms directly from the cache file.
This caching mechanism speeds up processing after the first run, particularly when applying multiple recipes or resizing batches of LoRAs that share the same base model.
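A minimal sketch of such a compute-or-load cache, assuming one JSON file per checkpoint and hypothetical key names (the tool's actual cache layout may differ):

```python
# Compute-or-load a per-layer norm, persisting results to a JSON cache file.
import json
from pathlib import Path

import torch

def cached_norm(cache_path: Path, layer_name: str, kind: str, weight: torch.Tensor) -> float:
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    key = f"{layer_name}/{kind}"
    if key not in cache:
        W = weight.float().flatten(1)  # treat conv kernels as 2D matrices
        if kind == "spectral":
            cache[key] = torch.linalg.matrix_norm(W, ord=2).item()  # sigma_max
        else:
            cache[key] = torch.linalg.norm(W).item()                # Frobenius norm
        cache_path.write_text(json.dumps(cache))
    return cache[key]
```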
(TBD) General procedure:
1. Compress a LoRA with `-r spn_ckpt=1,thr=-1.2` and note the file size.
2. Compress the same original LoRA with `-r spn_lora=1,size=<file size of the first output>` and note the threshold.
3. Compress the same original LoRA using `kohya-ss/sd-scripts/networks/resize_lora.py` with `--dynamic_method="sv_ratio" --dynamic_param=<10**-threshold noted in step 2>`.
LoRAs from steps 2 and 3 should give similar results. Evaluation of the `spn_ckpt` scoring compares step 1 against steps 2 and 3.
This project is licensed under the MIT License. See the `LICENSE` file for details.
Contributions are welcome! Please open an issue to discuss changes or submit a pull request on GitHub.