Potentially deprecated path #1361


Open
eldarkurtic opened this issue Apr 18, 2025 · 2 comments

@eldarkurtic
Collaborator

Running oneshot(model=model, recipe=recipe) always prints this warning:

2025-04-18T08:39:09.006802+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide

which I believe comes from the old, deprecated pathway of saving compressed models with oneshot(model=model, recipe=recipe, save_dir=<path>).

Perhaps we should remove the warning, as it is misleading given that we now save models with:

oneshot(model=model, recipe=recipe)
model.save_pretrained(<path>)
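
Roughly what I have in mind, as a sketch with hypothetical names (the real post_process signature in llmcompressor may differ): only act when the deprecated saving pathway was actually requested, and stay silent otherwise.

# Hypothetical sketch -- names and signature are illustrative, not the actual
# llmcompressor internals.
def post_process(model, save_dir=None):
    if save_dir is not None:
        # Deprecated pathway: oneshot saves the model itself.
        model.save_pretrained(save_dir)
        return
    # Recommended pathway: the caller saves with model.save_pretrained(<path>)
    # afterwards, so emitting "Optimized model is not saved" here is misleading.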
eldarkurtic added the bug (Something isn't working) label on Apr 18, 2025
@dsikka
Collaborator

dsikka commented Apr 18, 2025

The old pathway still exists. It just isn’t recommended.

dsikka self-assigned this on Apr 18, 2025
@eldarkurtic
Collaborator Author

eldarkurtic commented Apr 18, 2025

Would it make sense to suppress the warning? If we run this quantization script, using the recommended pathway of saving the model with model.save_pretrained(...):

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor import oneshot
from transformers import AutoModelForCausalLM

# First we load the model
model_stub = "mistralai/Mistral-Small-24B-Instruct-2501"
model = AutoModelForCausalLM.from_pretrained(model_stub)

# Then we specify the quantization recipe
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_dynamic",
    ignore=["lm_head"],
)

# Then we apply quantization
oneshot(model=model, recipe=recipe)

# Finally, we save the quantized model to disk
save_path = model_stub + "-FP8-dynamic"
model.save_pretrained(save_path, skip_compression_stats=True, disable_sparse_compression=True)

we are greeted with stdout containing the warning "Optimized model is not saved. To save, please provide", which could be a bit confusing:

Loading checkpoint shards: 100%|███████████████████████████████████| 10/10 [00:05<00:00,  1.81it/s]
2025-04-18T10:00:46.447596+0000 | reset | INFO - Compression lifecycle reset
2025-04-18T10:00:46.448282+0000 | from_modifiers | INFO - Creating recipe from modifiers
manager stage: Modifiers initialized
2025-04-18T10:00:47.144978+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
manager stage: Modifiers finalized
2025-04-18T10:00:47.145196+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
2025-04-18T10:00:47.145245+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide
Checking whether model follows 2:4 sparsity structure: 100%|█████| 281/281 [00:24<00:00, 11.26it/s]
Calculating quantization compression ratio: 404it [00:00, 671.63it/s]
Quantized Compression: 100%|█████████████████████████████████████| 923/923 [00:29<00:00, 31.23it/s]
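
As a user-side workaround in the meantime, the message can be filtered out at the logging layer. This is a minimal sketch assuming the warning is emitted through loguru, which the timestamped "| post_process | WARNING -" format above suggests; it is not the proposed fix for the issue itself:

# Workaround sketch, assuming llmcompressor logs this message via loguru.
# Note: removing and re-adding the sink resets the log format to loguru's default.
import sys
from loguru import logger

logger.remove()  # drop the existing sink(s)
logger.add(
    sys.stderr,
    # Keep everything except the misleading "not saved" warning.
    filter=lambda record: "Optimized model is not saved" not in record["message"],
)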
