[Cache] Fix environment variable handling for offline mode #1902
base: main
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Code Review

This pull request improves offline mode support by ensuring Hugging Face environment variables for caching are respected. The changes correctly propagate the cache_dir from model arguments to dataset arguments, and adjust hf_hub_download to use the default caching behavior. My main feedback is on how the cache_dir is added to dataset_args, suggesting a more explicit definition in the DatasetArguments dataclass for better code clarity and maintainability.
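A hypothetical sketch of that suggestion (the field name mirrors the PR discussion, but the default and help text here are illustrative, not the actual llm-compressor code):

```python
# Illustrative only: declare cache_dir as an explicit dataclass field on
# DatasetArguments rather than attaching it dynamically from model_args.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DatasetArguments:
    cache_dir: Optional[str] = field(
        default=None,
        metadata={
            "help": "Directory to cache downloaded datasets in; when None, "
            "the Hugging Face env vars (HF_HOME, HF_HUB_CACHE) decide."
        },
    )
```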
For context, I got interested in fixing this after trying to make llm-compressor work in combination with hermeto -> hermetoproject/hermeto#1141
Force-pushed from 655881e to b8c6ed6:
Previously, llm-compressor ignored HF_HUB_CACHE and other environment variables when loading models and datasets, making offline mode difficult to use with unified cache directories.

This change:
- Removes hard-coded TRANSFORMERS_CACHE in model_load/helpers.py to respect HF_HOME, HF_HUB_CACHE environment variables
- Propagates cache_dir from model_args to dataset_args to enable unified cache directory for both models and datasets
- Updates dataset loading to use cache_dir parameter instead of hardcoded None

Now users can specify cache_dir parameter or use HF_HOME/HF_HUB_CACHE environment variables for true offline operation.

Signed-off-by: Ralph Bean <[email protected]>
Co-Authored-By: Claude <[email protected]>
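To make the hf_hub_download part concrete, here is a minimal sketch of the before/after behavior; the repo id and filename are illustrative, not the actual helpers.py code:

```python
from huggingface_hub import hf_hub_download

# Before (paraphrased): a hard-coded cache_dir pinned downloads to
# TRANSFORMERS_CACHE, overriding the user's HF_HOME / HF_HUB_CACHE.
# path = hf_hub_download(repo_id, filename, cache_dir=TRANSFORMERS_CACHE)

# After: omit cache_dir and huggingface_hub resolves the cache location
# from the standard environment-variable hierarchy on its own.
path = hf_hub_download(
    repo_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    filename="config.json",
)
```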
Force-pushed from b8c6ed6 to 2dea601.
Thanks for the contribution! One question:
I think it would be better to remove model_args.cache_dir and dataset_args.cache_dir, if their removal means that the user can use HF_HUB_CACHE for both.
Hi @ralphbean! Are you still interested in contributing this PR? Or are you looking for someone to take it over?
Following feedback on PR vllm-project#1902, this removes the cache_dir parameter entirely from ModelArguments, DatasetArguments, and the oneshot() API.

By removing explicit cache_dir parameters and setting all calls to cache_dir=None, the HuggingFace libraries will automatically respect the standard environment variable hierarchy (HF_HOME, HF_HUB_CACHE) for determining cache locations.

This approach:
- Simplifies the codebase by removing parameter propagation
- Follows standard HuggingFace patterns
- Prevents cache_dir from being accidentally ignored
- Still fully supports offline mode via environment variables

Breaking change: Users who previously used the cache_dir parameter should now use HF_HOME or HF_HUB_CACHE environment variables instead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
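As a sketch of what this buys users (the cache path and the model/dataset ids below are illustrative, not taken from the PR):

```python
import os

# Set the env vars before importing the HF libraries so they are picked up.
os.environ["HF_HOME"] = "/srv/hf-cache"  # hub and datasets caches live under here
os.environ["HF_HUB_OFFLINE"] = "1"       # fail fast instead of hitting the network

from datasets import load_dataset
from transformers import AutoModelForCausalLM

# With cache_dir gone from llm-compressor's arguments, both of these resolve
# the same cache root purely from the environment.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
dataset = load_dataset("garage-bAInd/Open-Platypus", split="train")
```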
@kylesayrs I'm definitely still interested in trying to get this in. I think I'd prefer to remove those cache_dir options too. I gave it a go in cd93e72.
Thanks for the updates! This looks good to me; we should respect the env vars.
Looks good to me. These changes are in line with our full adoption of targeting transformers models and datasets from the datasets library.
Runners are down right now, will merge once they're back up.
SUMMARY:
Previously, llm-compressor ignored HF_HUB_CACHE and other environment variables when loading models and datasets, making offline mode difficult to use with unified cache directories.

This change:
- Removes hard-coded TRANSFORMERS_CACHE in model_load/helpers.py to respect HF_HOME, HF_HUB_CACHE environment variables
- Propagates cache_dir from model_args to dataset_args to enable unified cache directory for both models and datasets
- Updates dataset loading to use cache_dir parameter instead of hardcoded None

Now users can specify the cache_dir parameter or use the HF_HOME/HF_HUB_CACHE environment variables for true offline operation.
Offline mode is super helpful to supply-chain security use cases. It helps us generate trustworthy SBOMs for AI stuff. 🔐 🧠
TEST PLAN:
I started with the oneshot example from the README and called it example.py:
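The README's oneshot example at the time was along these lines (reproduced as a sketch, so the exact arguments may differ from the current README):

```python
# example.py -- approximately the oneshot quantization example from the
# llm-compressor README; model, dataset, and scheme choices are the README's.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",
    recipe=GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
    output_dir="TinyLlama-1.1B-Chat-v1.0-INT8",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```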
Next, remove your HF local cache to ensure your system has nothing available to it yet:

```
❯ rm -rf ~/.cache/huggingface
```
Then, run example.py with the HF_HUB_OFFLINE=1 env var. This should fail, proving that you have nothing cached.
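For concreteness, that failing invocation looks like this (using the example.py name from above):

```
❯ HF_HUB_OFFLINE=1 python example.py
```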
Good. Now, run it with HF_HOME=./hf-hub, which will run it in online mode, populating the cache in a new non-standard location (just to be sure things don't get mixed up during our test).

Now, finally, you can run with both HF_HOME and HF_HUB_OFFLINE=1 and prove to yourself that llm-compressor uses that freshly-populated cache for both the model and the dataset.
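Spelled out, those last two runs are (same example.py sketch as above):

```
❯ HF_HOME=./hf-hub python example.py
❯ HF_HOME=./hf-hub HF_HUB_OFFLINE=1 python example.py
```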