[Cache] Fix environment variable handling for offline mode #1902
base: main
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Code Review

This pull request improves offline mode support by ensuring Hugging Face environment variables for caching are respected. The changes correctly propagate the cache_dir from model arguments to dataset arguments, and adjust hf_hub_download to use the default caching behavior. My main feedback is on how the cache_dir is added to dataset_args, suggesting a more explicit definition in the DatasetArguments dataclass for better code clarity and maintainability.
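A hypothetical sketch of that suggestion (the field name mirrors the PR discussion, but the default and help text here are illustrative, not the actual llm-compressor code):

```python
# Illustrative only: declare cache_dir as an explicit dataclass field on
# DatasetArguments rather than attaching it dynamically from model_args.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DatasetArguments:
    cache_dir: Optional[str] = field(
        default=None,
        metadata={
            "help": "Directory to cache downloaded datasets in; when None, "
            "the Hugging Face env vars (HF_HOME, HF_HUB_CACHE) decide."
        },
    )
```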
For context, I got interested in fixing this after trying to make llm-compressor work in combination with hermeto -> hermetoproject/hermeto#1141
Force-pushed from 655881e to b8c6ed6:
Previously, llm-compressor ignored HF_HUB_CACHE and other environment variables when loading models and datasets, making offline mode difficult to use with unified cache directories.

This change:
- Removes hard-coded TRANSFORMERS_CACHE in model_load/helpers.py to respect HF_HOME, HF_HUB_CACHE environment variables
- Propagates cache_dir from model_args to dataset_args to enable unified cache directory for both models and datasets
- Updates dataset loading to use cache_dir parameter instead of hardcoded None

Now users can specify cache_dir parameter or use HF_HOME/HF_HUB_CACHE environment variables for true offline operation.

Signed-off-by: Ralph Bean <[email protected]>
Co-Authored-By: Claude <[email protected]>
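To make the hf_hub_download part concrete, here is a minimal sketch of the before/after behavior; the repo id and filename are illustrative, not the actual helpers.py code:

```python
from huggingface_hub import hf_hub_download

# Before (paraphrased): a hard-coded cache_dir pinned downloads to
# TRANSFORMERS_CACHE, overriding the user's HF_HOME / HF_HUB_CACHE.
# path = hf_hub_download(repo_id, filename, cache_dir=TRANSFORMERS_CACHE)

# After: omit cache_dir and huggingface_hub resolves the cache location
# from the standard environment-variable hierarchy on its own.
path = hf_hub_download(
    repo_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    filename="config.json",
)
```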
Force-pushed from b8c6ed6 to 2dea601.
Thanks for the contribution! One question:
I think it would be better to remove model_args.cache_dir and dataset_args.cache_dir, if their removal means that the user can use HF_HUB_CACHE for both.
Hi @ralphbean! Are you still interested in contributing this PR? Or are you looking for someone to take it over?
Following feedback on PR vllm-project#1902, this removes the cache_dir parameter entirely from ModelArguments, DatasetArguments, and the oneshot() API.

By removing explicit cache_dir parameters and setting all calls to cache_dir=None, the HuggingFace libraries will automatically respect the standard environment variable hierarchy (HF_HOME, HF_HUB_CACHE) for determining cache locations.

This approach:
- Simplifies the codebase by removing parameter propagation
- Follows standard HuggingFace patterns
- Prevents cache_dir from being accidentally ignored
- Still fully supports offline mode via environment variables

Breaking change: Users who previously used the cache_dir parameter should now use HF_HOME or HF_HUB_CACHE environment variables instead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
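As a sketch of what this buys users (the cache path and the model/dataset ids below are illustrative, not taken from the PR):

```python
import os

# Set the env vars before importing the HF libraries so they are picked up.
os.environ["HF_HOME"] = "/srv/hf-cache"  # hub and datasets caches live under here
os.environ["HF_HUB_OFFLINE"] = "1"       # fail fast instead of hitting the network

from datasets import load_dataset
from transformers import AutoModelForCausalLM

# With cache_dir gone from llm-compressor's arguments, both of these resolve
# the same cache root purely from the environment.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
dataset = load_dataset("garage-bAInd/Open-Platypus", split="train")
```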
@kylesayrs I'm definitely still interested in trying to get this in. I think I'd prefer to remove those cache_dir options too. I gave it a go in cd93e72.
Thanks for the updates! This looks good to me; we should respect the env vars.
Looks good to me. These changes are in line with our full adoption of targeting transformers models and datasets from the datasets library.
Runners are down right now, will merge once they're back up.
SUMMARY:
Previously, llm-compressor ignored HF_HUB_CACHE and other environment variables when loading models and datasets, making offline mode difficult to use with unified cache directories.

This change:
- Removes hard-coded TRANSFORMERS_CACHE in model_load/helpers.py to respect HF_HOME, HF_HUB_CACHE environment variables
- Propagates cache_dir from model_args to dataset_args to enable unified cache directory for both models and datasets
- Updates dataset loading to use cache_dir parameter instead of hardcoded None

Now users can specify the cache_dir parameter or use the HF_HOME/HF_HUB_CACHE environment variables for true offline operation.
Offline mode is super helpful to supply-chain security use cases. It helps us generate trustworthy SBOMs for AI stuff. 🔐 🧠
TEST PLAN:
I started with the oneshot example from the README and called it example.py:
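The README's oneshot example at the time was along these lines (reproduced as a sketch, so the exact arguments may differ from the current README):

```python
# example.py -- approximately the oneshot quantization example from the
# llm-compressor README; model, dataset, and scheme choices are the README's.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",
    recipe=GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
    output_dir="TinyLlama-1.1B-Chat-v1.0-INT8",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```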
Next, remove your HF local cache to ensure your system has nothing available to it yet:

```
❯ rm -rf ~/.cache/huggingface
```
Then, run example.py with the HF_HUB_OFFLINE=1 env var. This should fail, proving that you have nothing cached.
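For concreteness, that failing invocation looks like this (using the example.py name from above):

```
❯ HF_HUB_OFFLINE=1 python example.py
```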
Good. Now, run it with HF_HOME=./hf-hub, which will run it in online mode, populating the cache in a new non-standard location (just to be sure things don't get mixed up during our test).

Now, finally, you can run with both HF_HOME and HF_HUB_OFFLINE=1 and prove to yourself that llm-compressor uses that freshly-populated cache for both the model and the dataset.
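Spelled out, those last two runs are (same example.py sketch as above):

```
❯ HF_HOME=./hf-hub python example.py
❯ HF_HOME=./hf-hub HF_HUB_OFFLINE=1 python example.py
```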