Skip to content

Conversation

biswapanda
Copy link
Contributor

@biswapanda biswapanda commented Oct 15, 2025

Overview:

respect HF_HOME env var if specified.
https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hfhome

HF_HOME in needed in long term because HF moved the xet folder under HF_HOME

ls -al $HF_HOME

total 4
drwxr-xr-x 1 root root    0 Oct 15 04:10 .
drwxr-xr-x 1 root root 4096 Oct 15 05:40 ..
drwxr-xr-x 1 root root    0 Oct 15 04:10 hub
drwxr-xr-x 1 root root    0 Oct 15 04:10 xet

closes DYN-1248

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added support for configuring the cache directory via environment variables. If set, HF_HUB_CACHE takes precedence; otherwise HF_HOME is used (with a “hub” subdirectory). Existing default paths remain unchanged.
  • Tests

    • Added tests to validate cache path resolution, including correct handling of HF_HOME.

@biswapanda biswapanda requested a review from a team as a code owner October 15, 2025 05:42
@biswapanda biswapanda self-assigned this Oct 15, 2025
@github-actions github-actions bot added the fix label Oct 15, 2025
Copy link
Contributor

coderabbitai bot commented Oct 15, 2025

Walkthrough

Added environment-variable-based cache resolution in get_model_express_cache_dir to check HF_HUB_CACHE and HF_HOME (HF_HOME joined with "hub"), preserving existing fallbacks. Added a test validating HF_HOME handling.

Changes

Cohort / File(s) Summary
Cache path resolution updates and tests
lib/llm/src/hub.rs
Implemented env var checks for HF_HUB_CACHE and HF_HOME (with "hub" suffix) in get_model_express_cache_dir; retained existing MODEL_EXPRESS_CACHE_PATH and HOME/USERPROFILE fallbacks; added test test_get_model_express_cache_dir_with_hf_home to validate HF_HOME behavior.

Sequence Diagram(s)

sequenceDiagram
    actor Caller
    participant Hub as hub::get_model_express_cache_dir
    participant Env as Environment
    participant FS as Filesystem

    Caller->>Hub: get_model_express_cache_dir()
    Hub->>Env: read HF_HUB_CACHE
    alt HF_HUB_CACHE set
        Hub-->>Caller: return HF_HUB_CACHE
    else HF_HUB_CACHE not set
        Hub->>Env: read HF_HOME
        alt HF_HOME set
            Hub->>FS: join(HF_HOME, "hub")
            Hub-->>Caller: return HF_HOME/hub
        else HF_HOME not set
            Hub->>FS: resolve existing fallbacks<br/>(MODEL_EXPRESS_CACHE_PATH, HOME/USERPROFILE)
            Hub-->>Caller: return fallback path
        end
    end

    Note over Hub: Env var precedence: HF_HUB_CACHE → HF_HOME/hub → existing fallbacks
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

In burrows of bytes I hop with cheer,
New paths for caches now crystal-clear.
If HUB is set, I’ll scamper there;
If HOME calls “hub,” I’m almost there.
And should those fade, I’ll find the rest—
A tidy warren for model’s nest. 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description Check ⚠️ Warning The pull request description includes the Overview section but leaves both the Details and Where should the reviewer start sections empty, retaining only placeholder comments, and the Related Issues section is still using a placeholder rather than a real GitHub issue reference. This means the required template sections are present but not populated with the necessary information. As a result, the description does not fully adhere to the repository’s template. Please populate the Details section with a concise summary of the code changes, update the Where should the reviewer start section to call out specific files or functions, and provide a valid GitHub issue number under Related Issues instead of placeholders. Ensuring all template sections are filled will bring the description into compliance.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title concisely summarizes the primary change by indicating that the model express common code now uses the HF_HOME environment variable, matching the main update introduced in the PR. It is a clear, single-sentence description that directly reflects the modifications without including unnecessary details or vague terminology. The conventional "fix:" prefix is acceptable and helps categorize the change. This title gives reviewers immediate context on the PR’s objective.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@biswapanda biswapanda changed the title fix: update to use HF_HOME dir as well fix: update model express common to use HF_HOME (if specified) Oct 15, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a1391e and 3381747.

📒 Files selected for processing (1)
  • lib/llm/src/hub.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: sglang
  • GitHub Check: vllm (amd64)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: clippy (.)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: tests (launch/dynamo-run)
🔇 Additional comments (1)
lib/llm/src/hub.rs (1)

93-97: LGTM!

The documentation comments improve code clarity and reference the official Hugging Face documentation.

Comment on lines +99 to +103
// Check HF_HOME environment variable (standard Hugging Face cache directory)
// reference: https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hfhome
if let Ok(hf_home) = env::var("HF_HOME") {
return PathBuf::from(hf_home).join("hub");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Verify the precedence order across all cache path environment variables.

The implementation correctly appends "hub" to HF_HOME per the Hugging Face documentation. However, ensure that the precedence order (HF_HUB_CACHE → HF_HOME → MODEL_EXPRESS_CACHE_PATH → HOME fallback) aligns with expected behavior and is thoroughly tested.

Run the following script to check if any existing code or documentation assumes a different precedence order:


🏁 Script executed:

#!/bin/bash
# Description: Search for references to cache environment variables and their precedence

# Search for any documentation or comments about cache precedence
rg -n -C3 'HF_HUB_CACHE|HF_HOME|MODEL_EXPRESS_CACHE_PATH' --type rust --type md

# Search for other usages of these environment variables
rg -n 'env::var.*HF_HUB_CACHE|env::var.*HF_HOME|env::var.*MODEL_EXPRESS_CACHE_PATH' --type rust

Length of output: 4246


Add tests covering all cache path environment variable branches and their precedence

  • Test that HF_HUB_CACHE overrides HF_HOME and MODEL_EXPRESS_CACHE_PATH.
  • Test that MODEL_EXPRESS_CACHE_PATH is used when HF_HUB_CACHE and HF_HOME are unset.
  • Test fallback behavior to $HOME/.cache/huggingface/hub when none are set.
🤖 Prompt for AI Agents
In lib/llm/src/hub.rs around lines 99–103, add unit tests exercising all
environment-variable branches and their precedence: write three tests that (1)
set HF_HUB_CACHE and also set HF_HOME and MODEL_EXPRESS_CACHE and assert the
function returns HF_HUB_CACHE, (2) unset HF_HUB_CACHE but set
MODEL_EXPRESS_CACHE (and unset HF_HOME) and assert it returns
MODEL_EXPRESS_CACHE, and (3) unset HF_HUB_CACHE and MODEL_EXPRESS_CACHE and set
HOME to a temp directory and assert it falls back to
$HOME/.cache/huggingface/hub. In each test manipulate env vars with
std::env::set_var/remove_var (saving and restoring originals) or use a helper to
ensure isolation, use a tempdir for expected paths, and run tests serially
(e.g., serial_test or single-threaded test runner) so global env changes don’t
interfere. Ensure assertions compare PathBufs and clean up/restore env state at
test end.

Comment on lines +134 to +145
#[test]
fn test_get_model_express_cache_dir_with_hf_home() {
// Test that HF_HOME is respected when set
unsafe {
env::set_var("HF_HOME", "/custom/cache/path");
let cache_dir = get_model_express_cache_dir();
assert_eq!(cache_dir, PathBuf::from("/custom/cache/path/hub"));

// Clean up
env::remove_var("HF_HOME");
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Environment variable mutation in tests causes race conditions.

Using env::set_var in an unsafe block is problematic because Rust runs tests concurrently by default. This can cause:

  1. Race conditions: Multiple tests modifying environment variables simultaneously
  2. Flaky tests: Non-deterministic failures when tests interfere with each other
  3. Incomplete isolation: The test doesn't clear HF_HUB_CACHE or MODEL_EXPRESS_CACHE_PATH, so if those are set in the environment, this test won't actually exercise the HF_HOME code path

Consider one of these solutions:

Solution 1 (Recommended): Use the serial_test crate

Add to Cargo.toml:

[dev-dependencies]
serial_test = "3.0"

Then apply this diff:

+    #[serial_test::serial]
     #[test]
     fn test_get_model_express_cache_dir_with_hf_home() {
-        // Test that HF_HOME is respected when set
         unsafe {
+            // Clear other cache env vars to ensure HF_HOME is tested
+            env::remove_var("HF_HUB_CACHE");
+            env::remove_var("MODEL_EXPRESS_CACHE_PATH");
+            
             env::set_var("HF_HOME", "/custom/cache/path");
             let cache_dir = get_model_express_cache_dir();
             assert_eq!(cache_dir, PathBuf::from("/custom/cache/path/hub"));

             // Clean up
             env::remove_var("HF_HOME");
         }
     }

Solution 2: Refactor to accept environment as a parameter

Refactor get_model_express_cache_dir to accept an environment provider trait, allowing tests to inject mock environments without mutating global state.

🤖 Prompt for AI Agents
In lib/llm/src/hub.rs around lines 134-145, the test mutates process-wide env
vars unsafely causing race conditions; fix it by adding serial_test as a
dev-dependency in Cargo.toml and annotating the test with serial_test::serial so
it runs single-threaded, remove the unsafe block and use
std::env::set_var/remove_var normally inside the test, and ensure you clear or
save/restore HF_HUB_CACHE and MODEL_EXPRESS_CACHE_PATH (unset them or save then
restore) before setting HF_HOME so the test exercises the HF_HOME path in
isolation and cleans up all modified env vars at the end.

@biswapanda biswapanda changed the title fix: update model express common to use HF_HOME (if specified) fix: respect HF_HOME when specified Oct 15, 2025
@biswapanda biswapanda changed the title fix: respect HF_HOME when specified fix: update model express common to use HF_HOME (if specified) Oct 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant