Fix: Cache model in huggingface_local to prevent OOM (Issue #449)

@adambarla

Problem

When fn_completions is set to huggingface_local_completions, alpaca_eval reloads the model for every chunk of data. This leads to:

  1. Significant time overhead (reloading large models repeatedly).
  2. OOM errors, because the previous model isn't always garbage collected before the new one loads (a rough sketch of this failure mode follows this list).
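For context, the pre-fix failure mode looks roughly like the following. This is illustrative only: `run_chunk`, `chunks`, and the model name are simplified placeholders, not the actual alpaca_eval code.

```python
# Illustrative sketch only: every chunk triggers a fresh from_pretrained call,
# and the previous model may still occupy GPU memory when the new weights load.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def run_chunk(prompts, model_name):
    # Reloaded on every call: slow, and the old model is only freed whenever
    # Python's garbage collector happens to run.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    ...  # generate completions for `prompts` with `model` and `tokenizer`


chunks = [["prompt 1"], ["prompt 2"]]  # hypothetical prompt chunks
for chunk in chunks:
    run_chunk(chunk, "meta-llama/Llama-2-7b-hf")  # placeholder model name
```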

Fixes #449.

Solution

Implemented module-level caching for the model and tokenizer, following the existing pattern in vllm_local.py; a minimal sketch of the idea is included after the list below.

  • Added a _get_or_load_model helper function.
  • Uses a global _loaded_model to persist the model across calls.
  • Checks whether the requested model matches the cached one before loading.
  • Explicitly unloads and garbage-collects the old model when switching to a different one.
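A minimal sketch of this caching logic, assuming transformers-style loading. Apart from `_loaded_model`, the global names, arguments, and the optional peft adapter step are illustrative rather than the exact code in huggingface_local.py:

```python
# Minimal sketch of module-level model/tokenizer caching (illustrative; the real
# implementation lives in huggingface_local.py and mirrors vllm_local.py).
import gc
import logging

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

logger = logging.getLogger(__name__)

# Module-level cache so the model survives across chunked calls.
_loaded_model = None
_loaded_tokenizer = None
_loaded_model_name = None
_loaded_adapters_name = None


def _get_or_load_model(model_name, adapters_name=None, **model_kwargs):
    """Return a cached (model, tokenizer) pair, loading only when necessary."""
    global _loaded_model, _loaded_tokenizer, _loaded_model_name, _loaded_adapters_name

    # Cache hit: same base model and adapters as the previous call.
    if (
        _loaded_model is not None
        and _loaded_model_name == model_name
        and _loaded_adapters_name == adapters_name
    ):
        logger.info("Reusing cached model %s", model_name)
        return _loaded_model, _loaded_tokenizer

    # Switching models: drop the old references and reclaim GPU memory first,
    # so both models are never resident at the same time.
    if _loaded_model is not None:
        logger.info("Unloading previously cached model %s", _loaded_model_name)
        _loaded_model = None
        _loaded_tokenizer = None
        gc.collect()
        torch.cuda.empty_cache()

    logger.info("Loading model %s", model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
    if adapters_name is not None:
        # Assumes PEFT-style adapters; adjust if the adapters are loaded differently.
        from peft import PeftModel

        model = PeftModel.from_pretrained(model, adapters_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    _loaded_model = model
    _loaded_tokenizer = tokenizer
    _loaded_model_name = model_name
    _loaded_adapters_name = adapters_name
    return _loaded_model, _loaded_tokenizer
```

The helper in the PR may differ in signature and load arguments; the important part is that the cache check happens before any new from_pretrained call, and that the old weights are released before the new ones are loaded.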

Testing

Tested locally with a dataset split into multiple chunks. Verified that:

  • Model loads only once.
  • GPU memory usage remains stable across chunks.
  • "Reusing cached model" logs appear for subsequent chunks.

Commit message

Add caching for models and tokenizers in huggingface_local_completions
to avoid reloading the model for each chunk.
This prevents OOM errors when processing large datasets split into multiple chunks.

- Add _get_or_load_model() helper function to handle caching logic
- Cache model, tokenizer, model_name, and adapters_name at module level
- Unload previous model when switching to a different model
- Follows the same pattern as vllm_local.py in the codebase

Closes tatsu-lab#449