diff --git a/serverless/endpoints/model-caching.mdx b/serverless/endpoints/model-caching.mdx
index 147cb882..d509a862 100644
--- a/serverless/endpoints/model-caching.mdx
+++ b/serverless/endpoints/model-caching.mdx
@@ -59,6 +59,32 @@ flowchart TD
 ```
 
+## Where models are stored
+
+Cached models are stored in a Runpod-managed Docker volume mounted at `/runpod-volume/huggingface-cache/hub/`. The cache is managed automatically and persists across requests on the same worker.
+
+
+While cached models share the same mount path as network volumes (`/runpod-volume/`), models load significantly faster from the cache than from a network volume.
+
+
+## Accessing cached models in your application
+
+Models are cached on your workers at `/runpod-volume/huggingface-cache/hub/` following Hugging Face cache conventions. The directory structure replaces the forward slash (`/`) in the model name with double dashes (`--`) and includes a version hash subdirectory.
+
+The path structure follows this pattern:
+
+```
+/runpod-volume/huggingface-cache/hub/models--HF_ORGANIZATION--MODEL_NAME/snapshots/VERSION_HASH/
+```
+
+For example, the model `gensyn/qwen2.5-0.5b-instruct` would be stored at:
+
+```
+/runpod-volume/huggingface-cache/hub/models--gensyn--qwen2.5-0.5b-instruct/snapshots/317b7eb96312eda0c431d1dab1af958a308cb35e/
+```
+
+If your application requires specific paths, configure it to scan `/runpod-volume/huggingface-cache/hub/` for models.
+
 ## Enabling cached models
@@ -85,4 +111,4 @@ Follow these steps to select and add a cached model to your Serverless endpoint:
 
 
 
-You can add a cached model to an existing endpoint by selecting **Manage → Edit Endpoint** in the endpoint details page and updating the **Model (optional)** field.
\ No newline at end of file
+You can add a cached model to an existing endpoint by selecting **Manage → Edit Endpoint** in the endpoint details page and updating the **Model (optional)** field.
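
The path convention the diff documents can be resolved programmatically. A minimal sketch, using only the Python standard library — the `resolve_snapshot` helper, its signature, and the "pick the most recently modified revision" tie-break are illustrative assumptions, not part of the Runpod docs:

```python
from pathlib import Path
from typing import Optional

# Runpod's cache mount path, as described in the docs above.
CACHE_ROOT = Path("/runpod-volume/huggingface-cache/hub")

def resolve_snapshot(model_id: str, cache_root: Path = CACHE_ROOT) -> Optional[Path]:
    """Return the snapshot directory for a model like "org/name", or None if absent."""
    # Hugging Face cache convention: "org/name" -> "models--org--name"
    repo_dir = cache_root / ("models--" + model_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return None
    # Each subdirectory is a version hash; if several revisions exist,
    # pick the most recently modified one (an assumption, not a Runpod guarantee).
    revisions = sorted(snapshots.iterdir(), key=lambda p: p.stat().st_mtime)
    return revisions[-1] if revisions else None
```

For example, `resolve_snapshot("gensyn/qwen2.5-0.5b-instruct")` would return the `snapshots/317b7eb…` directory shown above, which can then be passed to a loader that accepts a local path.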