chore: added CI model layer caching #168
base: main
Conversation
Pull Request Overview
This PR adds model layer caching to the CI/CD pipeline to avoid re-downloading large language models during each build, improving build times and reducing bandwidth usage.
Key Changes:
- Added Docker BuildKit registry cache support for all component images
- Implemented optional model pre-downloading in the vLLM Dockerfile during build time
- Configured GitHub Container Registry as the cache backend with automatic login
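The GHCR-backed registry cache described in these changes could be wired up roughly like this (a sketch, not the PR's actual workflow; step names, cache refs, and tags are illustrative):

```yaml
# Hypothetical workflow excerpt: authenticate to GHCR, then push/pull
# BuildKit cache layers through the registry so later builds reuse them.
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3

- name: Log in to GHCR
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}

- name: Build with registry cache
  run: |
    docker buildx build \
      --cache-from=type=registry,ref=ghcr.io/${{ github.repository }}/cache:${{ matrix.component }} \
      --cache-to=type=registry,ref=ghcr.io/${{ github.repository }}/cache:${{ matrix.component }},mode=max \
      -f docker/${{ matrix.component }}.Dockerfile .
```

`mode=max` caches all intermediate layers (including the model pre-download layer), at the cost of a larger cache artifact in the registry.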
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docker/vllm.Dockerfile | Added build arguments and RUN command to optionally pre-download HuggingFace models during image build for layer caching |
| .github/workflows/cicd.yml | Added Docker Buildx setup, GHCR authentication, registry cache configuration, and vLLM-specific model caching build arguments |
.github/workflows/cicd.yml (Outdated)
```shell
# Add model caching for vllm component
EXTRA_BUILD_ARGS=""
if [ "${{ matrix.component }}" = "vllm" ] && [ -n "${{ matrix.model_to_cache || '' }}" ]; then
  EXTRA_BUILD_ARGS="--build-arg MODEL_TO_CACHE=${{ matrix.model_to_cache }} --build-arg HF_TOKEN=${{ secrets.HF_TOKEN }}"
```
Copilot AI (Nov 14, 2025)
The HF_TOKEN build argument should be passed as a secret mount instead of a regular build argument to avoid exposing the token in build history and cached layers. Use --secret id=hf_token,env=HF_TOKEN in the docker build command and access it via RUN --mount=type=secret,id=hf_token in the Dockerfile for better security.
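The secret-mount approach this comment recommends could look like the following sketch (illustrative, not the PR's actual code; the secret id `hf_token` and the `/run/secrets` path follow BuildKit's defaults):

```dockerfile
# Dockerfile side: the token is mounted only for the duration of this RUN
# step and is never written to an image layer or shown in `docker history`.
RUN --mount=type=secret,id=hf_token \
    if [ -n "$MODEL_TO_CACHE" ]; then \
        HF_TOKEN="$(cat /run/secrets/hf_token)" \
        python3 -c "from huggingface_hub import snapshot_download; snapshot_download('$MODEL_TO_CACHE', cache_dir='/root/.cache/huggingface')"; \
    fi
```

On the workflow side, the build command would pass the token from the environment with `--secret id=hf_token,env=HF_TOKEN` instead of `--build-arg HF_TOKEN=...`.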
docker/vllm.Dockerfile (Outdated)
```dockerfile
RUN if [ -n "$MODEL_TO_CACHE" ]; then \
    echo "Pre-downloading model: $MODEL_TO_CACHE"; \
    export HF_TOKEN="${HF_TOKEN}"; \
    python3 -c "from huggingface_hub import snapshot_download; snapshot_download('$MODEL_TO_CACHE', cache_dir='/root/.cache/huggingface')"; \
```
Copilot AI (Nov 14, 2025)
Missing error handling for model download failures. If snapshot_download fails (e.g., due to network issues, an invalid token, or a non-existent model), the build will fail without a clear error message. Consider adding error handling with `set -e` and a trap, or wrapping the Python command to provide a more informative error message.
Suggested change:

```diff
-    python3 -c "from huggingface_hub import snapshot_download; snapshot_download('$MODEL_TO_CACHE', cache_dir='/root/.cache/huggingface')"; \
+    python3 -c "from huggingface_hub import snapshot_download; snapshot_download('$MODEL_TO_CACHE', cache_dir='/root/.cache/huggingface')" \
+        || { echo >&2 "ERROR: Failed to pre-download model '$MODEL_TO_CACHE'. Check your network connection, HF_TOKEN, and model name."; exit 1; }; \
```
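The fail-fast pattern in the suggested change can be exercised on its own, independent of Docker or Hugging Face (a generic sketch; `step` is a hypothetical stand-in for the real `python3` snapshot_download call):

```shell
# Run a step and, on failure, emit a clear error to stderr instead of
# failing silently; `false` stands in for the real download command.
step() { false; }

if step; then
    result="downloaded"
else
    echo "ERROR: pre-download failed; check network, HF_TOKEN, and model name" >&2
    result="handled"
fi
echo "$result"
```

The `|| { ...; exit 1; }` form in the suggestion is the inline equivalent of this `if`/`else`, suited to a single-line `RUN` continuation.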
docker/vllm.Dockerfile (Outdated)
```dockerfile
RUN if [ -n "$MODEL_TO_CACHE" ]; then \
    echo "Pre-downloading model: $MODEL_TO_CACHE"; \
    export HF_TOKEN="${HF_TOKEN}"; \
```
Copilot AI (Nov 14, 2025)
The HF_TOKEN is passed as a build argument, which means it will be visible in the image history and cached layers. This is a security concern as the token could be exposed. Consider using Docker BuildKit secrets with --mount=type=secret instead to securely pass the token without leaving it in the image layers.
```shell
  --cache-from=${CACHE_FROM} \
  --cache-to=${CACHE_TO} \
```
Copilot AI (Nov 14, 2025)
The cache-to operation requires packages: write permission to push cache layers to GitHub Container Registry. The workflow only has contents: read and id-token: write permissions at the top level. While GITHUB_TOKEN is used for authentication, the workflow-level permissions may prevent successful cache push, causing the cache layer upload to fail silently or with permission errors.
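Addressing this would mean granting `packages: write` alongside the existing scopes, along these lines (a sketch of workflow-level permissions; this PR's actual workflow may scope permissions per job instead):

```yaml
# Workflow-level permissions: `packages: write` lets GITHUB_TOKEN push
# BuildKit cache layers to GHCR; the existing scopes are preserved.
permissions:
  contents: read
  id-token: write
  packages: write
```

Note that declaring any `permissions` block resets all unlisted scopes to none, so every scope the workflow needs must be listed explicitly.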
Force-pushed from c09d912 to fb8b9a1
No description provided.