
Conversation

@jcabrero (Member)

No description provided.

@jcabrero jcabrero requested a review from Copilot November 14, 2025 09:58
Copilot finished reviewing on behalf of jcabrero November 14, 2025 10:01
Copilot AI (Contributor) left a comment
Pull Request Overview

This PR adds model layer caching to the CI/CD pipeline to avoid re-downloading large language models during each build, improving build times and reducing bandwidth usage.

Key Changes:

  • Added Docker BuildKit registry cache support for all component images
  • Implemented optional model pre-downloading in the vLLM Dockerfile during build time
  • Configured GitHub Container Registry as the cache backend with automatic login
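As a rough sketch, the registry-cache wiring described above boils down to flags like the following; the image ref is a placeholder (not the repo's real name), and the build command is echoed as a dry run rather than executed:

```shell
#!/bin/sh
# Sketch of BuildKit registry caching as described in the PR overview.
# "ghcr.io/example-org/component" is a placeholder image ref (assumption).
CACHE_REF="ghcr.io/example-org/component:buildcache"
CACHE_FROM="type=registry,ref=${CACHE_REF}"
CACHE_TO="type=registry,ref=${CACHE_REF},mode=max"  # mode=max also caches intermediate layers

# The workflow would pass these to `docker buildx build`; echoed here as a dry run.
echo docker buildx build \
  --cache-from="${CACHE_FROM}" \
  --cache-to="${CACHE_TO}" \
  -t "ghcr.io/example-org/component:latest" .
```

Pushing the cache to a dedicated `:buildcache` tag keeps cache blobs separate from the runnable image tags in the registry.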

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File                        | Description
docker/vllm.Dockerfile      | Added build arguments and RUN command to optionally pre-download HuggingFace models during image build for layer caching
.github/workflows/cicd.yml  | Added Docker Buildx setup, GHCR authentication, registry cache configuration, and vLLM-specific model caching build arguments


# Add model caching for vllm component
EXTRA_BUILD_ARGS=""
if [ "${{ matrix.component }}" = "vllm" ] && [ -n "${{ matrix.model_to_cache || '' }}" ]; then
  EXTRA_BUILD_ARGS="--build-arg MODEL_TO_CACHE=${{ matrix.model_to_cache }} --build-arg HF_TOKEN=${{ secrets.HF_TOKEN }}"

Copilot AI Nov 14, 2025

The HF_TOKEN build argument should be passed as a secret mount instead of a regular build argument to avoid exposing the token in build history and cached layers. Use --secret id=hf_token,env=HF_TOKEN in the docker build command and access it via RUN --mount=type=secret,id=hf_token in the Dockerfile for better security.
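The secret-mount approach suggested here could look roughly like the following sketch (the `hf_token` id and the default `/run/secrets/` mount path follow BuildKit conventions; this is not the PR's actual Dockerfile):

```dockerfile
ARG MODEL_TO_CACHE=""
# The secret is mounted only for this RUN step and never stored in a layer;
# BuildKit exposes it at /run/secrets/<id> by default.
RUN --mount=type=secret,id=hf_token \
    if [ -n "$MODEL_TO_CACHE" ]; then \
        HF_TOKEN="$(cat /run/secrets/hf_token)" \
        python3 -c "from huggingface_hub import snapshot_download; snapshot_download('$MODEL_TO_CACHE', cache_dir='/root/.cache/huggingface')"; \
    fi
```

On the build side, the matching flag would be `--secret id=hf_token,env=HF_TOKEN`, with `HF_TOKEN` exported in the job environment rather than passed via `--build-arg`.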

RUN if [ -n "$MODEL_TO_CACHE" ]; then \
    echo "Pre-downloading model: $MODEL_TO_CACHE"; \
    export HF_TOKEN="${HF_TOKEN}"; \
    python3 -c "from huggingface_hub import snapshot_download; snapshot_download('$MODEL_TO_CACHE', cache_dir='/root/.cache/huggingface')"; \

Copilot AI Nov 14, 2025

Missing error handling for model download failures. If snapshot_download fails (e.g., due to network issues, invalid token, or non-existent model), the build will fail without a clear error message. Consider adding error handling with set -e and a trap or wrapping the Python command to provide a more informative error message.

Suggested change
-    python3 -c "from huggingface_hub import snapshot_download; snapshot_download('$MODEL_TO_CACHE', cache_dir='/root/.cache/huggingface')"; \
+    python3 -c "from huggingface_hub import snapshot_download; snapshot_download('$MODEL_TO_CACHE', cache_dir='/root/.cache/huggingface')" \
+        || { echo >&2 "ERROR: Failed to pre-download model '$MODEL_TO_CACHE'. Check your network connection, HF_TOKEN, and model name."; exit 1; }; \
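The guard in the suggestion can be sketched in isolation like this; `fetch_model` below is a stand-in for the real `python3`/`snapshot_download` call and is not part of the PR:

```shell
#!/bin/sh
# Sketch of the fail-fast pattern from the suggested change.
# fetch_model simulates the download: it "succeeds" only for a known name.
fetch_model() {
  [ "$1" = "known/model" ]
}

cache_model() {
  model="$1"
  echo "Pre-downloading model: $model"
  fetch_model "$model" || {
    echo >&2 "ERROR: Failed to pre-download model '$model'. Check your network connection, HF_TOKEN, and model name."
    return 1
  }
}
```

With this shape, a bad model name makes the step exit non-zero with a pointed message instead of an opaque Python traceback buried in the build log.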

Comment on lines 22 to 24
RUN if [ -n "$MODEL_TO_CACHE" ]; then \
    echo "Pre-downloading model: $MODEL_TO_CACHE"; \
    export HF_TOKEN="${HF_TOKEN}"; \

Copilot AI Nov 14, 2025

The HF_TOKEN is passed as a build argument, which means it will be visible in the image history and cached layers. This is a security concern as the token could be exposed. Consider using Docker BuildKit secrets with --mount=type=secret instead to securely pass the token without leaving it in the image layers.

Copilot uses AI. Check for mistakes.
Comment on lines +146 to +150
    --cache-from=${CACHE_FROM} \
    --cache-to=${CACHE_TO} \

Copilot AI Nov 14, 2025

The cache-to operation requires packages: write permission to push cache layers to GitHub Container Registry. The workflow only has contents: read and id-token: write permissions at the top level. While GITHUB_TOKEN is used for authentication, the workflow-level permissions may prevent successful cache push, causing the cache layer upload to fail silently or with permission errors.
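Assuming the cache is meant to live in GHCR under the repo's own namespace, one fix would be to widen the workflow's permissions block, e.g.:

```yaml
# Sketch: add packages: write so GITHUB_TOKEN can push cache layers to ghcr.io.
# The contents/id-token entries mirror the permissions the comment says exist.
permissions:
  contents: read
  id-token: write
  packages: write
```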

@jcabrero force-pushed the chore/ci_model_caching branch from c09d912 to fb8b9a1 on November 14, 2025 10:07

2 participants