Wonki4
Contributor

@Wonki4 Wonki4 commented Oct 11, 2025

What type of PR is this?

Fix the CI error caused by the default Ray image.

What this PR does / why we need it:

When the Ray e2e job sequence test runs, the pod cannot pull the default e2e image (bitnami/ray:2.49.0, not found):

Oct 02 06:57:23 integration-worker2 kubelet[273]: E1002 06:57:23.173290     273 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"ray\" with ImagePullBackOff: \"Back-off pulling image \\\"bitnami/ray:2.49.0\\\": ErrImagePull: rpc error: code = NotFound desc = failed to pull and unpack image \\\"docker.io/bitnami/ray:2.49.0\\\": failed to resolve reference \\\"docker.io/bitnami/ray:2.49.0\\\": docker.io/bitnami/ray:2.49.0: not found\"" pod="erli6hgc/ray-cluster-job-head-0" podUID="433a6958-3d9e-43cc-9d14-965b43db6541"

This happens because Bitnami no longer provides the bitnami/ray:2.49.0 image.
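
The kubelet log above reports the short name bitnami/ray:2.49.0 as docker.io/bitnami/ray:2.49.0. As a rough illustration of why both names refer to the same image, here is a minimal Go sketch of the usual Docker Hub defaulting rules. The helper normalizeImageRef is hypothetical and is not part of this PR or of containerd; it ignores digests and other edge cases.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeImageRef is a hypothetical, simplified helper that expands a short
// image reference the way container runtimes commonly do:
//   - no tag        -> ":latest"
//   - no registry   -> "docker.io/"
//   - no namespace  -> "library/"
// Digest references (name@sha256:...) are not handled in this sketch.
func normalizeImageRef(ref string) string {
	name, tag := ref, "latest"
	// A colon is a tag separator only if it comes after the last slash;
	// otherwise it belongs to a registry host:port.
	if i := strings.LastIndex(ref, ":"); i > strings.LastIndex(ref, "/") {
		name, tag = ref[:i], ref[i+1:]
	}
	first, _, hasSlash := strings.Cut(name, "/")
	// The first path component is a registry host if it looks like a domain,
	// carries a port, or is "localhost".
	isHost := hasSlash && (strings.ContainsAny(first, ".:") || first == "localhost")
	if !isHost {
		if !hasSlash {
			name = "library/" + name
		}
		name = "docker.io/" + name
	}
	return name + ":" + tag
}

func main() {
	fmt.Println(normalizeImageRef("bitnami/ray:2.49.0"))
	fmt.Println(normalizeImageRef("rayproject/ray:2.49.0"))
}
```

Under these assumptions, both the old and the new default image resolve to Docker Hub, so the failure comes from the repository content, not from the registry being unreachable.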

Which issue(s) this PR fixes:

Fixes #4669

Special notes for your reviewer:

Does this PR introduce a user-facing change?


@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign shinytang6
You can assign the PR to them by writing /assign @shinytang6 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Summary of Changes

Hello @Wonki4, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical CI failure in the Ray end-to-end test suite. The issue was caused by the bitnami/ray:2.49.0 Docker image becoming unavailable, leading to image pull failures during test execution. By updating the image reference to the correct and available rayproject/ray:2.49.0, this change ensures that the Ray e2e tests can once again run without interruption, maintaining the stability and reliability of the CI pipeline.

Highlights

  • CI Fix: Resolved a continuous integration (CI) error that was preventing Ray end-to-end tests from executing successfully.
  • Image Update: Updated the DefaultRayImage constant from the unavailable bitnami/ray:2.49.0 to the official rayproject/ray:2.49.0 image.

@volcano-sh-bot volcano-sh-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 11, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This PR aims to fix a CI error by updating the default Ray image used in E2E tests. The switch from the deprecated bitnami/ray to rayproject/ray is the correct approach. However, the image tag 2.49.0 appears to be a typo and is invalid for the rayproject/ray repository as well, which will cause the CI to continue failing. I've provided a suggestion for a valid tag.

Additionally, please note that example/integrations/ray/ray-example.yaml also uses the old, invalid image bitnami/ray:2.49.0. It would be beneficial to update it within this PR to ensure the provided example is functional.

  // "volcanosh/pytorch-mnist-v1beta1-9ee8fda-example:0.0.1" is from "docker.io/kubeflowkatib/pytorch-mnist:v1beta1-9ee8fda"
  DefaultPytorchImage = "volcanosh/pytorch-mnist-v1beta1-9ee8fda-example:0.0.1"
- DefaultRayImage = "bitnami/ray:2.49.0"
+ DefaultRayImage = "rayproject/ray:2.49.0"

critical

The image rayproject/ray:2.49.0 does not exist on Docker Hub. It seems 2.49.0 is a typo that was carried over from the previous image URI. To fix the CI error, please use a valid image tag. For example, 2.9.3 is a valid and stable version.

Suggested change
- DefaultRayImage = "rayproject/ray:2.49.0"
+ DefaultRayImage = "rayproject/ray:2.9.3"

Contributor Author

https://hub.docker.com/layers/rayproject/ray/2.49.0

2.49.0 is a valid and stable version.

Member

https://hub.docker.com/layers/rayproject/ray/2.49.0

2.49.0 is a valid and stable version.

Now that you have changed the Ray image, is the flaky test fixed? Do we still need to prune the images?

@volcano-sh-bot volcano-sh-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 12, 2025
@Wonki4
Contributor Author

Wonki4 commented Oct 12, 2025

@Monokaix

Could you share the results from #4642 regarding deleting Docker images in the test/e2e/jobseq package?

@volcano-sh-bot volcano-sh-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 13, 2025
Member

I'm wondering whether we really need PruneUnusedImagesOnAllNodes, because #4642 already uses docker system prune to free disk space. The e2e tests run on a GitHub runner machine, and that machine runs a kind cluster. Do we still need to list all the k8s nodes and execute PruneUnusedImagesOnAllNodes?

Contributor Author

@Wonki4 Wonki4 Oct 14, 2025

I looked at that PR and saw that the prune code was removed.
Its final change only concerns loading the vc images onto the control-plane node.

@Monokaix
Member

There is a misunderstanding: #4642 only loads the Volcano component images onto the control-plane node, because Volcano is deployed only on that node.

Labels

size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Ray E2E Test default image is not available

4 participants