-
Notifications
You must be signed in to change notification settings - Fork 506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add KubeRay e2e Test for custom idleTimeoutSeconds with v2 Autoscaler #2725
Conversation
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
cc @rueian would you mind reviewing this PR? |
func TestRayClusterAutoscalerV2IdleTimeout(t *testing.T) { | ||
// Only test with the V2 Autoscaler | ||
name := "Create a RayCluster with autoscaler v2 enabled" | ||
tc := tests["Create a RayCluster with autoscaler v2 enabled"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tc := tests["Create a RayCluster with autoscaler v2 enabled"] | |
tc, ok := tests[name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make the test fail if !ok
namespace := test.NewTestNamespace() | ||
|
||
// Minimum Ray Version for custom idleTimeoutSeconds | ||
idleTimeoutMinRayVersion := "2.40.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a convention for stating the minimum requirement in tests, so you can just use the current version by GetRayVersion()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to update the test logic to:
- Deploy a detached actor on worker group 1 (
wg1
,idleTimeoutSeconds
: 10s) - Deploy a detached actor on worker group 2 (
wg2
,idleTimeoutSeconds
: 30s) - Terminate the detached actor on
wg1
, and make sure the worker Pod is terminated around 10s. - Terminate the detached actor on
wg2
, and make sure the worker Pod is terminated around 30s.
I will merge this PR directly. @rueian will open a follow up PR.
…toscaler (ray-project#2725)" This reverts commit d0e8b57.
Revert this PR. This test is pretty flaky. |
Are there any error outputs from the flaky test runs? When I run this locally I see it passing. I can resolve the comments in #2725 (review) and add a longer timeout in a new PR that I think would likely resolve the flakiness. |
Hi @ryanaoleary, here is the full log: You can search ![]() |
Why are these changes needed?
This PR adds a test case for custom
idleTimeoutSeconds
per worker group supported by the V2 Autoscaler starting in ray 2.40.0. The test case verifies thatidleTimeoutSeconds
for a worker group override the defaultidleTimeoutSeconds
set in theautoscalerOptions
by scaling up replicas of each type and then checking that they are terminated within the expected period.This PR was tested by running
go test -timeout 30m -v ./test/e2eautoscaler
with the nightly version of the KubeRay operator installed, since the change adding this field to the WorkerGroupSpec api is not yet part of a release.Related issue number
#2561
Checks