[release test] move cluster env utils to `anyscale_util` #57669

aslonnie · 2025-10-13T08:32:06Z

as cluster envs are anyscale specific concepts Signed-off-by: Lonnie Liu <[email protected]>

gemini-code-assist

Code Review

This pull request refactors Anyscale-specific cluster environment utility functions, moving them from ray_release/util.py to ray_release/anyscale_util.py. The move is logical and improves code organization. I've made a suggestion in create_cluster_env_from_image to improve its reusability and clarify logging.

gemini-code-assist · 2025-10-13T08:33:45Z

release/ray_release/anyscale_util.py

+def create_cluster_env_from_image(
+    image: str,
+    test_name: str,
+    runtime_env: Dict[str, Any],
+    sdk: Optional["AnyscaleSDK"] = None,
+    cluster_env_id: Optional[str] = None,
+    cluster_env_name: Optional[str] = None,
+) -> str:
+    anyscale_sdk = sdk or get_anyscale_sdk()
+    if not cluster_env_name:
+        cluster_env_name = get_custom_cluster_env_name(image, test_name)
+
+    # Find whether there is identical cluster env
+    paging_token = None
+    while not cluster_env_id:
+        result = anyscale_sdk.search_cluster_environments(
+            dict(
+                name=dict(equals=cluster_env_name),
+                paging=dict(count=50, paging_token=paging_token),
+                project_id=None,
+            )
+        )
+        paging_token = result.metadata.next_paging_token
+
+        for res in result.results:
+            if res.name == cluster_env_name:
+                cluster_env_id = res.id
+                logger.info(f"Cluster env already exists with ID " f"{cluster_env_id}")
+                break
+
+        if not paging_token or cluster_env_id:
+            break
+
+    if not cluster_env_id:
+        logger.info("Cluster env not found. Creating new one.")
+        try:
+            result = anyscale_sdk.create_byod_cluster_environment(
+                dict(
+                    name=cluster_env_name,
+                    config_json=dict(
+                        docker_image=image,
+                        ray_version="nightly",
+                        env_vars=runtime_env,
+                    ),
+                )
+            )
+            cluster_env_id = result.result.id
+        except Exception as e:
+            logger.warning(
+                f"Got exception when trying to create cluster "
+                f"env: {e}. Sleeping for 10 seconds with jitter and then "
+                f"try again..."
+            )
+            raise ClusterEnvCreateError("Could not create cluster env.") from e
+
+        logger.info(f"Cluster env created with ID {cluster_env_id}")
+
+    return cluster_env_id


I have a couple of suggestions for create_cluster_env_from_image to improve its robustness and maintainability:

The ray_version is hardcoded to "nightly". This limits the function's reusability for tests that might target specific Ray versions (e.g., release candidates). It would be more flexible to make ray_version a parameter with "nightly" as its default value.

The warning message in the except block is misleading. It states that it will sleep and retry, but the function only raises an exception. The retry logic is handled by a decorator on the calling function. The message should be updated to avoid confusion.

def create_cluster_env_from_image(
    image: str,
    test_name: str,
    runtime_env: Dict[str, Any],
    sdk: Optional["AnyscaleSDK"] = None,
    cluster_env_id: Optional[str] = None,
    cluster_env_name: Optional[str] = None,
    ray_version: str = "nightly",
) -> str:
    anyscale_sdk = sdk or get_anyscale_sdk()
    if not cluster_env_name:
        cluster_env_name = get_custom_cluster_env_name(image, test_name)

    # Find whether there is identical cluster env
    paging_token = None
    while not cluster_env_id:
        result = anyscale_sdk.search_cluster_environments(
            dict(
                name=dict(equals=cluster_env_name),
                paging=dict(count=50, paging_token=paging_token),
                project_id=None,
            )
        )
        paging_token = result.metadata.next_paging_token

        for res in result.results:
            if res.name == cluster_env_name:
                cluster_env_id = res.id
                logger.info(f"Cluster env already exists with ID " f"{cluster_env_id}")
                break

        if not paging_token or cluster_env_id:
            break

    if not cluster_env_id:
        logger.info("Cluster env not found. Creating new one.")
        try:
            result = anyscale_sdk.create_byod_cluster_environment(
                dict(
                    name=cluster_env_name,
                    config_json=dict(
                        docker_image=image,
                        ray_version=ray_version,
                        env_vars=runtime_env,
                    ),
                )
            )
            cluster_env_id = result.result.id
        except Exception as e:
            logger.warning(f"Got exception when trying to create cluster env: {e}")
            raise ClusterEnvCreateError("Could not create cluster env.") from e

        logger.info(f"Cluster env created with ID {cluster_env_id}")

    return cluster_env_id
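To illustrate the second point (the function only raises, while retrying is left to a decorator on the caller), here is a minimal sketch of how such a decorator could look. The name `retry_with_jitter`, its parameters, and the stand-in `ClusterEnvCreateError` class are assumptions for illustration only; the actual decorator used in `ray_release` is not shown in this PR and may differ.

```python
import functools
import random
import time


class ClusterEnvCreateError(Exception):
    """Stand-in for the exception raised in the diff above."""


def retry_with_jitter(max_attempts=3, base_delay_s=10.0, sleep=time.sleep):
    """Hypothetical decorator: retry on ClusterEnvCreateError with a jittered delay."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except ClusterEnvCreateError:
                    # Out of attempts: let the caller see the error.
                    if attempt == max_attempts:
                        raise
                    # Jitter spreads out retries from concurrent callers.
                    sleep(base_delay_s + random.uniform(0, base_delay_s / 2))

        return wrapper

    return decorator
```

With a wrapper like this on the caller, the sleeping and retrying happen outside `create_cluster_env_from_image`, so a log message inside the function that promises a retry describes behavior the function itself does not have.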

aslonnie requested a review from a team as a code owner October 13, 2025 08:32

[release test] move cluster env utils to anyscale_util

d4e91e6

as cluster envs are anyscale specific concepts Signed-off-by: Lonnie Liu <[email protected]>

aslonnie force-pushed the lonnie-251013-winfix8 branch from 6cfb3c8 to d4e91e6 Compare October 13, 2025 08:33

aslonnie added the `go` label (add ONLY when ready to merge, run all tests) Oct 13, 2025

gemini-code-assist bot reviewed Oct 13, 2025

ray-gardener bot added the `core` (Issues that should be addressed in Ray Core) and `release-test` (release test) labels Oct 13, 2025

aslonnie merged commit 1775318 into master Oct 13, 2025
7 checks passed

aslonnie deleted the lonnie-251013-winfix8 branch October 13, 2025 14:03
