Skip to content

Conversation

aslonnie
Copy link
Collaborator

as cluster envs are anyscale specific concepts

@aslonnie aslonnie requested a review from a team as a code owner October 13, 2025 08:32
as cluster envs are anyscale specific concepts

Signed-off-by: Lonnie Liu <[email protected]>
@aslonnie aslonnie force-pushed the lonnie-251013-winfix8 branch from 6cfb3c8 to d4e91e6 Compare October 13, 2025 08:33
@aslonnie aslonnie added the go add ONLY when ready to merge, run all tests label Oct 13, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors Anyscale-specific cluster environment utility functions, moving them from ray_release/util.py to ray_release/anyscale_util.py. The move is logical and improves code organization. I've made a suggestion in create_cluster_env_from_image to improve its reusability and clarify logging.

Comment on lines +61 to +118
def create_cluster_env_from_image(
image: str,
test_name: str,
runtime_env: Dict[str, Any],
sdk: Optional["AnyscaleSDK"] = None,
cluster_env_id: Optional[str] = None,
cluster_env_name: Optional[str] = None,
) -> str:
anyscale_sdk = sdk or get_anyscale_sdk()
if not cluster_env_name:
cluster_env_name = get_custom_cluster_env_name(image, test_name)

# Find whether there is identical cluster env
paging_token = None
while not cluster_env_id:
result = anyscale_sdk.search_cluster_environments(
dict(
name=dict(equals=cluster_env_name),
paging=dict(count=50, paging_token=paging_token),
project_id=None,
)
)
paging_token = result.metadata.next_paging_token

for res in result.results:
if res.name == cluster_env_name:
cluster_env_id = res.id
logger.info(f"Cluster env already exists with ID " f"{cluster_env_id}")
break

if not paging_token or cluster_env_id:
break

if not cluster_env_id:
logger.info("Cluster env not found. Creating new one.")
try:
result = anyscale_sdk.create_byod_cluster_environment(
dict(
name=cluster_env_name,
config_json=dict(
docker_image=image,
ray_version="nightly",
env_vars=runtime_env,
),
)
)
cluster_env_id = result.result.id
except Exception as e:
logger.warning(
f"Got exception when trying to create cluster "
f"env: {e}. Sleeping for 10 seconds with jitter and then "
f"try again..."
)
raise ClusterEnvCreateError("Could not create cluster env.") from e

logger.info(f"Cluster env created with ID {cluster_env_id}")

return cluster_env_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

I have a couple of suggestions for create_cluster_env_from_image to improve its robustness and maintainability:

  1. The ray_version is hardcoded to "nightly". This limits the function's reusability for tests that might target specific Ray versions (e.g., release candidates). It would be more flexible to make ray_version a parameter with "nightly" as its default value.

  2. The warning message in the except block is misleading. It states that it will sleep and retry, but the function only raises an exception. The retry logic is handled by a decorator on the calling function. The message should be updated to avoid confusion.

def create_cluster_env_from_image(
    image: str,
    test_name: str,
    runtime_env: Dict[str, Any],
    sdk: Optional["AnyscaleSDK"] = None,
    cluster_env_id: Optional[str] = None,
    cluster_env_name: Optional[str] = None,
    ray_version: str = "nightly",
) -> str:
    anyscale_sdk = sdk or get_anyscale_sdk()
    if not cluster_env_name:
        cluster_env_name = get_custom_cluster_env_name(image, test_name)

    # Find whether there is identical cluster env
    paging_token = None
    while not cluster_env_id:
        result = anyscale_sdk.search_cluster_environments(
            dict(
                name=dict(equals=cluster_env_name),
                paging=dict(count=50, paging_token=paging_token),
                project_id=None,
            )
        )
        paging_token = result.metadata.next_paging_token

        for res in result.results:
            if res.name == cluster_env_name:
                cluster_env_id = res.id
                logger.info(f"Cluster env already exists with ID " f"{cluster_env_id}")
                break

        if not paging_token or cluster_env_id:
            break

    if not cluster_env_id:
        logger.info("Cluster env not found. Creating new one.")
        try:
            result = anyscale_sdk.create_byod_cluster_environment(
                dict(
                    name=cluster_env_name,
                    config_json=dict(
                        docker_image=image,
                        ray_version=ray_version,
                        env_vars=runtime_env,
                    ),
                )
            )
            cluster_env_id = result.result.id
        except Exception as e:
            logger.warning(f"Got exception when trying to create cluster env: {e}")
            raise ClusterEnvCreateError("Could not create cluster env.") from e

        logger.info(f"Cluster env created with ID {cluster_env_id}")

    return cluster_env_id

@ray-gardener ray-gardener bot added core Issues that should be addressed in Ray Core release-test release test labels Oct 13, 2025
@aslonnie aslonnie merged commit 1775318 into master Oct 13, 2025
7 checks passed
@aslonnie aslonnie deleted the lonnie-251013-winfix8 branch October 13, 2025 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests release-test release test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant