-
Notifications
You must be signed in to change notification settings - Fork 6.8k
[release test] move cluster env utils to anyscale_util
#57669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
as cluster envs are anyscale specific concepts Signed-off-by: Lonnie Liu <[email protected]>
6cfb3c8
to
d4e91e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors Anyscale-specific cluster environment utility functions, moving them from ray_release/util.py
to ray_release/anyscale_util.py
. The move is logical and improves code organization. I've made a suggestion in create_cluster_env_from_image
to improve its reusability and clarify logging.
def create_cluster_env_from_image( | ||
image: str, | ||
test_name: str, | ||
runtime_env: Dict[str, Any], | ||
sdk: Optional["AnyscaleSDK"] = None, | ||
cluster_env_id: Optional[str] = None, | ||
cluster_env_name: Optional[str] = None, | ||
) -> str: | ||
anyscale_sdk = sdk or get_anyscale_sdk() | ||
if not cluster_env_name: | ||
cluster_env_name = get_custom_cluster_env_name(image, test_name) | ||
|
||
# Find whether there is identical cluster env | ||
paging_token = None | ||
while not cluster_env_id: | ||
result = anyscale_sdk.search_cluster_environments( | ||
dict( | ||
name=dict(equals=cluster_env_name), | ||
paging=dict(count=50, paging_token=paging_token), | ||
project_id=None, | ||
) | ||
) | ||
paging_token = result.metadata.next_paging_token | ||
|
||
for res in result.results: | ||
if res.name == cluster_env_name: | ||
cluster_env_id = res.id | ||
logger.info(f"Cluster env already exists with ID " f"{cluster_env_id}") | ||
break | ||
|
||
if not paging_token or cluster_env_id: | ||
break | ||
|
||
if not cluster_env_id: | ||
logger.info("Cluster env not found. Creating new one.") | ||
try: | ||
result = anyscale_sdk.create_byod_cluster_environment( | ||
dict( | ||
name=cluster_env_name, | ||
config_json=dict( | ||
docker_image=image, | ||
ray_version="nightly", | ||
env_vars=runtime_env, | ||
), | ||
) | ||
) | ||
cluster_env_id = result.result.id | ||
except Exception as e: | ||
logger.warning( | ||
f"Got exception when trying to create cluster " | ||
f"env: {e}. Sleeping for 10 seconds with jitter and then " | ||
f"try again..." | ||
) | ||
raise ClusterEnvCreateError("Could not create cluster env.") from e | ||
|
||
logger.info(f"Cluster env created with ID {cluster_env_id}") | ||
|
||
return cluster_env_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of suggestions for create_cluster_env_from_image
to improve its robustness and maintainability:
-
The
ray_version
is hardcoded to"nightly"
. This limits the function's reusability for tests that might target specific Ray versions (e.g., release candidates). It would be more flexible to makeray_version
a parameter with"nightly"
as its default value. -
The warning message in the
except
block is misleading. It states that it will sleep and retry, but the function only raises an exception. The retry logic is handled by a decorator on the calling function. The message should be updated to avoid confusion.
def create_cluster_env_from_image(
image: str,
test_name: str,
runtime_env: Dict[str, Any],
sdk: Optional["AnyscaleSDK"] = None,
cluster_env_id: Optional[str] = None,
cluster_env_name: Optional[str] = None,
ray_version: str = "nightly",
) -> str:
anyscale_sdk = sdk or get_anyscale_sdk()
if not cluster_env_name:
cluster_env_name = get_custom_cluster_env_name(image, test_name)
# Find whether there is identical cluster env
paging_token = None
while not cluster_env_id:
result = anyscale_sdk.search_cluster_environments(
dict(
name=dict(equals=cluster_env_name),
paging=dict(count=50, paging_token=paging_token),
project_id=None,
)
)
paging_token = result.metadata.next_paging_token
for res in result.results:
if res.name == cluster_env_name:
cluster_env_id = res.id
logger.info(f"Cluster env already exists with ID " f"{cluster_env_id}")
break
if not paging_token or cluster_env_id:
break
if not cluster_env_id:
logger.info("Cluster env not found. Creating new one.")
try:
result = anyscale_sdk.create_byod_cluster_environment(
dict(
name=cluster_env_name,
config_json=dict(
docker_image=image,
ray_version=ray_version,
env_vars=runtime_env,
),
)
)
cluster_env_id = result.result.id
except Exception as e:
logger.warning(f"Got exception when trying to create cluster env: {e}")
raise ClusterEnvCreateError("Could not create cluster env.") from e
logger.info(f"Cluster env created with ID {cluster_env_id}")
return cluster_env_id
as cluster envs are anyscale specific concepts