-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support for Pathways proxy #690
base: main
Are you sure you want to change the base?
Conversation
pyproject.toml
Outdated
@@ -92,6 +92,7 @@ gcp = [ | |||
"google-cloud-build==3.24.1", | |||
"ml_goodput_measurement==0.0.2", | |||
"pyOpenSSL>=22.1.0", # compat with cryptography version. | |||
"pathwaysutils@git+https://github.com/google/pathways-utils", # for JAX+Pathways single-controller accelerator coordinator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about depending a github repo directly without specifying a specific version, as any changes to the repo will be automatically reflected. Does pathways-utils
have versioned releases that can be installed via pip install
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @ruomingp ! The short-term roadmap for the pathways-utils library doesn't account for a Pypi release. Adding a tag to the dependency so we can control which version gets installed with Axlearn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jesus-orozco , is there an ETA for a pypi release? IIRC, introducing git hashes to pyproject causes issues for our own pypi release. We prefer to avoid this if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm following up with the product team to get a better picture of the roadmap. At the moment Github is the only path forward to install the package, unless we bake the whl directly into the docker image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markblee Pypi release ETA is Oct 18
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markblee Pathwaysutils was released on Pypi this week. We have tested and integrated its latest version into this PR.
…ific tagged version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
pyproject.toml
Outdated
@@ -92,6 +92,7 @@ gcp = [ | |||
"google-cloud-build==3.24.1", | |||
"ml_goodput_measurement==0.0.2", | |||
"pyOpenSSL>=22.1.0", # compat with cryptography version. | |||
"pathwaysutils@git+https://github.com/google/pathways-utils", # for JAX+Pathways single-controller accelerator coordinator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jesus-orozco , is there an ETA for a pypi release? IIRC, introducing git hashes to pyproject causes issues for our own pypi release. We prefer to avoid this if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
axlearn/cloud/gcp/job.py
Outdated
flags.DEFINE_boolean( | ||
"use_pathways", False, "Wether the workload is pathways-enabled.", **common_kwargs | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a boolean flag, could we define a list flag specifying the additional modules to import, potentially including "pathwaysutils"?
See https://github.com/apple/axlearn/blob/main/docs/ml_api_style.md#avoid-booleans-in-apis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this makes sense. Refactored the flag and using dynamic import instead.
axlearn/common/utils_spmd.py
Outdated
jax.distributed.initialize(**init_kwargs) | ||
_jax_distributed_initialized = True | ||
if jax_backend != "proxy": | ||
jax.distributed.initialize(**init_kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So using pathways does not require jax.distributed.initialize
? Please add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased and added comment for pathways
axlearn/cloud/gcp/job.py
Outdated
@@ -457,6 +459,9 @@ def define_flags(cls, fv: flags.FlagValues): | |||
"not all TPU types support this flag.", | |||
**common_kwargs, | |||
) | |||
flags.DEFINE_list( | |||
"import_pathways", [], "Modules to enable pathways proxy.", **common_kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here and below.
Enable JAX+Pathways single-controller architecture for coordination of accelerators.