feat: Add pipeline run parallelism config #12442
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing the approval command.

Hi @sduvvuri1603. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with the command listed here; once the patch is verified, the new status will be reflected accordingly. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed 99f2fc8 to d34a1b2.

/retest
Force-pushed 82756e1 to 60a35d8.
@sduvvuri1603: Cannot trigger testing until a trusted user reviews the PR and approves it for testing. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/ok-to-test

/retest
```yaml
subjects:
- kind: ServiceAccount
  name: ml-pipeline
  namespace: kubeflow
```
Is it intentional to hardcode the namespace? If I deploy with a different namespace, is this getting replaced or overridden somewhere?
No, it was not intentional. Thanks for pointing that out! I'll change this binding to use a namespace variable ($(kfp-namespace)) so it follows whatever namespace users deploy into.
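For illustration, the binding could reference the kustomize variable instead of a literal namespace. This is a sketch, not the final manifest; the variable name `$(kfp-namespace)` is the one proposed above and must be declared as a kustomize var for substitution to happen:

```yaml
subjects:
- kind: ServiceAccount
  name: ml-pipeline
  namespace: $(kfp-namespace)  # resolved by kustomize vars at deploy time
```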
nsingla left a comment:
I think we need a pipeline in test_data/valid/*compiled/essential that overrides this value.
Signed-off-by: sduvvuri1603 <[email protected]>
Force-pushed 60a35d8 to 39fb3dd.
Signed-off-by: sduvvuri1603 <[email protected]>
Force-pushed 39fb3dd to c587b03.
Signed-off-by: sduvvuri1603 <[email protected]>
Consult the [Python SDK reference docs](https://kubeflow-pipelines.readthedocs.io/en/stable/) when writing pipelines using the Python SDK.
> New in master: `dsl.PipelineConfig` now accepts an optional `pipeline_run_parallelism` integer to cap concurrent task execution for a run. The backend stores the requested limit in a shared ConfigMap and surfaces it to Argo Workflows via `spec.parallelism`.
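As an illustration of what the compiled output would carry, the Argo Workflow manifest sets the limit at the top level of the spec. This is a sketch under assumptions: `spec.parallelism` is Argo's standard Workflow field as described above, but the metadata values and entrypoint name here are hypothetical:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pipeline-with-run-parallelism-   # hypothetical run name
spec:
  parallelism: 7   # at most 7 tasks of this run execute concurrently
  entrypoint: entrypoint
  # ...templates compiled from the pipeline's components...
```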
It may be more appropriate to add this entry to the CHANGELOG.
Sure, but will this be part of a new section called "Unreleased Features"? I only see version release details in the file.
I believe the PR will be included here as part of the release process. @mprahl , could you confirm if that’s correct?
Signed-off-by: Sruthi Duvvuri <[email protected]>
Signed-off-by: sduvvuri1603 <[email protected]>
```python
@pipeline_run_parallelism.setter
def pipeline_run_parallelism(self, value: Optional[int]) -> None:  # pylint: disable=attribute-defined-outside-init
    if value is None:
```
Is this required, given https://github.com/kubeflow/pipelines/pull/12442/files#diff-631f096829954a5226e87a37347ab04d3e551465fdf0371b37fd9f9363c0c156R120 will set it to None if value is None?
Yes, you're right about the serialization part, but we need this guard to prevent a crash during initialization. Since __init__ passes None by default, removing this check would cause None to hit the isinstance line and fail immediately. The guard just ensures we can safely create the object with no value set.
I guess then you can just add one if statement (using `is not None` rather than a truthiness check, so an explicit 0 still fails validation instead of being silently accepted):

```python
if value is not None:
    if not isinstance(value, int):
        raise ValueError(
            'pipeline_run_parallelism must be an integer if specified.')
    if value <= 0:
        raise ValueError(
            'pipeline_run_parallelism must be a positive integer.')
self._pipeline_run_parallelism = value
```
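To make the validation pattern discussed in this thread concrete, here is a self-contained sketch. `PipelineConfigSketch` is a hypothetical stand-in for the real kfp class, showing only the guard-None-then-validate behavior, not the actual SDK implementation:

```python
from typing import Optional


class PipelineConfigSketch:
    """Illustrative stand-in (not the real kfp class): None is allowed as a
    default, and any non-None value must be a positive integer."""

    def __init__(self, pipeline_run_parallelism: Optional[int] = None):
        # Assignment routes through the property setter below.
        self.pipeline_run_parallelism = pipeline_run_parallelism

    @property
    def pipeline_run_parallelism(self) -> Optional[int]:
        return self._pipeline_run_parallelism

    @pipeline_run_parallelism.setter
    def pipeline_run_parallelism(self, value: Optional[int]) -> None:
        if value is not None:
            if not isinstance(value, int):
                raise ValueError(
                    'pipeline_run_parallelism must be an integer if specified.')
            if value <= 0:
                raise ValueError(
                    'pipeline_run_parallelism must be a positive integer.')
        self._pipeline_run_parallelism = value


print(PipelineConfigSketch().pipeline_run_parallelism)   # None: default construction is safe
print(PipelineConfigSketch(3).pipeline_run_parallelism)  # 3
```

The point of the `is not None` guard is exactly the one made above: default construction passes None through without ever reaching the isinstance check.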
```python
name="pipeline-with-workspace",
description="A pipeline that demonstrates workspace functionality",
pipeline_config=dsl.PipelineConfig(
    pipeline_run_parallelism=3,
```
Can we actually set this to None here, given we have an explicit test for positive values?
Yep, that makes sense. Will set this to None.
@sduvvuri1603 can you please add what this config is supposed to do to the PR description, and a section about how you've validated the functionality?
```python
@dsl.pipeline(
    name='pipeline-with-run-parallelism',
    pipeline_config=dsl.PipelineConfig(pipeline_run_parallelism=7),
```
Isn't 7 too high when the number of tasks in this pipeline is just 1? Maybe you should add more components to it, or add a ParallelFor loop that iterates over more constants than pipeline_run_parallelism, so that we can validate that the config actually works.
Also, what validation logic did you add to confirm the number of tasks created for a pipeline with this config?
This specific test case is part of the SDK compilation suite: it verifies that the pipeline_run_parallelism field is correctly serialized from the Python SDK into the compiled YAML's PlatformSpec. It relies on the golden-file comparison for validation here (ensuring the YAML contains pipelineRunParallelism: 7, correctly populated).
My understanding is that it is not related to the actual runtime limit, which is covered by the backend integration tests where we submit these workflows to Argo. Please let me know if this is correct!
Any pipeline YAML file in this directory will be part of the end-to-end tests, so yes, the workflow will get submitted to Argo.
Signed-off-by: sduvvuri1603 <[email protected]>
Summary

- Add `pipeline_run_parallelism` to `dsl.PipelineConfig` (proto, Go, Python) and thread it through the compiler/runtime so the value lands in the workflow bundle and Kubernetes platform spec
- Persist the requested limit in `kfp-pipeline-config` and apply it to Argo `spec.parallelism`; extend backend tests and regenerate proto goldens

What `pipeline_run_parallelism` does

`pipeline_run_parallelism` lets authors cap the number of tasks that can execute concurrently in a pipeline run. When set, the value is stored in `kfp-pipeline-config` and the compiler writes it to the workflow manifest (`spec.parallelism`) so Argo enforces the limit.

Validation

- Golden file (`pipeline_with_run_parallelism.yaml`) shows `pipelineRunParallelism` in the compiled spec
- `spec.parallelism` is populated
- `TestCreateRun_PipelineRunParallelismConfigMap` exercises the config map persistence path
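For illustration, the persisted limit in the shared ConfigMap might look like the following. This is a sketch under assumptions: only the ConfigMap name `kfp-pipeline-config` comes from this PR, while the namespace, data key, and value shown here are hypothetical:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kfp-pipeline-config
  namespace: kubeflow          # or whichever namespace KFP is deployed into
data:
  # hypothetical key; the backend stores the requested per-run limit here
  pipelineRunParallelism: "7"
```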