Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding resource limits that can be selected from the UI #3202

Merged
merged 1 commit into from
Jan 12, 2024

Conversation

paloma-rebuelta
Copy link
Contributor

I have changed the backend to replace cpu and memory for cpu_request and memory_request, as well as adding cpu_limit and memory_limit. This has also been added to the UI.

These changes should fix the issue where the pipeline breaks when there are pre-defined argo resource limits that are lower than the requests, fixes #3168

milestone 4.0

I have tested this by adding two unit tests and building the code to run in a kubeflow server. I tested a pipeline with all the resource requests/limits combinations (including leaving it blank). I cannot run airflow pipelines in my setup but have also edited the code for these.

Screenshot 2023-12-20 at 19 44 43

@paloma-rebuelta
Copy link
Contributor Author

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paloma-rebuelta - thank you for acting on this so quickly - its much appreciated! Overall, this looks great!

I had a general comment regarding the relationship between limits and requests and how to enforce conditions we'd like to impose.

elyra/pipeline/pipeline.py Show resolved Hide resolved
@paloma-rebuelta paloma-rebuelta force-pushed the add_resource_limits branch 2 times, most recently from bbbd018 to 693b25e Compare December 21, 2023 09:32
@shalberd
Copy link
Contributor

@romeokienzler @lresende maybe we should merge in Airflow 2.x related changes to processor airflow and airflow jinja2 template first before the changes here. @paloma-rebuelta in Airflow 2, a lot of changes happened and resource requests and limits are now specified a little differently, see my Airflow 2.x PR 3167

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this introduces a migration issue that either needs to be handled (or avoided).

Also, can fractional values for CPUs be entered, despite the schema type being integer?

@@ -156,9 +156,9 @@
"mounted_volumes": [],
"filename": "producer.ipynb",
"runtime_image": "tensorflow/tensorflow:2.8.0",
"cpu": 1,
"cpu_request": 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See "migration comment".

@kevin-bates
Copy link
Member

Regarding the screen layout (thank you for including that), I find it more intuitive to have CPU request and limit on the same line, rather than CPU and Memory request, then CPU and Memory limit. I think keeping the "subject" on its own line would be better, and is also consistent relative to the CPU line (as its fields are on the same line as well). Thanks.

@paloma-rebuelta
Copy link
Contributor Author

Regarding the screen layout (thank you for including that), I find it more intuitive to have CPU request and limit on the same line, rather than CPU and Memory request, then CPU and Memory limit. I think keeping the "subject" on its own line would be better, and is also consistent relative to the CPU line (as its fields are on the same line as well). Thanks.

done :)
Screenshot 2023-12-21 at 20 09 43

@kevin-bates
Copy link
Member

@romeokienzler @lresende maybe we should merge in Airflow 2.x related changes to processor airflow and airflow jinja2 template first before the changes here. @paloma-rebuelta in Airflow 2, a lot of changes happened and resource requests and limits are now specified a little differently, see my Airflow 2.x PR 3167

@shalberd - what is the ETA of the Airflow 2 PR - its been open for some time now. That PR speaks of really wanting limit settings and the changes to add CPU/Memory requests are trivial. In addition, this PR still supports Airflow 1x while I can't determine if your PR replaces that support or not (although I suspect it does).

As a result, I think this PR should take priority (assuming we don't require a migration), set the limit handling in place against existing user installations (preferrably with the release), then release support for Airflow 2x. If 2x support is a replacement, that's another issue worth discussing (if it hasn't been discussed already).

@paloma-rebuelta paloma-rebuelta force-pushed the add_resource_limits branch 3 times, most recently from 8b32b00 to 9fa145a Compare December 22, 2023 18:59
Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this great addition @paloma-rebuelta!

We can address millicores (and values < 1 core) in another PR, which will likely require a pipeline migration, at which time we can add proper adjustments to the request-related resources if we choose.

@kevin-bates
Copy link
Member

@lresende, since this change adds a helpful feature (and is low risk IMO), I wonder if we should include it in 3.x and use Airflow 2 support as the trigger for 4.x. Are there other merged PRs that are only slated for 4.x? If we do decide to include this in 3.x we should probably adjust the changelog updates.

@shalberd
Copy link
Contributor

shalberd commented Jan 2, 2024

@kevin-bates @lresende Happy New Year. I agree that this PR here should take precedence and is low risk, as adding optional limits is something that is extremely useful no matter what target runtime and is backward compatible.
As for the Airflow PR: yes, after we have merged in this limits PR here, I will make the few modifications to the assembly of Airflow 2.x code, i.e. the limits fields in Airflow 2.x compatible syntax. My Airflow PR is intended not for backward compatibility, but for replacing Airflow 1.x support with Airflow 2.x support in version 4 of Elyra. I agree that we can make this limits PR here part of 3.x Elyra. Thanks again to @paloma-rebuelta for the great work on this here.

@paloma-rebuelta
Copy link
Contributor Author

paloma-rebuelta commented Jan 2, 2024

@kevin-bates @lresende Happy New Year. I agree that this PR here should take precedence and is low risk, as adding optional limits is something that is extremely useful no matter what target runtime and is backward compatible. As for the Airflow PR: yes, after we have merged in this limits PR here, I will make the few modifications to the assembly of Airflow 2.x code, i.e. the limits fields in Airflow 2.x compatible syntax. My Airflow PR is intended not for backward compatibility, but for replacing Airflow 1.x support with Airflow 2.x support in version 4 of Elyra. I agree that we can make this limits PR here part of 3.x Elyra. Thanks again to @paloma-rebuelta for the great work on this here.

@kevin-bates / @lresende Happy new year! I am happy to edit the milestone, is there a specific version of 3.x I should put? 3.15.1 or 3.16.0 maybe?

@shalberd
Copy link
Contributor

shalberd commented Jan 9, 2024

@paloma-rebuelta we will make this part of release 4.0 (confirmed by @lresende) , looks good. I'll have a look at the code one more time, too, and test it. Having @kevin-bates approved it and your extensive discussions already make this seem mergeable.

Adding Limit/request tests
Signed-off-by: Author Name [email protected]
Signed-off-by: Paloma Rebuelta <[email protected]>

change rows for UI

Signed-off-by: Paloma Rebuelta <[email protected]>

restoring original name for requests
@shalberd
Copy link
Contributor

alos wondering about the env var minus GUI approach vs. env var and GUI approach discussed here before with @kevin-bates #3168 (comment)

@kevin-bates
Copy link
Member

@shalberd - regarding:

we will make this part of release 4.0 (confirmed by @lresende) , looks good. I'll have a look at the code one more time, too, and test it. Having @kevin-bates approved it and your extensive discussions already make this seem mergeable.

I'd still prefer we include this in a 3.x release since its independent of AF 2 support. I guess I'd prefer to see @lresende comment on my comment above regarding where this should be merged before concluding its only in 4.0.

alos wondering about the env var minus GUI approach vs. env var and GUI approach discussed here before with @kevin-bates #3168 (comment)

I'm a big fan of envs as default values, but also feel that (for this application in particular) having a GUI presence is preferred. I believe the previous discussion assumed that the GUI work was not able to be done, so we'd start with envs, then, when a GUI is in place, use the previous env-based effort to represent defaults/preferences. Since @paloma-rebuelta will be receiving "extra credit" on this by actually introducing the GUI changes (thank you!), the env portion of things is a bit muted IMHO.

@shalberd
Copy link
Contributor

shalberd commented Jan 10, 2024

I believe the previous discussion assumed that the GUI work was not able to be done, so we'd start with envs ... the env portion of things is a bit muted IMHO.

Agreed, just wanted to hear your perspective on this. We had customers on Github issues here that mentioned they have a requirement for limits to be set. So the only added perspective of an env in this new GUI approach would be: set a limit by factor * request if, and only if, no value is set in a node for a limit, across all nodes.

So, I'd say it's muted, but not entirely gone. However, we could cover that with a separate enhancement ticket and pull request, as it is kind of an extension for a special customer case.

@lresende
Copy link
Member

@shalberd - regarding:

we will make this part of release 4.0 (confirmed by @lresende) , looks good. I'll have a look at the code one more time, too, and test it. Having @kevin-bates approved it and your extensive discussions already make this seem mergeable.

I'd still prefer we include this in a 3.x release since its independent of AF 2 support. I guess I'd prefer to see @lresende comment on my comment above regarding where this should be merged before concluding its only in 4.0.

I was suggesting we merge to the main branch and that I would push for the next release to be 4.0 with support for JupyterLab... if others are willing to maintain a 3.x branch I am open for it. At the moment I won't be able to manage 3.x and 4.x releases in parallel.

Feel free to merge this to main once you are happy with it @kevin-bates

@shalberd
Copy link
Contributor

shalberd commented Jan 11, 2024

@kevin-bates let's hold off merging for a bit until we clear up the questions related to node editor limits field placement and cascading style sheets, as well as the question as to why for a single file "Submit as pipeline" does not show the new fields.
I am trying out a little, too, but @paloma-rebuelta might be quicker to answer. We're almost there, thanks again, @paloma-rebuelta, the work on this PR is great.

all good, I forgot to do the make package-ui step, working nicely and as described

Bildschirmfoto 2024-01-11 um 17 40 29

the only cosmetic thing I do not like so much is that the content on the Submit File to pipeline dialog is now pretty crowded

Bildschirmfoto 2024-01-11 um 17 41 18

regardless of how wide or resized that popup / overlay window is

Bildschirmfoto 2024-01-11 um 17 47 29

basically, there is no vertical space between the field columns anymore because it has to fit in the fixed size width of the whole canvas. Compare to before

Bildschirmfoto 2024-01-10 um 16 09 54

@paloma-rebuelta @kevin-bates /lgtm

Copy link
Contributor

@shalberd shalberd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great job, working nicely, minor cosmetic GUI comment in my last comment under conversations, no big deal. Thank you so very much, this is so very valuable. Love using it already.

@kevin-bates please disregard my last comment, PR is good to merge

@kevin-bates
Copy link
Member

if others are willing to maintain a 3.x branch I am open for it.

No can do either. So this will first appear in 4.0. Thanks for the response @lresende

@lresende lresende merged commit 0dae82a into elyra-ai:main Jan 12, 2024
17 checks passed
@lresende lresende added this to the 4.0.0 milestone Jan 12, 2024
@paloma-rebuelta
Copy link
Contributor Author

Thanks @kevin-bates @shalberd and @lresende for getting this across 🥳

@shalberd
Copy link
Contributor

you're welcome; this has been a long-standing wish of many people in the community, so your effort on this is extremely appreciated by not just us, I am sure.

D3f0 pushed a commit to D3f0/elyra that referenced this pull request Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add limits for container resources like cpu. memory, GPU in addition to requests
4 participants