quota: improve performance of quota updater #200
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #200      +/-   ##
==========================================
+ Coverage   73.55%   74.32%   +0.76%
==========================================
  Files           7        7
  Lines         832      892      +60
==========================================
+ Hits          612      663      +51
- Misses        220      229       +9
```
Small note on this PR: it keeps the current approach of one commit per workflow/user, which will not scale to an even larger number of workflows and/or users.
Some thoughts on the possible future options:
If we make only one commit at the end of the session, have to update many rows, and that final commit fails for one reason or another, what would the rollback look like? This might make it less attractive than the other two approaches.
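The rollback question can be made concrete with a middle ground between "one commit per row" and "one commit per session": commit in fixed-size batches, so that a failure only rolls back the batch it happens in. The sketch below is only an illustration. It uses Python's built-in `sqlite3` module instead of the SQLAlchemy session used by reana-db. The `user_quota` table, the `make_db()` and `update_in_batches()` helpers, and the "negative usage" failure condition are all hypothetical.

```python
import sqlite3


def make_db(user_ids):
    """Create a hypothetical in-memory quota table with every usage set to 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_quota (user_id TEXT PRIMARY KEY, usage INTEGER)")
    conn.executemany("INSERT INTO user_quota VALUES (?, 0)", [(u,) for u in user_ids])
    conn.commit()
    return conn


def update_in_batches(conn, updates, batch_size):
    """Apply (user_id, usage) pairs, committing once per batch.

    batch_size=1 mimics one commit per user; batch_size=len(updates) mimics
    a single commit at the end. Returns the number of rows committed.
    """
    committed = 0
    for start in range(0, len(updates), batch_size):
        batch = updates[start:start + batch_size]
        try:
            # `with conn:` commits on success and rolls back on an exception,
            # undoing the UPDATEs already executed in this batch.
            with conn:
                for user_id, usage in batch:
                    if usage < 0:  # hypothetical failure condition
                        raise ValueError(f"invalid usage for {user_id}")
                    conn.execute(
                        "UPDATE user_quota SET usage = ? WHERE user_id = ?",
                        (usage, user_id),
                    )
            committed += len(batch)
        except ValueError:
            # Only this batch is lost; earlier batches stay committed.
            continue
    return committed
```

With `updates = [("a", 10), ("b", 20), ("c", 30), ("d", -1)]`, a batch size of 1 commits 3 rows. A batch size of 2 commits 2 rows, because the update to `c` is rolled back together with the failing `d`. A batch size of 4, the single final commit, commits nothing. The batch size therefore trades commit overhead against how much work a single failure throws away.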
This would definitely be good to do in the longer term, i.e. processing only "modified" workspaces, with an option to process "everything" once in a blue moon to catch up on any possible problems. However, for operations such as
Yes, it might be good to have another queue where we register "please process quota for this workflow" requests, and a separate consumer daemon running permanently in the background that consumes these messages and performs the necessary quota updates as they arrive. This would also address the previous item about doing only the updates that are actually needed (even though there may still be corner cases, such as a user leaving a notebook terminal open for many days without closing it). This requires quite some development, though, so we might want to ticketise these ideas for the future!
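The queue-plus-consumer idea, combined with processing only the workflows that were actually reported, could look roughly like the sketch below. This is an assumption-laden illustration, not a design. An in-process `queue.Queue` stands in for whatever message broker would really be used, and `quota_consumer()` and the `update_quota` callback are hypothetical names.

```python
import queue
import threading


def quota_consumer(request_queue, update_quota, stop):
    """Process "please update quota for this workflow" requests.

    Runs until `stop` is set and the queue has been drained. Requests that
    are already waiting are coalesced, so a workflow reported several times
    is updated only once per drain.
    """
    while not (stop.is_set() and request_queue.empty()):
        try:
            first = request_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        pending = {first}
        # Drain everything else that is already queued, deduplicating it.
        while True:
            try:
                pending.add(request_queue.get_nowait())
            except queue.Empty:
                break
        for workflow_id in sorted(pending):
            update_quota(workflow_id)


if __name__ == "__main__":
    request_queue = queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(
        target=quota_consumer,
        args=(request_queue, lambda wf: print("updating quota of", wf), stop),
        daemon=True,
    )
    worker.start()
    for wf in ["wf-1", "wf-2", "wf-1"]:
        request_queue.put(wf)
    stop.set()
    worker.join()
```

Coalescing duplicates keeps a busy workflow from being recomputed once per event. The "process everything once in a blue moon" safety net would still have to be a separate periodic full run.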
reana_db/utils.py

```python
# This makes `Session.commit()` much faster
for workflow in workflows:
    Session.expunge(workflow)
timer = Timer("Workflow disk", total=len(workflows))
```
Cosmetics: the log messages currently look like:
```
2023-08-25 14:22:22,682 | root | MainThread | INFO | Workflow disk progress: 8/8 elapsed: 0.066s est.total: 0.066s per event: 0.008s
2023-08-25 14:22:22,703 | root | MainThread | INFO | User disk progress: 3/3 elapsed: 0.019s est.total: 0.019s per event: 0.006s
Users disk quota usage updated successfully.
2023-08-25 14:22:22,721 | root | MainThread | INFO | Workflow CPU progress: 8/8 elapsed: 0.015s est.total: 0.015s per event: 0.002s
2023-08-25 14:22:22,738 | root | MainThread | INFO | User CPU progress: 3/3 elapsed: 0.016s est.total: 0.016s per event: 0.005s
Users cpu quota usage updated successfully.
```
We may want to improve the INFO message (it's not "CPU progress"!) by using something like:
- "Workflow disk quota usage update progress: 8/8"
and clarify the final message:
- "Disk quota usage updated successfully for all users and workflows."
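The suggested wording could be produced by a small formatting helper that keeps the existing elapsed / est.total / per-event fields. The PR's actual `Timer` class is not shown in this thread. The `format_progress()` and `update_with_progress()` helpers below are hypothetical and only sketch one way to emit the proposed message.

```python
import logging
import time


def format_progress(label, done, total, elapsed):
    """Return e.g. 'Workflow disk quota usage update progress: 8/8 elapsed: ...'."""
    per_event = elapsed / done if done else 0.0
    est_total = per_event * total
    return (
        f"{label} quota usage update progress: {done}/{total} "
        f"elapsed: {elapsed:.3f}s est.total: {est_total:.3f}s "
        f"per event: {per_event:.3f}s"
    )


def update_with_progress(label, items, update_one):
    """Call update_one() on every item, logging progress after each one."""
    start = time.monotonic()
    for done, item in enumerate(items, start=1):
        update_one(item)
        logging.info(format_progress(label, done, len(items), time.monotonic() - start))
```

The final message ("Disk quota usage updated successfully for all users and workflows.") would then be logged once by the caller, after both the workflow and the user loops for that resource have finished.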
Force-pushed from 3e11213 to 6084fa5
Works well locally 👍
Improve the performance of the quota updater by "expunging" all workflows before starting to update quotas, as the Workflow table does not need to be modified, thus making commits much faster. Also refactor the various update functions to make their behaviour consistent with each other. Partially addresses reanahub#193
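One plausible reason why expunging makes commits faster comes from two documented SQLAlchemy behaviours. First, with the default `expire_on_commit=True`, every `commit()` expires all instances the session tracks. Second, the identity map holds objects weakly, so clean rows nobody references any more drop out of it. Workflows kept alive in a Python list therefore make every one of the per-row commits pay for all of them. The toy model below is not SQLAlchemy code. `ToySession` and `update_quotas()` are hypothetical and only count that expiration work.

```python
class ToySession:
    """Toy stand-in for an ORM session; this is NOT SQLAlchemy code.

    Like SQLAlchemy's default expire_on_commit=True, every commit touches
    each instance the session currently tracks.
    """

    def __init__(self):
        self.tracked = set()
        self.expired = 0  # total number of instance expirations so far

    def add(self, obj):
        self.tracked.add(obj)

    def expunge(self, obj):
        self.tracked.discard(obj)

    def commit(self):
        self.expired += len(self.tracked)


def update_quotas(n_workflows, expunge_workflows):
    """Do one commit per workflow and return the total expiration work."""
    session = ToySession()
    workflows = [("workflow", i) for i in range(n_workflows)]
    for workflow in workflows:  # loaded by the initial query, kept in a list
        session.add(workflow)
    if expunge_workflows:
        for workflow in workflows:
            session.expunge(workflow)
    for workflow in workflows:
        resource = ("resource", workflow[1])  # the quota row being updated
        session.add(resource)
        session.commit()
        # The committed row is clean and no longer referenced, so it would
        # leave the weak-referencing identity map; model that by discarding it.
        session.expunge(resource)
    return session.expired
```

For 1000 workflows the model counts 1,001,000 expirations without expunging and 1,000 with it. The work drops from roughly quadratic to linear in the number of workflows.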
Force-pushed from 6084fa5 to 8cc26f7