Skip to content

Merge wait and timeout to one CLI option for verdi process {kill|play|pause} #6902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

agoscinski
Copy link
Contributor

@agoscinski agoscinski commented Jun 2, 2025

The original design of wait and timeout was to distinguish between actions that immediately return and actions that scheduled. This mechanism was however never used and resulted in an misinterpretation in the force-kill PR #6793 introducing a bug fixed in PR #6870. The mechanism of wait and timeout was also never correctly implemented. In this PR we rectify the logic and simplify it by handling immediate and scheduled actions the same way.

My interpretation is that the original logic (seen in commit 8388018) is that it unwraps the future once, if it is a another future then it is a scheduled action otherwise an immediate one. Then 1b6ecb8 in plumpy introduced a double wrapping for play, pause and kill of the return value which I think is correct as these are scheduled actions. The problem appears in cd0d15c where on the aiida-core side in the first round unwrapping is always done till the actual (nonfuture) result by using unwrap_kiwi_future. This makes the usage of wait completely obsolete as we always get the final nonfuture result in the first step. Also the timeout was not passed to the result which made it a blocking action for scheduled tasked (the timeout is only applied on the first layer of future in futures.as_completed which is meaningless since the first layer is always a future and returns immediately).

This is fixed by simplifying the two step procedure, unwrap once and if future unwrap again, to one step, unwrap until nonfuture result gained with timeout. One can specify a timeout <= 0 to express that the action should not wait for a response while one can specify timeout==float('inf') (default value) to wait until a response has been received without a timeout.

Copy link

codecov bot commented Jun 4, 2025

Codecov Report

Attention: Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.

Project coverage is 78.62%. Comparing base (e768b70) to head (cfbc5bd).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/aiida/engine/processes/control.py 85.72% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6902      +/-   ##
==========================================
- Coverage   78.62%   78.62%   -0.00%     
==========================================
  Files         567      567              
  Lines       43127    43108      -19     
==========================================
- Hits        33903    33888      -15     
+ Misses       9224     9220       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

The original design of `wait` and `timeout` was to distinguish between
actions that immediately return and actions that scheduled. This
mechanism was however never used and resulted in an misinterpretation
in the force-kill PR aiidateam#6793 introducing a bug fixed in PR aiidateam#6870.
The mechanism of `wait` and `timeout` was also never correctly implemented.
In this PR we rectify the logic and simplify it by handling immediate
and scheduled actions the same way.

Related commits in aiida-core 8388018, cd0d15c and plumpy 1b6ecb8

One can specifiy a `timeout <= 0` to express that the action should not wait for
a response while one can specify `timeout==float('inf')` (default value) to wait
until a response has been received without a timeout.
@agoscinski agoscinski requested a review from unkcpz June 5, 2025 05:13
@agoscinski agoscinski marked this pull request as ready for review June 5, 2025 05:13
@agoscinski agoscinski changed the title Merge wait and timeout to one CLI option for verdi process {kill|play|pause} Merge wait and timeout to one CLI option for verdi process {kill|play|pause} Jun 5, 2025
timeout: t.Optional[float] = None,
) -> None:
"""Process a mapping of futures representing an action on an active process.

This function will echo the correct information strings based on the outcomes of the futures and the given verb
conjugations. You can optionally wait for any pending actions to be completed before the functions returns and use a
timeout to put a maximum wait time on the actions.
timeout to put a maximum wait time on the actions. TODO fix docstring
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't forget it

Copy link
Member

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think after staring on this part 1 hour, I understand a bit more clear. The code you removed is a recursion call that more or less did the same thing as calling unwrap_kiwi_future and then call result(). I think you did it right, when I add the unwrap_kiwi_future to this function, it was just a workaround to get the wrapped future resolved because of LoopCommunicator put another layer of future to transfer the result over the threads. I admit I didn't pay attention to the after part which originally designed to get the future resolved.

EDIT: I still not sure it is correct. I overlooked the indentation on the code you removed. I think the behavior of calling unwrap.result() in every loop of process iter is different from using as_complete for all futures. What makes me confused was result = future.result() was there in from beginning, it should doing the same thing as using as_completed to wait futures as a bundle.

@agoscinski let's have a chat next week, I need fresh my head a bit, it hurts.

except communications.TimeoutError:
cancelled = unwrapped.cancel()
cancelled = future.cancel()
Copy link
Member

@unkcpz unkcpz Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct here, the future is different from unwrapped.
Let's image the results are warped in 3 layers of future object, to get the result of final result from future requires future.result().result().result() but after unwrap to a flat future which is unwrapped, it requires only unwrapped.result(). It is the same for the cancel().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I think I am wrong here, the cancellation is taken care inside unwrap_kiwi_future where doing the cancellation when the inner most future is claim to be cancelled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Clarify wait and timeout in CLI play, pause, kill commands
2 participants