Description
SUMMARY
While running our workflows, we found that some workflows that are supposed to branch under certain conditions do not run through to the expected last task on the second branch.
Because we created a loop in the workflow, we do not know whether this is a real bug or an improper use of the Orquesta workflow engine.
If it is the latter, the documentation may be incomplete, as it gives no warning about task loops in the Orquesta engine.
STACKSTORM VERSION
st2 --version: st2 3.5.0, on Python 3.6.8
OS, environment, install method
Running on CentOS Linux release 7.6.1810 (Core), installed manually (RPM packages plus dependencies) following the installation docs.
Steps to reproduce the problem
Here are two simple workflows that reproduce the problem:
tester_bug.yaml
version: 1.0
description: workflow to reproduce bug with task loop and parallel branching
vars:
  - nextstep: "step1"
tasks:
  entrypoint:
    action: core.noop
    next:
      - do: check_step
  check_step:
    action: core.noop
    next:
      - when: <% ctx(nextstep) = "step1" %>
        publish:
          - nextstep: "step2"
        do:
          - sleep_wf
          - sleep_action
      - when: <% ctx(nextstep) = "step2" %>
        publish:
          - nextstep: "step3"
        do:
          - sleep_action
      - when: <% ctx(nextstep) = "step3" %>
        publish:
          - nextstep: "step4"
        do:
          - sleep_action
  sleep_action:
    action: core.local
    input:
      cmd: "sleep 15"
    next:
      - do: check_step
  sleep_wf:
    action: toto.sleep_wf
    next:
      - do: check_step
output:
  - message: "toto"
sleep_wf.yaml
version: 1.0
description: workflow that just sleeps
tasks:
  sleep_action:
    action: core.local
    input:
      cmd: "sleep 42"
output:
  - state: "OK"
And their associated metadata files:
mdt_tester_bug.yaml
description: workflow to reproduce bug with task loop and parallel branching
enabled: true
name: tester_bug
notify: {}
pack: toto
runner_type: orquesta
entry_point: "workflows/tester_bug.yaml"
mdt_sleep_wf.yaml
description: workflow that just sleeps
enabled: true
name: sleep_wf
notify: {}
pack: toto
runner_type: orquesta
entry_point: "workflows/sleep_wf.yaml"
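To make the loop and the fork in tester_bug.yaml explicit, here is a minimal Python sketch of its task graph. The `TRANSITIONS` dictionary and the `tasks_on_a_cycle` helper are our own illustrative names, transcribed by hand from the workflow definition; nothing here is part of the Orquesta API, and it says nothing about how the engine itself tracks branches.

```python
# Task graph of tester_bug.yaml, transcribed by hand (illustrative only).
# Each key is a task; each value lists the tasks named in its `do:` entries.
TRANSITIONS = {
    "entrypoint": ["check_step"],
    # Union of the three `when` branches; the "step1" branch forks into two tasks.
    "check_step": ["sleep_wf", "sleep_action"],
    "sleep_action": ["check_step"],
    "sleep_wf": ["check_step"],
}


def tasks_on_a_cycle(graph):
    """Return the sorted list of tasks that can reach themselves again."""

    def reachable(start):
        # Depth-first walk over every task reachable from `start`.
        seen, stack = set(), list(graph.get(start, []))
        while stack:
            task = stack.pop()
            if task not in seen:
                seen.add(task)
                stack.extend(graph.get(task, []))
        return seen

    return sorted(task for task in graph if task in reachable(task))


looping = tasks_on_a_cycle(TRANSITIONS)
print(looping)
```

Every task except entrypoint sits on a cycle through check_step, and check_step is also where the two parallel branches start. That combination of a fork and a loop on the same task is the part we are unsure about.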
Expected Results
StackStorm should return a result only when both branches in tester_bug.yaml
are finished.
Actual Results
Instead of returning when both branches are finished, StackStorm terminates the workflow as soon as one of them finishes and kills the other.
Example output showing this behaviour:
st2 run toto.tester_bug
.........................
id: 67d2d71c806560c75c1bb55c
action.ref: toto.tester_bug
parameters: None
status: succeeded
start_timestamp: Thu, 13 Mar 2025 14:01:16 CET
end_timestamp: Thu, 13 Mar 2025 14:02:06 CET
log:
  - status: requested
    timestamp: '2025-03-13T13:01:16.346000Z'
  - status: scheduled
    timestamp: '2025-03-13T13:01:16.499000Z'
  - status: running
    timestamp: '2025-03-13T13:01:16.559000Z'
  - status: succeeded
    timestamp: '2025-03-13T13:02:05.915000Z'
result:
  output:
    message: toto
+-----------------------------+-------------------------+--------------+-------------------------+-------------------------------+
| id                          | status                  | task         | action                  | start_timestamp               |
+-----------------------------+-------------------------+--------------+-------------------------+-------------------------------+
|   67d2d71c2db78f675e32168f  | succeeded (1s elapsed)  | entrypoint   | core.noop               | Thu, 13 Mar 2025 14:01:16 CET |
|   67d2d71d2db78f675e32169f  | succeeded (0s elapsed)  | check_step   | core.noop               | Thu, 13 Mar 2025 14:01:17 CET |
|   67d2d71d2db78f675e3216af  | succeeded (16s elapsed) | sleep_action | core.local              | Thu, 13 Mar 2025 14:01:17 CET |
| + 67d2d71d2db78f675e3216b5  | succeeded (44s elapsed) | sleep_wf     | toto.sleep_wf           | Thu, 13 Mar 2025 14:01:17 CET |
|    67d2d71e2db78f675e3216c7 | succeeded (42s elapsed) | sleep_action | core.local              | Thu, 13 Mar 2025 14:01:18 CET |
|   67d2d72d2db78f675e3216d8  | succeeded (0s elapsed)  | check_step   | core.noop               | Thu, 13 Mar 2025 14:01:33 CET |
|   67d2d72d2db78f675e3216e8  | succeeded (16s elapsed) | sleep_action | core.local              | Thu, 13 Mar 2025 14:01:33 CET |
|   67d2d73d2db78f675e3216f8  | succeeded (0s elapsed)  | check_step   | core.noop               | Thu, 13 Mar 2025 14:01:49 CET |
|   67d2d73d2db78f675e321708  | succeeded (16s elapsed) | sleep_action | core.local              | Thu, 13 Mar 2025 14:01:49 CET |
|   67d2d7492db78f675e321722  | succeeded (0s elapsed)  | check_step   | core.noop               | Thu, 13 Mar 2025 14:02:01 CET |
|   67d2d7492db78f675e321732  | running (5s elapsed)    | sleep_action | core.local              | Thu, 13 Mar 2025 14:02:01 CET |
|   67d2d74d2db78f675e321742  | succeeded (0s elapsed)  | check_step   | core.noop               | Thu, 13 Mar 2025 14:02:05 CET |
+-----------------------------+-------------------------+--------------+-------------------------+-------------------------------+
We can see that a sleep action (not even the last one, but the one before it) is still running while StackStorm has already returned a successful result.
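As a rough cross-check of the timing, the following Python sketch compares the moment the workflow was marked succeeded with the earliest moment the still-running sleep_action could have finished. The timestamps are copied from the output above; the only assumption is that the action takes at least the 15 seconds of its `sleep 15` command. The task table only has one-second resolution, so the result is approximate.

```python
from datetime import datetime, timedelta, timezone

# Last log entry of the workflow execution ('succeeded'), given in UTC.
workflow_succeeded = datetime(2025, 3, 13, 13, 2, 5, 915000, tzinfo=timezone.utc)

# Start of the sleep_action that is still shown as 'running' (CET is UTC+1 here).
cet = timezone(timedelta(hours=1))
last_sleep_started = datetime(2025, 3, 13, 14, 2, 1, tzinfo=cet)

# Assumption: the action cannot finish before its `sleep 15` command has elapsed.
earliest_sleep_end = last_sleep_started + timedelta(seconds=15)

# Positive value: the workflow reported success before the task could be done.
early_by = (earliest_sleep_end - workflow_succeeded).total_seconds()
print(f"workflow reported success about {early_by:.1f}s before the sleep could finish")
```

Under that assumption the workflow was reported as succeeded roughly ten seconds before its last started task could have completed.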