
Workflow finishes too early when parallel branches are running #6316

Open
@Dupangel

SUMMARY

While running our workflows, we found that some workflows that branch under certain conditions finish before the last expected task on the second branch has executed.

Because we created a loop in the workflow, we don't know whether this is a real bug or improper use of the Orquesta workflow engine.
If it is the latter, the documentation may be missing a warning about task loops in Orquesta.

STACKSTORM VERSION

st2 --version: st2 3.5.0, on Python 3.6.8

OS, environment, install method

Running on CentOS Linux release 7.6.1810 (Core) and installed manually (with rpm + dependencies) following installation docs.

Steps to reproduce the problem

Here are two simple workflows that reproduce the problem:

tester_bug.yaml

version: 1.0

description: workflow to reproduce bug with task loop and parallel branching

vars:
  - nextstep: "step1"

tasks:
  entrypoint:
    action: core.noop
    next:
      - do: check_step

  check_step:
    action: core.noop
    next:
      - when: <% ctx(nextstep) = "step1" %>
        publish:
          - nextstep: "step2"
        do:
          - sleep_wf
          - sleep_action
      - when: <% ctx(nextstep) = "step2" %>
        publish:
          - nextstep: "step3"
        do:
          - sleep_action
      - when: <% ctx(nextstep) = "step3" %>
        publish:
          - nextstep: "step4"
        do:
          - sleep_action

  sleep_action:
    action: core.local
    input:
      cmd: "sleep 15"
    next:
      - do: check_step

  sleep_wf:
    action: toto.sleep_wf
    next:
      - do: check_step


output:
  - message: "toto"

sleep_wf.yaml

version: 1.0

description: workflow that just sleeps

tasks:
  sleep_action:
    action: core.local
    input:
      cmd: "sleep 42"

output:
  - state: "OK"

And their associated metadata files:

mdt_tester_bug.yaml

description: workflow to reproduce bug with task loop and parallel branching
enabled: true
name: tester_bug
notify: {}
pack: toto
runner_type: orquesta
entry_point: "workflows/tester_bug.yaml"

mdt_sleep_wf.yaml

description: workflow that just sleeps
enabled: true
name: sleep_wf
notify: {}
pack: toto
runner_type: orquesta
entry_point: "workflows/sleep_wf.yaml"

Expected Results

StackStorm should return a result only when both branches in tester_bug.yaml are finished.

Actual Results

Instead of returning when both branches are finished, StackStorm terminates the workflow when one of them is finished and kills the other.

Example output showing this behaviour:

st2 run toto.tester_bug
.........................
id: 67d2d71c806560c75c1bb55c
action.ref: toto.tester_bug
parameters: None
status: succeeded
start_timestamp: Thu, 13 Mar 2025 14:01:16 CET
end_timestamp: Thu, 13 Mar 2025 14:02:06 CET
log:
  - status: requested
    timestamp: '2025-03-13T13:01:16.346000Z'
  - status: scheduled
    timestamp: '2025-03-13T13:01:16.499000Z'
  - status: running
    timestamp: '2025-03-13T13:01:16.559000Z'
  - status: succeeded
    timestamp: '2025-03-13T13:02:05.915000Z'
result:
  output:
    message: toto
+-----------------------------+-------------------------+--------------+-------------------------+-------------------------------+
| id                          | status                  | task         | action                  | start_timestamp               |
+-----------------------------+-------------------------+--------------+-------------------------+-------------------------------+
|   67d2d71c2db78f675e32168f  | succeeded (1s elapsed)  | entrypoint   | core.noop               | Thu, 13 Mar 2025 14:01:16 CET |
|   67d2d71d2db78f675e32169f  | succeeded (0s elapsed)  | check_step   | core.noop               | Thu, 13 Mar 2025 14:01:17 CET |
|   67d2d71d2db78f675e3216af  | succeeded (16s elapsed) | sleep_action | core.local              | Thu, 13 Mar 2025 14:01:17 CET |
| + 67d2d71d2db78f675e3216b5  | succeeded (44s elapsed) | sleep_wf     | toto.sleep_wf           | Thu, 13 Mar 2025 14:01:17 CET |
|    67d2d71e2db78f675e3216c7 | succeeded (42s elapsed) | sleep_action | core.local              | Thu, 13 Mar 2025 14:01:18 CET |
|   67d2d72d2db78f675e3216d8  | succeeded (0s elapsed)  | check_step   | core.noop               | Thu, 13 Mar 2025 14:01:33 CET |
|   67d2d72d2db78f675e3216e8  | succeeded (16s elapsed) | sleep_action | core.local              | Thu, 13 Mar 2025 14:01:33 CET |
|   67d2d73d2db78f675e3216f8  | succeeded (0s elapsed)  | check_step   | core.noop               | Thu, 13 Mar 2025 14:01:49 CET |
|   67d2d73d2db78f675e321708  | succeeded (16s elapsed) | sleep_action | core.local              | Thu, 13 Mar 2025 14:01:49 CET |
|   67d2d7492db78f675e321722  | succeeded (0s elapsed)  | check_step   | core.noop               | Thu, 13 Mar 2025 14:02:01 CET |
|   67d2d7492db78f675e321732  | running (5s elapsed)    | sleep_action | core.local              | Thu, 13 Mar 2025 14:02:01 CET |
|   67d2d74d2db78f675e321742  | succeeded (0s elapsed)  | check_step   | core.noop               | Thu, 13 Mar 2025 14:02:05 CET |
+-----------------------------+-------------------------+--------------+-------------------------+-------------------------------+

We see that a sleep action (not even the last one, only the second to last) is still running even though StackStorm has already returned a successful result.
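To make the symptom easy to check on other runs, here is a minimal Python sketch that flags child tasks still active after the parent execution has completed. The rows are copied by hand from the table above; the helper name `unfinished_children` and the `COMPLETED` set are illustrative assumptions, not part of the st2 client API.

```python
# Statuses treated as "completed" here. This set is an assumption for the
# sketch; adjust it if your StackStorm version reports other final states.
COMPLETED = {"succeeded", "failed", "canceled", "timeout", "abandoned"}


def unfinished_children(parent_status, rows):
    """Return (id, status, task) rows that are not completed although the parent is."""
    if parent_status not in COMPLETED:
        return []  # parent still active, so active children are expected
    return [row for row in rows if row[1] not in COMPLETED]


# (execution id, status, task) copied from the output table above.
rows = [
    ("67d2d71c2db78f675e32168f", "succeeded", "entrypoint"),
    ("67d2d71d2db78f675e32169f", "succeeded", "check_step"),
    ("67d2d71d2db78f675e3216af", "succeeded", "sleep_action"),
    ("67d2d71d2db78f675e3216b5", "succeeded", "sleep_wf"),
    ("67d2d71e2db78f675e3216c7", "succeeded", "sleep_action"),
    ("67d2d72d2db78f675e3216d8", "succeeded", "check_step"),
    ("67d2d72d2db78f675e3216e8", "succeeded", "sleep_action"),
    ("67d2d73d2db78f675e3216f8", "succeeded", "check_step"),
    ("67d2d73d2db78f675e321708", "succeeded", "sleep_action"),
    ("67d2d7492db78f675e321722", "succeeded", "check_step"),
    ("67d2d7492db78f675e321732", "running", "sleep_action"),
    ("67d2d74d2db78f675e321742", "succeeded", "check_step"),
]

# The parent execution 67d2d71c806560c75c1bb55c reported "succeeded".
stragglers = unfinished_children("succeeded", rows)
for execution_id, status, task in stragglers:
    print(f"{task} ({execution_id}) is still {status}")
```

With the rows above, the only task flagged is the `sleep_action` started at 14:02:01, which matches what we observe in the table.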
