Description
Summary
I'm confused about how to use dependencies properly. Say I have a workflow with four groups of steps (A, B, C, D), each with multiple subtasks that can run in parallel (A1, A2, ..., B1, B2, ...). Currently I add all the A steps using couler.map, then all the B steps with couler.map, and so on. This correctly parallelizes across A1, A2, ..., but none of the B steps start until all the A steps have completed, even though I never explicitly set dependencies.
In this case, I want A and B to run in parallel, then C, then D. Running sequentially as A, B, C, D is technically correct, but not as performant as it could be. However, since the steps already run sequentially without my setting any dependencies, I doubt that using the set_dependencies function would help. Also, when I tried set_dependencies, couler raised an error while parsing its own generated YAML because of duplicate anchor definitions. I'd like to see a more in-depth example than those currently in the README, one that shows how to properly use set_dependencies in combination with functions like map.
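To make the ordering I'm after concrete, here is a minimal sketch in plain Python. It is not couler code and does not call any couler API. It only models the dependency graph with the standard library's graphlib (Python 3.9+) and prints which subtasks should be allowed to start together. The subtask names and the two-per-group count are placeholders for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks; the real workflow has its own steps per group.
groups = {
    "A": ["A1", "A2"],
    "B": ["B1", "B2"],
    "C": ["C1", "C2"],
    "D": ["D1", "D2"],
}

# Desired group-level ordering: A and B have no upstream groups,
# C waits for both A and B, and D waits for C.
group_deps = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}


def build_task_graph(groups, group_deps):
    """Expand group-level dependencies into per-subtask dependencies."""
    graph = {}
    for group, tasks in groups.items():
        # Every subtask depends on all subtasks of each upstream group.
        upstream = {t for dep in group_deps[group] for t in groups[dep]}
        for task in tasks:
            graph[task] = set(upstream)
    return graph


def execution_waves(graph):
    """Return lists of subtasks that can run at the same time, in order."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(build_task_graph(groups, group_deps))
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

With this graph, all A and B subtasks land in the first wave, the C subtasks in the second, and the D subtasks in the third. That is the schedule I would expect the generated workflow to allow, rather than four strictly sequential stages.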
Use Cases
Mostly explained above.
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.