How to wait for process to finish before starting the next #3623
Hello, I have a problem that doesn't seem to be answered well by the scatter-gather solutions I've found in prior discussions. I'm running trim_galore on raw FASTQ files, which are then fed into bismark for alignment. I'd like to wait for the entire trim_galore step to finish so that I don't hog all the resources by running alignment and trimming at the same time. This is the setup of my pipeline.
Ideally, trim_galore would finish completely before bismark is allowed to start on any of the processed files. From what I can see, if I do a collect on trim_galore.out, it does wait for the process to finish, but the output is just one big list that looks roughly like this: `[file_id1, [file_id1_val_1.fq.gz, file_id1_val_2.fq.gz], file_id2, [file_id2_val_1.fq.gz, file_id2_val_2.fq.gz]]`. I'm not sure how to get this list read into the bismark_align process so that it submits one process for each file_id/val_(1,2) pair. I believe I want some sort of simple state dependency, but I'm not entirely sure how to implement it. Please let me know if you need further clarification, and thank you for any help you can provide.
This worked perfectly in changing collect from a single emission to multiple emissions. Thank you!
@mribeirodantas you are a genius, thank you so much!
If you collect with `flat: false` and use the `flatMap` channel operator afterward, you should get the desired waiting effect and at the same time keep the channel in its original output structure. Check the example below:
Output:
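
As a minimal sketch of that pattern (not the original example from this reply), the snippet below uses a hypothetical channel of plain strings standing in for trim_galore.out, with sample names borrowed from the question. It assumes DSL2 and that each emission is an `[id, [read1, read2]]` tuple.

```nextflow
// Hypothetical stand-in for trim_galore.out: one [id, [read1, read2]] tuple per sample.
// Plain strings replace real files so the snippet runs on its own.
workflow {
    trimmed_ch = Channel.of(
        ['file_id1', ['file_id1_val_1.fq.gz', 'file_id1_val_2.fq.gz']],
        ['file_id2', ['file_id2_val_1.fq.gz', 'file_id2_val_2.fq.gz']]
    )

    trimmed_ch
        .collect(flat: false)           // waits for every item, then emits one list of tuples
        .flatMap { tuples -> tuples }   // re-emits each tuple as a separate item
        .view()
}
```

With these inputs, `view()` should print one tuple per line:

```
[file_id1, [file_id1_val_1.fq.gz, file_id1_val_2.fq.gz]]
[file_id2, [file_id2_val_1.fq.gz, file_id2_val_2.fq.gz]]
```

Because `collect` only emits once its source channel has completed, nothing reaches the downstream step until every upstream task is done. In a real pipeline the resulting channel could presumably be passed to a process such as bismark_align whose input is declared as `tuple val(id), path(reads)`, giving one task per sample.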