The current all-sky workflow (using "standard" settings) analysing a "chunk" of data will run many inspiral jobs. If the inspiral files are being stored, each of these needs to be linked to a stage_out process that copies its outputs into the running directory. (I assume PyGRB has a similar issue.)
However, there are currently many thousands of stage_out jobs in an example workflow. These can run quite slowly on a head node that throttles locally running processes (and is routinely overwhelmed by many other users).
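For scale, one rough way to count them (a sketch, assuming a Condor DAGMan submit directory and the usual `stage_out_*` job-name prefix that Pegasus uses, both of which may vary):

```
# Count stage_out jobs planned into the generated DAG (job-name prefix assumed)
grep -c '^JOB stage_out' *.dag
```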
There's no reason for there to be so many processes, though. These could easily be, say, O(10) jobs, each copying many files across. The stage_outs would then likely not start until processing is done (or close to done), but that's no different from reusing a previous workflow's output, where one job might stage out all the inspiral files.
I think this is just a simple Pegasus configuration tweak ... and it would be good for more people to have a sense of what the Pegasus configuration is, how it works, and what possibilities we have.
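For concreteness, here is a minimal sketch of the kind of tweak I have in mind, assuming properties are supplied through a standard `pegasus.properties` file (the property names are taken from the Pegasus documentation, but exactly how PyCBC passes them through to the planner is an assumption worth checking):

```
# pegasus.properties -- sketch, property names as documented by Pegasus
# Cap the number of clustered stage-out (and stage-in) transfer jobs
# created per level of the workflow, rather than one job per file.
pegasus.stageout.clusters = 10
pegasus.stagein.clusters = 10
```

If these behave as documented, Pegasus's transfer refiner bundles the per-file transfers into a handful of pegasus-transfer jobs per workflow level, which is roughly the O(10)-job behaviour described above.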