Description
Hi,
I noticed that when AnnotSV is given VCFs as input, VCFsToBED appears to convert them to temporary BED files named using the current timestamp:
AnnotSV/share/tcl/AnnotSV/AnnotSV-vcf.tcl
Line 345 in c5dff6a
set SV_BEDfile "$g_AnnotSV(outputDir)/[clock format [clock seconds] -format "%Y%m%d-%H%M%S"]_AnnotSV_inputSVfile.bed"
I was wondering if the naming of this file could be made more unique (for example, by including the PID of the current process). As it stands, the timestamp only has one-second resolution, which leads to race conditions when multiple instances of AnnotSV (e.g., running as part of batch jobs) write to the same output directory. I was running AnnotSV as part of a larger pipeline using Snakemake and SLURM, with several separate inputs being processed. The temporary BED files looked like they had been written by multiple processes at once: they contained corrupted and duplicated entries throughout, so bedtools could not process them and AnnotSV errored out entirely. Most likely, multiple instances of AnnotSV were started within the same second and therefore chose the same temporary filename.
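To make the suggestion concrete, here is a minimal sketch of a more collision-resistant name. It is untested against AnnotSV itself: the proc name `uniqueBedPath` and the example output directory are hypothetical, and adding the hostname is an extra assumption on top of the PID idea, since batch jobs on different SLURM nodes could in principle share a PID. Only built-in Tcl commands (`clock`, `pid`, `info hostname`, `file join`) are used.

```tcl
# Hypothetical helper, not part of AnnotSV: builds a temporary BED file
# path that differs between concurrently running processes.
proc uniqueBedPath {outputDir} {
    # Same timestamp format as the existing line in AnnotSV-vcf.tcl.
    set stamp [clock format [clock seconds] -format "%Y%m%d-%H%M%S"]
    # [pid] is the ID of the current process; [info hostname] separates
    # processes that happen to share a PID on different machines.
    set name "${stamp}_[info hostname]_[pid]_AnnotSV_inputSVfile.bed"
    return [file join $outputDir $name]
}

# Example call with a made-up output directory.
puts [uniqueBedPath "/tmp/annotsv_out"]
```

With this scheme, two instances started in the same second would still get distinct file names as long as they are separate processes.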
If you'd like, I should be able to prepare a PR to address this issue. It seems to be a rather straightforward change, though I don't have much experience with Tcl. I'd appreciate it if you could let me know what you think. Thanks!