Description
When running cooltools pileup on 1000 random regions from a single chromosome (chr1 or chr2), it extracts snips in about 3 minutes. However, if I take the first 500 random regions from each of the chr1 and chr2 samples, combine them, and then run the same cooltools pileup command, it runs for over 30 minutes (more than 10x longer) without completing.
Here's an example of the command I'm using. The only difference between runs is the regions file I pass in, which contains either 1000 regions from chr1 exclusively (fast), 1000 regions from chr2 exclusively (fast), or 500 regions each from chr1 and chr2 (very slow).
srun cooltools pileup --out-format HDF5 --flank 6000 --features-format bed --out snips_1_samp.hdf5 --store-snips --nproc 12 --ignore-diags 0 --clr-weight-name RU inter.mcool::/resolutions/200 chr1.tsv
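For anyone trying to reproduce this, here is a minimal sketch of how the mixed input can be assembled from two single-chromosome region files. Only `chr1.tsv` appears in the command above; `chr2.tsv`, `mixed.tsv`, and the helper names are hypothetical, and the sketch assumes one region per line with no header.

```python
from itertools import islice
from pathlib import Path


def take_first(path, n):
    """Return the first n non-empty lines of a region file, newline-terminated."""
    with open(path) as handle:
        rows = (line.rstrip("\n") + "\n" for line in handle if line.strip())
        return list(islice(rows, n))


def build_mixed(paths, n_per_file, out_path):
    """Concatenate the first n_per_file regions of each file into out_path.

    Returns the total number of regions written.
    """
    mixed = []
    for path in paths:
        mixed.extend(take_first(path, n_per_file))
    Path(out_path).write_text("".join(mixed))
    return len(mixed)
```

Calling `build_mixed(["chr1.tsv", "chr2.tsv"], 500, "mixed.tsv")` would give the 500 + 500 file that triggers the slow run when passed in place of `chr1.tsv`.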
I would have expected the runtime to be at most ~2x slower when two chromosomes are combined, not 10x or more. For now, I'll just run each chromosome separately and combine the results after the fact (not a big deal), but it might be worth addressing -- I almost didn't use cooltools pileup at first because I had the impression it was very slow, until I figured out this was the underlying issue.
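A rough sketch of the splitting half of that workaround, in case it is useful to others: it writes one region file per chromosome so that each can be passed to the same pileup command separately. The function name and output layout are hypothetical, and it assumes the first whitespace-separated column holds the chromosome name. It does not touch cooltools itself or merge the resulting HDF5 outputs.

```python
from collections import defaultdict
from pathlib import Path


def split_by_chrom(features_path, out_dir):
    """Write one region file per chromosome; return {chrom: output path}."""
    groups = defaultdict(list)
    with open(features_path) as handle:
        for line in handle:
            # Skip blank lines and comment lines.
            if not line.strip() or line.startswith("#"):
                continue
            chrom = line.split(None, 1)[0]  # first column = chromosome name
            groups[chrom].append(line.rstrip("\n") + "\n")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for chrom, rows in groups.items():
        target = out_dir / f"{chrom}.tsv"
        target.write_text("".join(rows))
        written[chrom] = target
    return written
```

Each file in the returned mapping can then replace `chr1.tsv` in the command above, with a distinct `--out` name per chromosome.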
Note -- the regions I'm using are each 1bp in size for this example.