blastn: Out of memory #64

Open
pieterprovoost opened this issue May 16, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@pieterprovoost
Member

blastn against the nt database runs out of memory with around 10k query sequences and 24 GB of memory available.

Size of the nt database:

$ du -sh /home/ubuntu/data/databases/ncbi/nt/20211125
163G    /home/ubuntu/data/databases/ncbi/nt/20211125

sar -r output:

            kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
11:28:39       225020  24244492   4741964     15.76     17276  23880088   4997360     16.61   4427976  23792828         8
11:28:42       220804  24244928   4741656     15.76     17284  23884604   4997360     16.61   4428048  23797492         0
11:28:45       242660  24244364   4742236     15.76     17292  23862160   4997360     16.61   4428036  23775084         8
11:28:48       229372  24244412   4742212     15.76     17152  23875612   4997360     16.61   4425680  23791044         8
11:28:51       249000  24244224   4742344     15.76     16200  23856812   4997360     16.61   4420276  23776756        20
11:28:54       221816  24243548   4743152     15.76     15948  23883436   4997360     16.61   4419192  23804400         8
11:28:57       295780  24310048   4676544     15.54     15656  23876332   4943728     16.43   4352876  23797612       120
11:29:00       235148  22456896   6529852     21.70     15168  22084360   7421172     24.66   6210808  22008208         0
11:29:03       256260  17935408  11051528     36.73     15168  17541572  13518620     44.93  10718804  17466892        12
11:29:06       240772  13255068  15731676     52.28     15176  12876904  19889196     66.10  15383632  12804904         0
11:29:09       226832   8438732  20548020     68.29     15040   8074636  26635672     88.52  20181256   8008252         8
11:29:12       223048   3515488  25471208     84.65     13508   3156772  33544160    111.47  25081504   3098024         0
11:29:15     26666456  26914980   2145676      7.13      1788     56084  38036908    126.40   1685644     45496        32
11:29:18     27011352  27265068   1792064      5.96      4164     61716   2325264      7.73   1689008     49300       104
11:29:21     27011368  27265144   1791912      5.95      4172     61844   2325264      7.73   1689072     49416       108
11:29:24     27010912  27264728   1792268      5.96      4172     61908   2313376      7.69   1689976     48804       108
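To read a longer `sar -r` log without scanning it by eye, a small awk filter can report the sample with the highest %memused. This is a minimal sketch: `peak_memused` is a hypothetical helper name, and it assumes the 24-hour column layout shown above (timestamp, kbmemfree, kbavail, kbmemused, %memused, ...). A 12-hour locale adds an AM/PM column and shifts the fields.

```shell
# Hypothetical helper: print the sar -r sample with the highest %memused.
# Assumes fields: $1 = HH:MM:SS timestamp, $4 = kbmemused, $5 = %memused.
peak_memused() {
  awk '
    # Data rows start with an HH:MM:SS timestamp; header and blank lines are skipped.
    $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
      if ($5 + 0 > max) { max = $5 + 0; at = $1; used = $4 }
    }
    END { if (at != "") printf "%s %.2f%% (%d kB used)\n", at, max, used }
  '
}
```

It can be used live (`sar -r 3 | peak_memused`) or on a saved log. On the samples above it points at 11:29:12 (84.65 %memused), the last sample before the process was killed.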

journalctl -xb output:

May 16 11:29:14 lfw-ds001-i035 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-4094.scope,task=blastn,pid=1241549,uid=1>
May 16 11:29:14 lfw-ds001-i035 kernel: Out of memory: Killed process 1241549 (blastn) total-vm:182352216kB, anon-rss:26526796kB, file-rss:2164kB, shmem-rss:0kB, UID:1000 pgtables:355928kB oom_score_adj:0
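The kill line reports anon-rss in kB. Converting it to GiB makes the comparison with sar easier: 26526796 kB is about 25.3 GiB, more than the roughly 23.1 GiB (24244492 kB) that sar reported as available before the search started. A minimal sketch, with `oom_anon_rss_gib` as a hypothetical helper name and assuming the `anon-rss:<n>kB` field format shown in the log above:

```shell
# Hypothetical helper: extract anon-rss from a kernel "Killed process" line
# and print it in GiB (kB / 1024^2).
oom_anon_rss_gib() {
  sed -n 's/.*anon-rss:\([0-9][0-9]*\)kB.*/\1/p' |
    awk '{ printf "%.1f GiB\n", $1 / 1048576 }'
}
```

For example: `journalctl -k -b | grep 'Killed process' | oom_anon_rss_gib`.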
@pieterprovoost
Member Author

pieterprovoost commented May 16, 2023

This seems to be fixed by enabling swap, but we could also use GNU parallel to split the query file into chunks and process them one after another. Something like this (not tested):

cat results/eDNAexpeditions/runs/Trial_allSamples_COI_NotPaired/04-taxonomy/blca/MIDORI_UNIQ_GB246_CO1_unclassified-seqs_cutoff85.fna | \
parallel --block 100k --jobs 1 --recstart '>' \
--pipe blastn -query - -outfmt 6 -perc_identity 85 -db /home/ubuntu/data/databases/ncbi/nt/20211125/nt \
> results/eDNAexpeditions/runs/Trial_allSamples_COI_NotPaired/04-taxonomy/blast/MIDORI_UNIQ_GB246_CO1_unclassified_blast_results_cutoff85.tab
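With `--pipe --block 100k --recstart '>'`, GNU parallel cuts the FASTA stream into blocks of roughly 100 kB, splitting only where a record starts with `>` so no sequence is divided between two blastn jobs. The sketch below approximates that splitting with awk, for example where GNU parallel is not installed. `split_fasta` and the chunk file names are hypothetical, and the chunk boundaries will not match parallel's exactly.

```shell
# Hypothetical helper: split FASTA on stdin into files of roughly BLOCK_BYTES each,
# starting a new file only at a '>' header so no record is cut in half.
# Usage: split_fasta BLOCK_BYTES PREFIX   (prints the number of chunks written)
split_fasta() {
  local block_bytes=$1 prefix=$2
  awk -v max="$block_bytes" -v prefix="$prefix" '
    /^>/ && (n == 0 || size >= max) {   # start a new chunk at a record boundary
      if (out != "") close(out)
      n++; size = 0
      out = sprintf("%s%03d.fna", prefix, n)
    }
    out != "" { print > out; size += length($0) + 1 }   # +1 for the newline
    END { print n + 0 }                                   # number of chunks written
  '
}

# The chunks can then be searched one at a time with the same options as above:
#   for f in chunk_*.fna; do
#     blastn -query "$f" -outfmt 6 -perc_identity 85 \
#       -db /home/ubuntu/data/databases/ncbi/nt/20211125/nt >> results.tab
#   done
```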

@pieterprovoost
Member Author

@SSuominen1 I did some "quick" benchmarking:

chunk size  swappiness     threads  time (h:mm)
none        swap disabled  1        task killed
500 kB      60             8        6:32
500 kB      0              1        7:24
500 kB      60             1        7:40
none        0              8        9:07
none        60             8        9:27
300 kB      0              1        11:43

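So, at least in our configuration, splitting the query into chunks helps as long as the chunks are not too small: the 500 kB runs were the three fastest, while 300 kB chunks were the slowest of all. The updated command looks like this; I'll submit a PR: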

cat results/eDNAexpeditions/runs/Trial_allSamples_COI_NotPaired/04-taxonomy/blca/MIDORI_UNIQ_GB246_CO1_unclassified-seqs_cutoff85.fna | \
parallel --block 500k --jobs 1 --recstart '>' \
--pipe blastn -query - -outfmt 6 -perc_identity 85 -db /home/ubuntu/data/databases/ncbi/nt/20211125/nt \
> results/eDNAexpeditions/runs/Trial_allSamples_COI_NotPaired/04-taxonomy/blast/MIDORI_UNIQ_GB246_CO1_unclassified_blast_results_cutoff85.tab
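The benchmark varies swappiness alongside chunk size, and enabling swap was what first avoided the kill, so it helps to check the current settings before a run. This is a minimal, Linux-only sketch: `swap_status` is a hypothetical helper, and its optional path-prefix argument is an addition so it can be pointed at a copy of the /proc tree. The commented commands show the usual way to change the settings as root; the 32G swap file size is an arbitrary example, not a value from this thread.

```shell
# Hypothetical helper: print the current swappiness and total swap size (kB)
# from the Linux /proc interface. An optional first argument prefixes the paths.
swap_status() {
  local root=${1:-}
  local swappiness swap_kb
  swappiness=$(cat "$root/proc/sys/vm/swappiness")
  swap_kb=$(awk '$1 == "SwapTotal:" { print $2 }' "$root/proc/meminfo")
  printf 'swappiness=%s swap_total_kb=%s\n' "$swappiness" "$swap_kb"
}

# Changing the settings needs root (illustrative values only):
#   sudo sysctl -w vm.swappiness=0     # as in the "0" rows of the benchmark
#   sudo fallocate -l 32G /swapfile && sudo chmod 600 /swapfile
#   sudo mkswap /swapfile && sudo swapon /swapfile
```

Calling `swap_status` with no argument reads the live system.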

@pieterprovoost pieterprovoost added the bug Something isn't working label May 31, 2023
Projects
None yet
Development

No branches or pull requests

2 participants