Description
I have downloaded .gff and .fna files from NCBI for four (meta)genomes:
genomes.tar.gz
I use one of these genomes (CP104013) as the query and the other three as targets:
spacedust createsetdb genomes/query/CP104013.fna spacedust/querySetDB tmpFolder --gff-dir genomes/queries.txt --gff-type CDS
spacedust createsetdb genomes/target/*.fna spacedust/targetSetDB tmpFolder --gff-dir genomes/targets.txt --gff-type CDS
spacedust clustersearch spacedust/querySetDB spacedust/targetSetDB spacedust/result.tsv tmpFolder
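One thing worth ruling out before looking at clustersearch itself is a mismatch between the FASTA files and the GFF files listed in queries.txt and targets.txt. The sketch below is only an illustration: it assumes each list file holds one GFF path per line and that a GFF belongs to the FASTA with the same basename (CP104013.fna and CP104013.gff3). The function name pair_fasta_with_gff is hypothetical and not part of spacedust.

```python
from pathlib import Path


def pair_fasta_with_gff(fasta_paths, gff_list_lines):
    """Map each FASTA path to the listed GFF path with the same basename.

    Assumption (not confirmed by spacedust): the --gff-dir file lists one
    GFF path per line, and files are matched by basename without extension.
    FASTA files without a listed GFF map to None.
    """
    gff_by_stem = {}
    for line in gff_list_lines:
        line = line.strip()
        if line:  # skip blank lines in the list file
            gff_by_stem[Path(line).stem] = line
    return {fasta: gff_by_stem.get(Path(fasta).stem) for fasta in fasta_paths}
```

Calling it with the three target FASTA paths and the lines of genomes/targets.txt should return a GFF path for every FASTA; any None value points to a missing or misnamed annotation file.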
The sequence databases are created successfully, but clustersearch fails with the following error:
Error: clusterhits failed
Invalid query lookup record
It is not clear to me why my query lookup record is invalid.
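A possible first diagnostic is to scan the lookup file for malformed lines. The sketch below assumes the usual MMseqs2 lookup layout of three tab-separated columns (numeric key, entry name, numeric set/file number). Whether this is the condition clusterhits actually checks is an assumption, and find_bad_lookup_lines is a hypothetical helper name.

```python
def find_bad_lookup_lines(lines):
    """Return (line_number, text) for lines that are not 'int<TAB>name<TAB>int'.

    Assumption: lookup records have three tab-separated fields, with a
    numeric key first and a numeric set/file number last.
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        fields = text.split("\t")
        ok = (
            len(fields) == 3
            and fields[0].isdigit()
            and fields[1] != ""
            and fields[2].isdigit()
        )
        if not ok:
            bad.append((number, text))
    return bad
```

It can be applied to an open file handle, for example one opened on spacedust/querySetDB_nucl.lookup, which the log shows being written. Which lookup file clusterhits actually reads is not confirmed here. An empty result only means the layout matches the assumed format, not that the records are consistent with the set databases.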
My environment
Red Hat 9.4
Running the statically compiled AVX2 build of spacedust (29-Mar-2025 20:07)
Log files
Output log:
#### Run Space Dust
Create directory /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder
createsetdb /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/query/CP104013.fna /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder --gff-dir /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/queries.txt --gff-type CDS
MMseqs Version: 5358214da8764737aa01af485b682729bb8d3ace
Database type 0
Shuffle input database false
Createdb mode 0
Write lookup file 1
Offset of numeric ids 0
Compressed 0
Verbosity 3
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Create lookup 0
Threads 24
Add orf stop false
GFF type CDS
Statistics to be computed
Tsv false
File Inclusion Regex .*
File Exclusion Regex ^$
gff dir file /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/queries.txt
createdb /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/query/CP104013.fna /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/14160228955877007021/seqDB --dbtype 0 --shuffle 0 --createdb-mode 0 --write-lookup 1 --id-offset 0 --compressed 0 -v 3
Converting sequences
[
Time for merging to seqDB_h: 0h 0m 0s 5ms
Time for merging to seqDB: 0h 0m 0s 5ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 93ms
Input DB type is Nucleotide.
gff2db /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/query/CP104013.gff3 /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/14160228955877007021/seqDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB_nucl --gff-type CDS --id-offset 0 --threads 24 -v 3
[=================================================================] 1 0s 279ms
Time for merging to querySetDB_nucl.lookup: 0h 0m 0s 283ms
Time for merging to querySetDB_nucl_h: 0h 0m 0s 83ms
Time for merging to querySetDB_nucl: 0h 0m 0s 90ms
Found these feature types and counts:
- CDS: 5119
Time for processing: 0h 0m 1s 270ms
translatenucs /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB_nucl /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB --translation-table 1 --add-orf-stop 0 -v 3 --compressed 0 --threads 24
[=================================================================] 5.12K 0s 20ms
Time for merging to querySetDB: 0h 0m 0s 112ms
Time for processing: 0h 0m 0s 309ms
tsv2db /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB_member_to_set.tsv /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB_member_to_set --output-dbtype 5
Output database type: Alignment
Time for merging to querySetDB_member_to_set: 0h 0m 0s 5ms
Time for processing: 0h 0m 0s 14ms
tsv2db /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB_set_to_member.tsv /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB_set_to_member --output-dbtype 5
Output database type: Alignment
Time for merging to querySetDB_set_to_member: 0h 0m 0s 5ms
Time for processing: 0h 0m 0s 14ms
result2stats /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB_set_to_member /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB_set_size --stat linecount --tsv 0 --compressed 0 --threads 24 -v 3
[=================================================================] 1 0s 23ms
Time for merging to querySetDB_set_size: 0h 0m 0s 98ms
Time for processing: 0h 0m 0s 209ms
createsetdb /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/target/CP042905.fna /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/target/CP084167.fna /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/target/CP091871.fna /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder --gff-dir /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/targets.txt --gff-type CDS
MMseqs Version: 5358214da8764737aa01af485b682729bb8d3ace
Database type 0
Shuffle input database false
Createdb mode 0
Write lookup file 1
Offset of numeric ids 0
Compressed 0
Verbosity 3
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Create lookup 0
Threads 24
Add orf stop false
GFF type CDS
Statistics to be computed
Tsv false
File Inclusion Regex .*
File Exclusion Regex ^$
gff dir file /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/targets.txt
createdb /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/target/CP042905.fna /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/target/CP084167.fna /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/target/CP091871.fna /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/18056230238174401032/seqDB --dbtype 0 --shuffle 0 --createdb-mode 0 --write-lookup 1 --id-offset 0 --compressed 0 -v 3
Converting sequences
[
Time for merging to seqDB_h: 0h 0m 0s 4ms
Time for merging to seqDB: 0h 0m 0s 3ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 92ms
Input DB type is Nucleotide.
gff2db /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/target/CP084167.gff3 /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/target/CP091871.gff3 /data1/schwabs/python/AF_proteome-Loki_ossiferum/genomes/target/CP042905.gff3 /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/18056230238174401032/seqDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB_nucl --gff-type CDS --id-offset 0 --threads 24 -v 3
[=================================================================] 3 0s 45ms
Time for merging to targetSetDB_nucl.lookup: 0h 0m 0s 176ms
Time for merging to targetSetDB_nucl_h: 0h 0m 0s 185ms
Time for merging to targetSetDB_nucl: 0h 0m 0s 212ms
Found these feature types and counts:
- CDS: 8310
Time for processing: 0h 0m 1s 95ms
translatenucs /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB_nucl /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB --translation-table 1 --add-orf-stop 0 -v 3 --compressed 0 --threads 24
[=======================Nucleotide sequence entry 541 length (1163) is not divisible by three. Adjust length to (length=1161).
===Nucleotide sequence entry 834 length (1013) is not divisible by three. Adjust length to (length=1011).
===Nucleotide sequence entry 1401 length (386) is not divisible by three. Adjust length to (length=384).
========Nucleotide sequence entry 652 length (554) is not divisible by three. Adjust length to (length=552).
============================] 8.31K 0s 48ms
Time for merging to targetSetDB: 0h 0m 0s 116ms
Time for processing: 0h 0m 0s 339ms
tsv2db /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB_member_to_set.tsv /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB_member_to_set --output-dbtype 5
Output database type: Alignment
Time for merging to targetSetDB_member_to_set: 0h 0m 0s 6ms
Time for processing: 0h 0m 0s 17ms
tsv2db /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB_set_to_member.tsv /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB_set_to_member --output-dbtype 5
Output database type: Alignment
Time for merging to targetSetDB_set_to_member: 0h 0m 0s 6ms
Time for processing: 0h 0m 0s 15ms
result2stats /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB_set_to_member /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB_set_size --stat linecount --tsv 0 --compressed 0 --threads 24 -v 3
[=================================================================] 3 0s 25ms
Time for merging to targetSetDB_set_size: 0h 0m 0s 102ms
Time for processing: 0h 0m 0s 239ms
clustersearch /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/result.tsv /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder
MMseqs Version: 5358214da8764737aa01af485b682729bb8d3ace
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace true
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 10
Seq. id. threshold 0
Min alignment length 30
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0.8
Coverage mode 2
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 24
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 5.7
k-mer length 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Gap pseudo count 10
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
Use simple best hit true
Include sub-optimal hits with factor 0
Alpha 1
Aggregation mode 0
Filter self match false
Multihit P-value cutoff 0.01
Clustering and Ordering P-value cutoff 0.01
Maximum gene gaps 3
Minimal cluster size 2
Cluster weighting factor false
Database output true
Cluster search against profiles false
Cluster Search Mode 0
Path to Foldseek /data1/schwabs/modules/foldseek
Create directory /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/search
search /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/result /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/search --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 1 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 10 --min-seq-id 0 --min-aln-len 30 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 2 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 24 --compressed 0 -v 3 --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -s 5.7 -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --spaced-kmer-mode 1 --rescore-mode 0 --filter-hits 0 --sort-results 0 --mask-profile 1 --e-profile 0.001 --wg 0 --allow-deletion 0 --filter-msa 1 --filter-min-enable 0 --max-seq-id 0.9 --qid '0.0' --qsc -20 --cov 0 --diff 1000 --pseudo-cnt-mode 0 --gap-pc 10 --min-length 30 --max-length 32734 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 0 --use-all-table-starts 0 --id-offset 0 --create-lookup 0 --add-orf-stop 0 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --chain-alignments 0 --merge-query 1 --search-type 0 --start-sens 4 --sens-steps 1 --exhaustive-search 0 --exhaustive-search-filter 0 --strand 1 
--lca-search 0 --disk-space-limit 0 --force-reuse 0 --remove-tmp-files 0
prefilter /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/search/9611601922758526377/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0.8 --cov-mode 2 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 24 --compressed 0 -v 3 -s 5.7
Query database size: 5119 type: Aminoacid
Estimated memory consumption: 1006M
Target database size: 8310 type: Aminoacid
Index table k-mer threshold: 112 at k-mer size 6
Index table: counting k-mers
[=================================================================] 8.31K 0s 51ms
Index table: Masked residues: 42510
Index table: fill
[=================================================================] 8.31K 0s 67ms
Index statistics
Entries: 2578376
DB size: 503 MB
Avg k-mer size: 0.040287
Top 10 k-mers
GPGGTL 44
FHVRES 31
NIGLHS 31
IVLSIV 31
KRRRER 30
AISAAS 30
VGPRVS 30
INPLLV 30
QARYLY 30
WRTSLE 29
Time for index table init: 0h 0m 0s 651ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 112
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 5119
Target db start 1 to 8310
[=================================================================] 5.12K 1s 32ms
320.202991 k-mers per position
5251 DB matches per sequence
0 overflows
49 sequences passed prefiltering per query sequence
40 median result list length
18 sequences with 0 size result lists
Time for merging to pref_0: 0h 0m 0s 73ms
Time for processing: 0h 0m 2s 129ms
align /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/search/9611601922758526377/pref_0 /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/result --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 1 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 10 --min-seq-id 0 --min-aln-len 30 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 2 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 24 --compressed 0 -v 3
Compute score, coverage and sequence identity
Query database size: 5119 type: Aminoacid
Target database size: 8310 type: Aminoacid
Calculation of alignments
[=================================================================] 5.12K 0s 866ms
Time for merging to result: 0h 0m 0s 61ms
152567 alignments calculated
11225 sequence pairs passed the thresholds (0.073574 of overall calculated)
2.192811 hits per query sequence
Time for processing: 0h 0m 1s 160ms
prefixid /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/result /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/result_prefixed --threads 24 -v 3
[=================================================================] 5.12K 0s 42ms
Time for merging to result_prefixed: 0h 0m 0s 113ms
Time for processing: 0h 0m 0s 289ms
besthitbyset /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/result_prefixed /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/aggregate --simple-best-hit 1 --suboptimal-hits 0 --threads 24 --compressed 0 -v 3
[=================================================================] 5.12K 0s 18ms
Time for merging to aggregate: 0h 0m 0s 60ms
Time for processing: 0h 0m 0s 238ms
mergeresultsbyset /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB_set_to_member /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/aggregate /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/aggregate_merged --threads 24 -v 3
Time for merging to aggregate_merged: 0h 0m 0s 51ms
Time for processing: 0h 0m 0s 161ms
combinehits /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/aggregate_merged /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/matches /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941 --alpha 1 --aggregation-mode 0 --filter-self-match 0 --threads 24 --compressed 0 -v 3
[=================================================================] 1 0s 39ms
Time for merging to matches_h: 0h 0m 0s 74ms
Time for merging to matches: 0h 0m 0s 50ms
Time for processing: 0h 0m 0s 348ms
clusterhits /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/querySetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/spacedust/targetSetDB /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/matches /data1/schwabs/python/AF_proteome-Loki_ossiferum/tmpFolder/7822136860474669941/clusters --multihit-pval 0.01 --cluster-pval 0.01 --max-gene-gap 3 --cluster-size 2 --cluster-use-weight 0 --db-output 1 --alpha 1 --threads 24 --compressed 0 -v 3
[Error: clusterhits failed
Error log:
Invalid query lookup record