Hello,
I get only a very small number of core hashes (1 to 5) for every pangenome I make.
I have tried raising and lowering the thresholds in the script, but this does not help much.
All sketches are here: /group/ctbrowngrp2/amhorst/2025-pangenome/results/pangenome/l_amylovorus/test_sourmash_param
I downloaded all GTDB reference strains for one species, then sketched each genome individually.
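A FracMinHash sketch keeps only the k-mer hashes that fall below a cutoff of roughly 2^64 / scaled, so scaled=1 keeps every k-mer. The sketch below illustrates that idea in plain Python. It is not sourmash's code: the BLAKE2b stand-in hash, the helpers `fracminhash` and `max_hash_for_scaled`, and the exact cutoff rounding are all assumptions made for illustration.

```python
import hashlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hash64(kmer: str) -> int:
    # Stand-in 64-bit hash (BLAKE2b, 8-byte digest). sourmash uses
    # MurmurHash3, so these values will not match real sourmash hashes.
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def max_hash_for_scaled(scaled: int) -> int:
    # Approximate FracMinHash cutoff: keep about 1/scaled of the hash space.
    return (2**64 - 1) // scaled


def fracminhash(seq: str, k: int, scaled: int) -> set[int]:
    """Hash every k-mer on both strands, keep the smaller of the two hashes,
    and retain only hashes at or below the cutoff for `scaled`."""
    seq = seq.upper()
    cutoff = max_hash_for_scaled(scaled)
    kept = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(_COMPLEMENT)[::-1]
        h = min(hash64(kmer), hash64(rc))
        if h <= cutoff:
            kept.add(h)
    return kept
```

With scaled=1 the cutoff spans the whole 64-bit range, so every distinct k-mer ends up in the sketch; with scaled=100 only about 1 in 100 does.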
Concatenate all signatures for the species (sketched at scaled=1, k=21,31):
sourmash sig cat *.zip -o ../l_amylovorus.zip
Downsample, because `pangenome_merge` ignores the `--scaled` value (and would otherwise build the pangenome sketch at scaled=1):
sourmash sig downsample l_amylovorus.zip -k 21 --scaled 100 -o l_amylovorus.21.100.zip
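Downsampling from scaled=1 to scaled=100 can be thought of as re-applying a stricter hash cutoff to an existing sketch. The following is a minimal illustration under that assumption; `downsample` and `max_hash_for_scaled` are hypothetical helpers, not sourmash functions, and random 64-bit integers stand in for real k-mer hashes.

```python
import random


def max_hash_for_scaled(scaled: int) -> int:
    # Approximate FracMinHash cutoff; sourmash's exact rounding may differ.
    return (2**64 - 1) // scaled


def downsample(hashes: set[int], new_scaled: int) -> set[int]:
    """Keep only hashes that would also survive sketching at `new_scaled`."""
    cutoff = max_hash_for_scaled(new_scaled)
    return {h for h in hashes if h <= cutoff}


# Random 64-bit values stand in for a scaled=1 sketch.
rng = random.Random(42)
full = {rng.getrandbits(64) for _ in range(100_000)}
s100 = downsample(full, 100)
print(len(full), len(s100))
```

Because the cutoff only shrinks as scaled grows, downsampling to 100 and then to 1000 yields the same hashes as going straight to 1000, and roughly 1 in 100 of the original hashes is expected to survive at scaled=100.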
Pangenome merge:
sourmash scripts pangenome_merge l_amylovorus.21.100.zip -k 21 -o l_amylovorus.pang.zip --scaled 100
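If the merged pangenome sketch records, for each hash, how many genomes contain it (an assumption here, not confirmed from the plugin's code), the merge step amounts to a presence count. A hypothetical sketch, with made-up genome names and hash values:

```python
from collections import Counter


def merge_presence(sketches: dict[str, set[int]]) -> Counter:
    """For each hash, count how many genome sketches contain it.

    Each sketch is a set, so a genome adds at most 1 to any hash.
    """
    counts = Counter()
    for hashes in sketches.values():
        counts.update(hashes)
    return counts


# Hypothetical toy genomes with made-up hash values.
genomes = {
    "genome_A": {1, 2, 3, 4},
    "genome_B": {1, 2, 3},
    "genome_C": {1, 5},
}
counts = merge_presence(genomes)
print(counts[1], counts[2], counts[5])  # 3 2 1
```

Whether `pangenome_merge` stores presence counts like these or summed per-genome abundances is worth checking, since the two would give different per-hash numbers for multi-copy k-mers.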
Make the ranktable:
sourmash scripts pangenome_ranktable l_amylovorus.pang.zip -o l_amylovorus.pang.original_script.csv -k 21 --scaled 100
I have one ranktable made with the original script, which gives 1 core hash.
When I set `central_core_threshold = 0.50`, I get 1 core hash.
When I set `central_core_threshold = 0.15`, I get about 9600 core hashes.
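Assuming a hash counts as core when the fraction of genomes containing it meets `central_core_threshold`, lowering the threshold can only admit more hashes, which is consistent with the jump from 1 to about 9600. A toy illustration; the `count_at_or_above` helper and the per-hash genome counts are hypothetical:

```python
def count_at_or_above(genome_counts: dict[int, int], n_genomes: int,
                      threshold: float) -> int:
    """Number of hashes found in at least `threshold` (a fraction) of genomes."""
    return sum(1 for c in genome_counts.values() if c / n_genomes >= threshold)


# Hypothetical: hash -> number of genomes containing it, out of 20 genomes.
toy_counts = {100: 20, 101: 18, 102: 12, 103: 5, 104: 2, 105: 1}
for threshold in (0.95, 0.50, 0.15):
    print(threshold, count_at_or_above(toy_counts, 20, threshold))
```

For these made-up counts, thresholds of 0.95, 0.50 and 0.15 admit 1, 3 and 4 hashes respectively.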
However, in the original script the shell threshold is also 0.15. Looking at the ranktable made with the original script, about 15000 hashes are classified as shell.
How are these categories defined? Are they based on the number of genomes a hash is found in? Any ideas?
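One way such categories could be assigned, sketched under the assumption that each hash goes into the first category whose cutoff its genome fraction meets. Only the 0.15 shell cutoff comes from this issue; the 0.95 core cutoff, the `cloud` fallback, and the helpers `classify` and `summarize` are hypothetical.

```python
from collections import Counter

# Hypothetical category scheme: checked top to bottom, first match wins.
# Only the 0.15 shell cutoff comes from the issue; 0.95 is an assumed value.
THRESHOLDS = [
    ("central_core", 0.95),
    ("shell", 0.15),
]


def classify(frac: float, thresholds: list[tuple[str, float]],
             fallback: str = "cloud") -> str:
    """Return the first category whose cutoff `frac` meets or exceeds."""
    for name, cutoff in thresholds:
        if frac >= cutoff:
            return name
    return fallback


def summarize(genome_counts: dict[int, int], n_genomes: int,
              thresholds: list[tuple[str, float]]) -> Counter:
    """Count hashes per category, using the fraction of genomes each is in."""
    return Counter(
        classify(count / n_genomes, thresholds)
        for count in genome_counts.values()
    )


# Hypothetical: hash -> number of genomes containing it, out of 20 genomes.
toy_counts = {100: 20, 101: 18, 102: 12, 103: 5, 104: 2, 105: 1}
print(summarize(toy_counts, 20, THRESHOLDS))
# Lowering the core cutoff to the shell cutoff empties the shell category.
print(summarize(toy_counts, 20, [("central_core", 0.15), ("shell", 0.15)]))
```

In this scheme the toy data yields 1 central_core, 3 shell and 2 cloud hashes, and setting the core cutoff equal to the shell cutoff moves every shell hash into the core. If the real ranktable does not behave that way, the script may order or normalize its categories differently.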