Only getting 1 core hash in pangenome #20

@AnneliektH

Hello,

I get only a very small number of core hashes (1-5) for every pangenome I make.
I have tried raising and lowering the thresholds in the script, but this does not help much.

All sketches are here: /group/ctbrowngrp2/amhorst/2025-pangenome/results/pangenome/l_amylovorus/test_sourmash_param

I downloaded all GTDB reference genomes for one species, sketched each one individually (scaled=1, k=21,31), and then concatenated all signatures for that species:

    sourmash sig cat *.zip -o ../l_amylovorus.zip

I downsampled, because pangenome_merge ignores the scaled value (it builds the pangenome sketch at scaled=1):

    sourmash sig downsample l_amylovorus.zip -k 21 --scaled 100 -o l_amylovorus.21.100.zip

Pangenome merge:

    sourmash scripts pangenome_merge l_amylovorus.21.100.zip -k 21 \
        -o l_amylovorus.pang.zip --scaled 100

Make the rank table:

    sourmash scripts pangenome_ranktable l_amylovorus.pang.zip -o l_amylovorus.pang.original_script.csv \
        -k 21 --scaled 100

I have one rank table made with the original script (1 core hash).
When I set central_core_threshold = 0.50, I get 1 core hash.
When I set central_core_threshold = 0.15, I get about 9600 core hashes.
However, in the original script the shell threshold is also 0.15, and the rank table made with the original script classifies about 15000 hashes as shell.

How are these categories defined? Are they based on the number of genomes a hash is found in? Any ideas?
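To make the question concrete, here is a hypothetical sketch of a fraction-based classifier, assuming that a hash is "core" when its count divided by some denominator is at least the core threshold, "shell" when that fraction lies between the shell and core thresholds, and "cloud" otherwise. The function name `classify_hashes`, the two-column `hash count` input, and the choice of denominator are illustrative assumptions, not the plugin's actual code or output format.

```shell
# Hypothetical sketch: classify hashes by count / denominator.
# stdin: one "hash count" pair per line (assumed format, not the plugin's CSV).
# Args: denominator (e.g. number of genomes), core threshold, shell threshold.
classify_hashes() {
  local denom="$1" core="$2" shell="$3"
  awk -v d="$denom" -v c="$core" -v s="$shell" '
    BEGIN { d += 0; c += 0; s += 0 }   # force numeric comparison
    {
      frac = $2 / d
      if (frac >= c)      class = "core"
      else if (frac >= s) class = "shell"
      else                class = "cloud"
      n[class]++
    }
    END { printf "core=%d shell=%d cloud=%d\n", n["core"], n["shell"], n["cloud"] }
  '
}

# 10 genomes; the hash counts are the number of genomes containing each hash.
printf '1 10\n2 9\n3 5\n4 2\n5 1\n' | classify_hashes 10 0.50 0.15   # core=3 shell=1 cloud=1
printf '1 10\n2 9\n3 5\n4 2\n5 1\n' | classify_hashes 10 0.15 0.15   # core=4 shell=0 cloud=1
```

When the core and shell thresholds are equal, the shell band is empty and every hash that would have been shell becomes core. The denominator also matters: if it were the largest count in the sketch rather than the number of genomes, a single outlier (for example `1 40` with denominator 40) would push nearly every other hash below 0.50, leaving about one core hash.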
