Comparing core genomes #136

Closed
SamVaGo opened this issue Oct 3, 2023 · 7 comments

SamVaGo commented Oct 3, 2023

Is there a possibility to compare core genomes (or persistent genes) and extract the "difference"?

Member

axbazin commented Oct 3, 2023

Hi,

If you mean comparing the core and the persistent genome, yes: you can write out the gene families belonging to every partition with:

ppanggolin write -p pangenome.h5 --partitions --output MYOUTPUTDIR

There you will find lists for the exact core, soft core, persistent, shell, and any other partition you may want.

Then, if you sort the files and diff them, you can easily get the difference, e.g.:

sort exact_core.txt > sorted_exact_core.txt
sort persistent.txt > sorted_persistent.txt

# number of gene families in persistent but not in exact core
diff sorted_exact_core.txt sorted_persistent.txt | grep -c '^>'

# number of gene families in exact core but not in persistent
diff sorted_exact_core.txt sorted_persistent.txt | grep -c '^<'

You can run any other kind of comparison just as easily.
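If you want the family names themselves rather than counts, `comm` on the sorted lists performs the same comparison. The minimal sketch below wraps it in two small shell functions. The function names are hypothetical, and it assumes each partition file holds one gene family name per line, as the diff commands above imply.

```shell
#!/usr/bin/env bash
# Sketch: list (rather than count) the gene families that differ between two
# partition lists. Function names are hypothetical helpers, not ppanggolin tools.

# _comm_sorted FLAGS A B: sort both lists, then run comm with FLAGS.
_comm_sorted() {
    local flags=$1 a b
    a=$(mktemp) b=$(mktemp)
    # comm needs both inputs sorted with the same collation, so force the C locale.
    LC_ALL=C sort -u "$2" > "$a"
    LC_ALL=C sort -u "$3" > "$b"
    LC_ALL=C comm "$flags" "$a" "$b"
    rm -f "$a" "$b"
}

# families_only_in A B: families listed in A but not in B
# (-2 hides lines unique to B, -3 hides lines common to both).
families_only_in() { _comm_sorted -23 "$1" "$2"; }

# families_in_both A B: families listed in both A and B.
families_in_both() { _comm_sorted -12 "$1" "$2"; }
```

For example, `families_only_in persistent.txt exact_core.txt` prints the persistent families missing from the exact core, and piping it into `wc -l` gives the same number as the first diff count.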

Adelme

Member

axbazin commented Oct 3, 2023

If you meant between pangenomes, you can follow what I've written in this issue: #68 (comment)

Author

SamVaGo commented Oct 3, 2023

Hi,

Thank you for the quick response! I would like to compare the core genome of one group of organisms (from one species) with that of another group from the same species (but a different lineage), to see how the two lineages differ.

In essence, it is not really the "species core genome" that I would like to compare, but rather the genes that are (almost) always present in one lineage, and how they differ from the genes that are "core" to another lineage or to the species. It would be like taking the core genome of the lineage and subtracting the core genome of the species to obtain the lineage-specific core.

Many thanks!

Best
Sam

Member

axbazin commented Oct 3, 2023

I see!
In essence, it's quite similar, with some variations, to my answer in #68 (comment).

You can do it as follows:

First, write out the persistent protein sequences of both pangenomes:

ppanggolin fasta --prot_families persistent -p pangenome_1.h5 -o prot_persistent_species_1 
ppanggolin fasta --prot_families persistent -p pangenome_2.h5 -o prot_persistent_species_2

Each command writes a file named 'persistent_protein_families.faa' in the output directory given with -o.
Then, you can compare each file to the other pangenome:

ppanggolin align -p pangenome_1.h5 --proteins prot_persistent_species_1/persistent_protein_families.faa -o align_persistent_pang2_to_pang1
ppanggolin align -p pangenome_2.h5 --proteins prot_persistent_species_2/persistent_protein_families.faa -o align_persistent_pang1_to_pang2

You can also give 'ppanggolin align' --identity and --coverage thresholds for the comparison; I guess you would set them depending on how distant your species are.

Then, reading the 'proteins_partition_projection.tsv' file should tell you which persistent families of one species are also persistent in the other species.
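One way to summarize that file is to count how many of the aligned proteins fall into each partition. The minimal sketch below assumes the file is tab-separated and that you already know which column holds the partition label; that column index is an assumption to check against your own output, and `tally_column` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count rows per distinct value of one tab-separated column.
# ASSUMPTION: which column of proteins_partition_projection.tsv holds the
# partition label is not stated in this thread; inspect the file and pass
# the right 1-based index.

# tally_column FILE COL: print "value<TAB>count" lines, sorted by value.
tally_column() {
    awk -F '\t' -v col="$2" '
        NF >= col { n[$col]++ }                  # skip rows too short to have COL
        END { for (v in n) print v "\t" n[v] }
    ' "$1" | LC_ALL=C sort
}
```

For example, `tally_column align_persistent_pang2_to_pang1/proteins_partition_projection.tsv 2`, where `2` is a placeholder index and the path assumes the file is written to the -o directory. If the file starts with a header row, that row's label shows up as one extra value.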

If I understood correctly, this should get you close to what you want to achieve!

Author

SamVaGo commented Oct 3, 2023

I think so, many thanks!

axbazin self-assigned this Oct 4, 2023
Author

SamVaGo commented Oct 5, 2023

Is it correct that the code above should start with the .faa file after -S?
I also assume that pangenome1 should be aligned with prot_persistent_species_2/persistent_protein_families.faa?

Thanks again!

Member

axbazin commented Oct 5, 2023

I did not understand what you meant by "-S".

Your assumption is correct. It seems I wrote it too quickly: the two .faa files should have been swapped, sorry!
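To collect the lineage comparison in one place, here is a minimal bash sketch of the workflow with the two .faa files swapped as agreed above, so each pangenome is aligned against the other one's persistent families. It only uses the ppanggolin subcommands and options quoted in this thread; the helper functions and the DRY_RUN switch are hypothetical additions for illustration, so check `ppanggolin align --help` for your version before relying on it.

```shell
#!/usr/bin/env bash
# Sketch of the corrected cross-comparison. run, cross_compare_persistent and
# DRY_RUN are hypothetical helpers; the ppanggolin calls are those from the thread.

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# cross_compare_persistent PANG1 PANG2 [extra ppanggolin align options...]
cross_compare_persistent() {
    local pang1=$1 pang2=$2
    shift 2
    # 1. Write the persistent protein families of each pangenome.
    run ppanggolin fasta --prot_families persistent -p "$pang1" -o prot_persistent_species_1
    run ppanggolin fasta --prot_families persistent -p "$pang2" -o prot_persistent_species_2
    # 2. Align each set against the OTHER pangenome. Remaining arguments
    #    (e.g. --identity / --coverage thresholds) are passed to ppanggolin align.
    run ppanggolin align -p "$pang1" --proteins prot_persistent_species_2/persistent_protein_families.faa \
        -o align_persistent_pang2_to_pang1 "$@"
    run ppanggolin align -p "$pang2" --proteins prot_persistent_species_1/persistent_protein_families.faa \
        -o align_persistent_pang1_to_pang2 "$@"
}
```

Running `DRY_RUN=1 cross_compare_persistent pangenome_1.h5 pangenome_2.h5` prints the four commands without executing them, which makes the file pairing easy to double-check; drop `DRY_RUN=1` to actually run them.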
