Add SNP calling #65

Open
ewels opened this issue Jan 30, 2019 · 6 comments
Labels
enhancement New feature or request

Comments

@ewels
Member

ewels commented Jan 30, 2019

It would be nice to be able to have the option of calling variants from bisulfite data.

It shouldn't be too tricky to add Bis-SNP or something similar as a new opt-in process. There may be other / better tools also?

@ewels ewels added the enhancement New feature or request label Jan 30, 2019
@bazyliszek

bazyliszek commented Jan 31, 2019

Felix Krueger mentioned four different packages for that purpose: Bis-SNP, MethylExtract, BS-SNPer and CGmapTools. BScall can also do this.

Also, on a slightly different topic, from the Wreczycka et al. 2017 paper:

"the majority of CpGs with high inter-population differences contain common genomic SNPs (minor allele frequency > 0.01) (Daca-Roszak et al., 2015). To ensure more reliable interpretation of the data we advise removing known C/T SNPs which can interfere with methylation calls."

It would also be nice to have a dictionary of these sites for human, with the option of removing them if desired (--remove.common_snps).

Variant calls could also be derived from matched genome sequencing data or from public databases such as dbSNP (https://www.ncbi.nlm.nih.gov/projects/SNP/dbSNP.cgi?list=sslist).
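
To make the "remove known C/T SNPs" idea concrete, here is a minimal sketch of the filtering step on its own. It assumes the methylation calls are tab-separated lines in a Bismark-style coverage layout (chromosome, start, end, methylation percentage, methylated count, unmethylated count, with 1-based positions) and that the known SNP sites are already available as a set of (chromosome, position) pairs. The function name `drop_snp_sites` and the example data are hypothetical and not part of any existing tool.

```python
from typing import Iterable, Iterator, Set, Tuple


def drop_snp_sites(
    cov_lines: Iterable[str], snp_sites: Set[Tuple[str, int]]
) -> Iterator[str]:
    """Yield only the coverage lines whose position is not a known SNP site.

    Assumption: column 1 is the chromosome and column 2 the 1-based position,
    in the same coordinate system as the SNP list.
    """
    for line in cov_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chrom, start = fields[0], int(fields[1])
        if (chrom, start) not in snp_sites:
            yield line


# Hypothetical example data, for illustration only.
EXAMPLE_COV = [
    "chr1\t100\t100\t50.0\t5\t5",
    "chr1\t150\t150\t100.0\t8\t0",
    "chr2\t50\t50\t0.0\t0\t9",
]
EXAMPLE_SNPS = {("chr1", 100), ("chr2", 50)}

kept = list(drop_snp_sites(EXAMPLE_COV, EXAMPLE_SNPS))
```

Because the filter only needs a list of positions, it would work the same way whether that list comes from a public database, from matched sequencing data, or from a SNP caller run on the bisulfite reads.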

@ewels
Member Author

ewels commented Jan 31, 2019

Ooh, @FelixKrueger? I wouldn't trust that guy.. 😆 Yes all sounds good - does anyone have a favourite tool?

The common SNPs feature would be nice, but I guess that's a separate issue as it doesn't require SNP calling, it's just a filtering step right? Do such lists already exist somewhere? Perhaps we can generate such a list from a VCF file in the pipeline. Then we could use the files available for multiple species already in iGenomes.

I think that matching to WGS and external databases is perhaps beyond the scope of this pipeline for now. If the pipeline produces a VCF it shouldn't be too difficult for people to play with this anyway. We could perhaps even make a separate nf-core pipeline for doing pairwise comparison / QC of VCF files...
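
As a rough illustration of generating such a list from a VCF file, the sketch below collects the positions of common C>T SNPs (and G>A, for cytosines on the reverse strand). It rests on several assumptions: each record has an `AF` tag in the INFO column with one value per ALT allele, minor allele frequency is approximated as `min(AF, 1 - AF)`, and only sites where the reference base is the C (or G) are of interest. The function name `common_ct_snps`, the 0.01 default (taken from the quoted paper) and the example records are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# Substitutions that look like bisulfite conversion: C>T on the forward
# strand, G>A when the cytosine is on the reverse strand.
BISULFITE_LIKE = {("C", "T"), ("G", "A")}


def common_ct_snps(
    vcf_lines: Iterable[str], min_maf: float = 0.01
) -> Set[Tuple[str, int]]:
    """Return (chrom, 1-based pos) for C>T / G>A SNPs with MAF above min_maf.

    Assumption: INFO carries AF=<value>[,<value>...], one per ALT allele.
    Records without a usable AF value are skipped.
    """
    sites: Set[Tuple[str, int]] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos = fields[0], int(fields[1])
        ref = fields[3].upper()
        alts = fields[4].upper().split(",")
        af_values = []
        for entry in fields[7].split(";"):
            if entry.startswith("AF="):
                af_values = entry[3:].split(",")
        for alt, af_text in zip(alts, af_values):
            try:
                af = float(af_text)
            except ValueError:
                continue  # e.g. "." for a missing value
            maf = min(af, 1.0 - af)
            if (ref, alt) in BISULFITE_LIKE and maf > min_maf:
                sites.add((chrom, pos))
    return sites


# Hypothetical example records, for illustration only.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\trs1\tC\tT\t.\t.\tAF=0.20",
    "chr1\t200\trs2\tC\tT\t.\t.\tAF=0.001",
    "chr1\t300\trs3\tA\tG\t.\t.\tAF=0.30",
    "chr2\t50\trs4\tG\tA\t.\t.\tAF=0.05",
]

sites = common_ct_snps(EXAMPLE_VCF)
```

In a real pipeline a dedicated VCF parser would likely be more robust than splitting lines by hand; the point here is only that the site list is a simple filter over REF, ALT and allele frequency.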

@FelixKrueger
Contributor

I agree, it might be a nice pipeline to have. The tools mentioned above were - of course (in good old bioinformatics manner) - shown to be much superior to previously published tools. We don't personally use SNP exclusion on a regular basis, so I am not sure which one is best/easiest to implement.

On a slightly different note, would anyone object if we dropped Bowtie (1) from Bismark, and added HISAT2 instead?

@ewels
Member Author

ewels commented Jan 31, 2019

Sure - go for it! Alignment speed can be one of the main annoyances with Bismark so a faster tool with comparable output would be great 👍 (though does this mean that I have to update the --relaxMismatches code? 😱 )

@brucemoran

Hi, was this ever implemented or is there a fork that some work was done on?

@sateeshperi
Contributor

@ewels this will tie into Biscuit, right?


5 participants