DSL2: genotyping #1016
Merged
121 commits
67220ce add pileupcaller params to config and schema (TCLamnidis)
eb61d10 genotyping parameter requirement checks (TCLamnidis)
1bf3529 Add required modules (TCLamnidis)
05f1b13 Install modules (TCLamnidis)
1b82eaa wip genotyping (TCLamnidis)
0e8b3c7 Install GATK UG modules (TCLamnidis)
d35ff76 Merge remote-tracking branch 'origin/dev' into dsl2-genotyping (TCLamnidis)
c0cd49c started adding GATK_UG (TCLamnidis)
9e6b407 Update gatk3 modules (TCLamnidis)
ecc4257 Add genotyping SWF (TCLamnidis)
5ca5043 work on gatk ug (TCLamnidis)
89a7aab Add gatk UG (TCLamnidis)
1ef7d1a no intervals in ug call (TCLamnidis)
29a3e89 add version (TCLamnidis)
2d7b622 emit UG output (TCLamnidis)
0b4598e tweak gatk UG outputs (TCLamnidis)
aab831f rename emissions (TCLamnidis)
6e8d819 delete leftover debug print from map.nf (TCLamnidis)
8b4b79d Add params for gatk and gatkUG (TCLamnidis)
8575592 WIP adding params to GATK UG (TCLamnidis)
60e7eec reorder params (TCLamnidis)
2f9496b add dbSNP placeholder to be able to test. parameters passes to gatk now (TCLamnidis)
020eba2 convert bcftools_stats to skip (TCLamnidis)
2fa9508 finish schema (TCLamnidis)
dd6ddc6 Merge branch 'dev' into dsl2-genotyping (TCLamnidis)
a299a26 Merge branch 'dsl2-genotyping' of github.com:nf-core/eager into dsl2-… (TCLamnidis)
b0da6bd merge conflict (TCLamnidis)
16d567a Update bcftools_stats (TCLamnidis)
97f1471 add bcftools stats to UG (TCLamnidis)
64d0df5 add todo comment (TCLamnidis)
c5bbb3b Add config for bcftools stats (TCLamnidis)
2cf5eac record manual tests (TCLamnidis)
947ee60 remove unnecessary bash block (TCLamnidis)
32fdd2c Merge branch 'dev' into dsl2-genotyping (TCLamnidis)
fd16f48 attempt to add dbsnp to reference sheet (TCLamnidis)
a056508 pass dbsnp to genotyping (TCLamnidis)
f312446 Include ploidy into ref_meta (TCLamnidis)
05675d5 gatk UG done with dbsnp (TCLamnidis)
dcc3e47 fix indentation (TCLamnidis)
a91659e update haplotypecaller module (TCLamnidis)
c88aaa3 port UG channels to HC (TCLamnidis)
4e88678 Add gatk HC params. Update some gatk UG param text (TCLamnidis)
a8dc2d4 add gatk HC params (TCLamnidis)
bf9cd87 add gatk HC. Add patterns to genotyping module publishDir (TCLamnidis)
edb5e58 add HC. fix indent. move bcftools (TCLamnidis)
49d531d HC manual tests. update UG tests (TCLamnidis)
f86bdad update TODOs (TCLamnidis)
d174435 update freebayes module (TCLamnidis)
297ea51 Add Freebayes (TCLamnidis)
8f1aa9d manual tests (TCLamnidis)
68386e4 add pileupcaller aux files (TCLamnidis)
a963652 remove old dbsnp input. fix genotyping swf cardinality (TCLamnidis)
b070eea add pileupcaller bed and snp files (TCLamnidis)
69dae31 Add pileupcaller. simplify input channels. (TCLamnidis)
6371e9f add pileupcaller and samtools mpileup (TCLamnidis)
0f1d71b no mpileup output. add pattern to pileupcaller (TCLamnidis)
c7042c0 deal with optional files. (TCLamnidis)
d4dc6f9 clearer formatting of Genotyping call (TCLamnidis)
599375c add warning todo for inconsistent options (TCLamnidis)
4e0bb0d manual tests for genotyping. add multiref per block (TCLamnidis)
89478b9 add small todos (TCLamnidis)
a6e8274 Merge remote-tracking branch 'origin/dev' into dsl2-genotyping (TCLamnidis)
cc94772 Merge remote-tracking branch 'origin/dev' into dsl2-genotyping (TCLamnidis)
e732f27 remove empty defaults (TCLamnidis)
be20f06 Update all modules (TCLamnidis)
5d4eaf6 fix linting warnings (TCLamnidis)
2e38e63 add collect_genotypes (TCLamnidis)
0c9f4ac add genotype collection (TCLamnidis)
7d0fbc4 update manual tests (TCLamnidis)
1279891 linting (TCLamnidis)
687a2d0 oopsie bugfix (TCLamnidis)
5f6d466 add test for each genotyper. (TCLamnidis)
4a0366f Add errors when pileupcaller is used without bed or snp file (TCLamnidis)
056e0f6 small tweaks (TCLamnidis)
80d28cb small changes (TCLamnidis)
2cab201 reposition a line (TCLamnidis)
6e6ea1c fix error condition (TCLamnidis)
bd063fa fix error conditional (TCLamnidis)
73b7d0a remove library ids from genotyping configs (libs merged) (TCLamnidis)
0b19335 fix file name collision in GATK RTC (TCLamnidis)
7a3ea2a remove debug statements. add python version to version yml (TCLamnidis)
a3b16a2 add coverage stats. add mqc files to mqc channel (TCLamnidis)
97bdc0b update manual tests (TCLamnidis)
d273610 Merge remote-tracking branch 'origin/dev' into dsl2-genotyping (TCLamnidis)
6d4ac28 Apply suggestions from code review to modules.conf (TCLamnidis)
156e132 remove commented lines, update comments (TCLamnidis)
0c2d9ee Update parameter name for keeping realigned bam (TCLamnidis)
d74de23 rename parameter (TCLamnidis)
5046dfe Apply suggestions from code review to schema wording (TCLamnidis)
ea27287 standardise mpileup helptext wording (TCLamnidis)
756c170 Update genotyping_pileupcaller_method helptext (TCLamnidis)
51400cd Apply suggestions from code review to schema (TCLamnidis)
b21e0f7 Remove TODO about parameter validation (TCLamnidis)
784103a Merge branch 'dsl2-genotyping' of github.com:nf-core/eager into dsl2-… (TCLamnidis)
9020037 Apply suggestions from code review to genotype swf (TCLamnidis)
61df8b9 remove todo about issue #1054 (TCLamnidis)
3f95223 merge both ploidy parameters into one genotyping_reference_ploidy param (TCLamnidis)
bc1c924 Install BCFTOOLS_INDEX (TCLamnidis)
b9ef51f update gatk_HC module (TCLamnidis)
24525b1 index VCF files (TCLamnidis)
8edd468 add warning about angsd (TCLamnidis)
f34045e rename meta attribute id -> sample_id for consistency (TCLamnidis)
c461478 simplify output channels (TCLamnidis)
322ffa1 Update GATK_UG (TCLamnidis)
83808ed remove dumps (TCLamnidis)
9b4a484 update manual_tests.md (TCLamnidis)
9b448f3 add genotyper to meta of genotypes (TCLamnidis)
1036da2 Merge branch 'dev' into dsl2-genotyping (TCLamnidis)
8df94c4 remove todos (TCLamnidis)
d20d2ce Add output information on genotypers (TCLamnidis)
0e359a4 Clarify pileupcaller (TCLamnidis)
2fa3b73 add citations (TCLamnidis)
780ebaf add bcftools citation (TCLamnidis)
7f9d6b7 Merge branch 'dev' into dsl2-genotyping (TCLamnidis)
309b8b8 update modules.json (remove dumpSV) (TCLamnidis)
b78c3f1 validate parameter combinations (TCLamnidis)
e8b27df linting (TCLamnidis)
0b93d81 remove lib dependency (TCLamnidis)
4763601 typo (TCLamnidis)
ccd118c minor edits and linting (TCLamnidis)
d39a10a Merge branch 'dev' into dsl2-genotyping (TCLamnidis)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| #!/usr/bin/env python | ||
| | ||
| # MIT License (c) Thiseas C. Lamnidis (@TCLamnidis) | ||
| | ||
| import argparse | ||
| import filecmp | ||
| | ||
| def file_len(fname): | ||
|     ## Count the lines in a file. Returns 0 for an empty file. | ||
|     with open(fname) as f: | ||
|         count = sum(1 for _ in f) | ||
|     return count | ||
| | ||
| ## A function to return the number of genotypes per line in a .geno file. | ||
| def file_width(fname): | ||
|     with open(fname) as f: | ||
|         for i in f: | ||
|             return len(i.strip()) | ||
|     return 0  ## Empty file: no genotypes. | ||
| | ||
| ## A function to check that there are no duplicate individual IDs across ind files. | ||
| def check_for_duplicate_ids(indf1, indf2): | ||
|     with open(indf1) as f: | ||
|         inds1 = [x.strip().split()[0] for x in f.readlines()] | ||
|     with open(indf2) as f: | ||
|         inds2 = [x.strip().split()[0] for x in f.readlines()] | ||
|     intersection = set(inds1).intersection(inds2) | ||
|     if len(intersection) > 0: | ||
|         raise IOError("Input .ind files contain duplicate individual IDs. Duplicates: {}".format(intersection)) | ||
| | ||
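| ## Function to check that the snp files are identical | ||
| def check_snp_files(snpf1, snpf2): | ||
|     ## shallow=False forces a content comparison instead of trusting matching os.stat() signatures. | ||
|     if not filecmp.cmp(snpf1, snpf2, shallow=False): | ||
|         raise IOError("Input .snp files are not identical.") | ||
| | ||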
| ## Function to check the consistency of an eigenstrat database | ||
| def validate_eigenstrat(genof, snpf, indf): | ||
|     dimsGeno = [file_len(genof), file_width(genof)] | ||
|     linesSnp = file_len(snpf) | ||
|     linesInd = file_len(indf) | ||
| | ||
|     # print(dimsGeno,linesSnp,linesInd) | ||
|     ## Check geno and snp compatibility | ||
|     if dimsGeno[0] != linesSnp: | ||
|         raise IOError("Input .snp and .geno files do not match.") | ||
| | ||
|     ## Check geno and ind compatibility | ||
|     if dimsGeno[1] != linesInd: | ||
|         raise IOError("Input .ind and .geno files do not match.") | ||
| | ||
| VERSION = "1.0.0" | ||
| | ||
| parser = argparse.ArgumentParser(usage="%(prog)s -g1 <GENO FILE 1 NAME> -s1 <SNP FILE 1 NAME> -i1 <IND FILE 1 NAME> -g2 <GENO FILE 2 NAME> -s2 <SNP FILE 2 NAME> -i2 <IND FILE 2 NAME> -o <OUTPUT FILES PREFIX>", description="A tool to check two different EigenStrat databases for shared individuals and, if none are found, merge them into a single EigenStrat database.") | ||
| parser._optionals.title = "Available options" | ||
| parser.add_argument("-g1", "--genoFn1", type = str, metavar = "<GENO FILE 1 NAME>", required = True, help = "The path to the input geno file of the first dataset.") | ||
| parser.add_argument("-s1", "--snpFn1", type = str, metavar = "<SNP FILE 1 NAME>", required = True, help = "The path to the input snp file of the first dataset.") | ||
| parser.add_argument("-i1", "--indFn1", type = str, metavar = "<IND FILE 1 NAME>", required = True, help = "The path to the input ind file of the first dataset.") | ||
| parser.add_argument("-g2", "--genoFn2", type = str, metavar = "<GENO FILE 2 NAME>", required = True, help = "The path to the input geno file of the second dataset.") | ||
| parser.add_argument("-s2", "--snpFn2", type = str, metavar = "<SNP FILE 2 NAME>", required = True, help = "The path to the input snp file of the second dataset.") | ||
| parser.add_argument("-i2", "--indFn2", type = str, metavar = "<IND FILE 2 NAME>", required = True, help = "The path to the input ind file of the second dataset.") | ||
| parser.add_argument("-o", "--output", type = str, metavar = "<OUTPUT FILES PREFIX>", required = True, help = "The desired output file prefix. Three output files are created: <OUTPUT FILES PREFIX>.geno, <OUTPUT FILES PREFIX>.snp and <OUTPUT FILES PREFIX>.ind.") | ||
| parser.add_argument("-v", "--version", action='version', version="{}".format(VERSION), help="Print the version and exit.") | ||
| args = parser.parse_args() | ||
| | ||
| ## Open input files | ||
| GenoFile1 = open(args.genoFn1, "r") | ||
| SnpFile1 = open(args.snpFn1, "r") | ||
| IndFile1 = open(args.indFn1, "r") | ||
| | ||
| GenoFile2 = open(args.genoFn2, "r") | ||
| # SnpFile2 = open(args.snpFn2, "r") ## Never actually read in line by line | ||
| IndFile2 = open(args.indFn2, "r") | ||
| | ||
| ## Open output files | ||
| GenoFileOut = open(args.output + ".geno", "w") | ||
| SnpFileOut = open(args.output + ".snp", "w") | ||
| IndFileOut = open(args.output + ".ind", "w") | ||
| | ||
| ## Perform basic validation on inputs | ||
| validate_eigenstrat(args.genoFn1, args.snpFn1, args.indFn1) | ||
| validate_eigenstrat(args.genoFn2, args.snpFn2, args.indFn2) | ||
| check_for_duplicate_ids(args.indFn1, args.indFn2) | ||
| check_snp_files(args.snpFn1, args.snpFn2) | ||
| | ||
| ## Now actually merge the data | ||
| ## Geno | ||
| for line1, line2 in zip(GenoFile1, GenoFile2): | ||
|     geno_line="{}{}".format(line1.strip(),line2.strip()) | ||
|     print(geno_line, file=GenoFileOut) | ||
| | ||
| ## Snp | ||
| ## Copying the file would be faster, but this way we do not rely on the os or external packages. | ||
| ## We already checked that the snp files are byte-identical, so we can just copy one of them. | ||
| for line in SnpFile1: | ||
|     print(line.strip(), file=SnpFileOut) | ||
| | ||
| ## Ind | ||
| ## The indfiles are simply concatenated in the same order as the geno file. | ||
| for line in IndFile1: | ||
|     print(line.strip(), file=IndFileOut) | ||
| for line in IndFile2: | ||
|     print(line.strip(), file=IndFileOut) | ||
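
The merge step in the script above can be illustrated on in-memory toy data. This is only a minimal sketch, not code from the PR: the helper names `merge_geno_lines` and `check_dimensions` are hypothetical, and `check_dimensions` checks the width of every row, which is stricter than the script, which measures only the first line of the .geno file.

```python
def merge_geno_lines(lines1, lines2):
    """Concatenate per-SNP genotype strings from two datasets, row by row."""
    # Hypothetical helper; mirrors the zip-and-concatenate loop in the script.
    if len(lines1) != len(lines2):
        raise ValueError("Datasets have different numbers of SNP rows.")
    return [a.strip() + b.strip() for a, b in zip(lines1, lines2)]


def check_dimensions(geno_lines, n_snps, n_inds):
    """Check for one row per SNP and one genotype character per individual."""
    # Hypothetical helper; stricter than the script, which checks only row one.
    if len(geno_lines) != n_snps:
        raise ValueError(".snp and .geno do not match.")
    if any(len(row.strip()) != n_inds for row in geno_lines):
        raise ValueError(".ind and .geno do not match.")


# Toy data: 3 SNPs; dataset 1 has 2 individuals, dataset 2 has 1.
geno1 = ["02\n", "19\n", "20\n"]
geno2 = ["1\n", "0\n", "9\n"]

merged = merge_geno_lines(geno1, geno2)
check_dimensions(merged, n_snps=3, n_inds=3)
print(merged)  # ['021', '190', '209']
```

Each merged row keeps the individuals of the first dataset on the left and those of the second on the right, which is why the .ind files have to be concatenated in that same order.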