Kids First Data Resource Center Alignment and Haplotype Calling Workflow (bam-to-cram-to-gVCF)
This pipeline follows the Broad best practices outlined in "Data pre-processing for variant discovery". It takes a BAM as input and aligns (or re-aligns) its reads to a bwa-indexed hg38 reference FASTA. The resulting BAM is deduplicated and base quality score recalibrated (BQSR). Contamination is then estimated, and a gVCF is created with GATK4 HaplotypeCaller. The outputs can later be used for joint trio genotyping and for subsequent refinement and de novo variant analysis.
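The post-alignment steps described above (deduplication, BQSR, gVCF calling) can be sketched as GATK4 command lines. This is a minimal illustration, not the workflow's actual CWL: the helper `gatk_commands`, the output file names, and the selection of flags are assumptions, and alignment, contamination estimation and CRAM conversion are left out. The contamination fraction is assumed to come from that separate estimation step.

```python
from typing import List


def gatk_commands(
    bam: str,
    ref_fasta: str,
    known_sites: List[str],
    dbsnp_vcf: str,
    calling_intervals: str,
    sample: str,
    contamination: float = 0.0,
) -> List[List[str]]:
    """Hypothetical helper: build argv lists for the post-alignment steps.

    Output names are illustrative only; the real workflow may name files differently.
    """
    dedup_bam = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    bqsr_bam = f"{sample}.bqsr.bam"
    gvcf = f"{sample}.g.vcf.gz"

    # 1. Mark duplicate reads (Picard MarkDuplicates, bundled with GATK4).
    mark_dups = ["gatk", "MarkDuplicates", "-I", bam, "-O", dedup_bam,
                 "-M", f"{sample}.dup_metrics.txt"]

    # 2. Model base-quality errors, masking known variant sites.
    base_recal = ["gatk", "BaseRecalibrator", "-R", ref_fasta,
                  "-I", dedup_bam, "-O", recal_table]
    for vcf in known_sites:
        base_recal += ["--known-sites", vcf]

    # 3. Rewrite base qualities using the recalibration table.
    apply_bqsr = ["gatk", "ApplyBQSR", "-R", ref_fasta, "-I", dedup_bam,
                  "--bqsr-recal-file", recal_table, "-O", bqsr_bam]

    # 4. Call in GVCF mode, asking HaplotypeCaller to remove the estimated
    #    contamination fraction of reads.
    haplotype_caller = ["gatk", "HaplotypeCaller", "-R", ref_fasta,
                        "-I", bqsr_bam, "-O", gvcf, "-ERC", "GVCF",
                        "-L", calling_intervals, "-D", dbsnp_vcf,
                        "--contamination-fraction-to-filter", str(contamination)]

    return [mark_dups, base_recal, apply_bqsr, haplotype_caller]
```

The sketch only builds argument lists; executing them (for example with `subprocess.run(argv, check=True)`) would require GATK4 on the PATH. Keeping them as lists avoids shell-quoting problems with paths.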
- pipeline flowchart:
- tool images: https://hub.docker.com/r/kfdrc/
- dockerfiles: https://github.com/d3b-center/bixtools
- tested with
Reference file locations:
- Broad: https://console.cloud.google.com/storage/browser/broad-references/hg38/v0/
- kfdrc bucket: s3://kids-first-seq-data/broad-references/
- cavatica: https://cavatica.sbgenomics.com/u/yuankun/kf-reference/
Inputs (example file names):
input_bam: input.bam
reference_fasta: Homo_sapiens_assembly38.fasta # For bwa to work, also copy every bwa index file for this reference alongside it (suffixes .alt, .amb, .ann, .bwt, .pac, .sa). CWL calls these "secondary files".
reference_fai: Homo_sapiens_assembly38.fai
reference_dict: Homo_sapiens_assembly38.dict
knownsites:
- 1000G_omni2.5.hg38.vcf.gz
- 1000G_phase1.snps.high_confidence.hg38.vcf.gz
- Homo_sapiens_assembly38.known_indels.vcf.gz
- Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
dbsnp_vcf: Homo_sapiens_assembly38.dbsnp138.vcf
dbsnp_vcf_index: Homo_sapiens_assembly38.dbsnp138.vcf.idx
wgs_calling_interval_list: wgs_calling_regions.hg38.interval_list
wgs_coverage_interval_list: wgs_coverage_regions.hg38.interval_list
wgs_evaluation_interval_list: wgs_evaluation_regions.hg38.interval_list
contamination_sites_bed: Homo_sapiens_assembly38.contam.bed
contamination_sites_mu: Homo_sapiens_assembly38.contam.mu
contamination_sites_ud: Homo_sapiens_assembly38.contam.UD
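The bwa index requirement noted on reference_fasta can be checked before launching a run. The sketch below assumes the index was built with bwa's default prefix (the FASTA path itself), so each suffix is appended to the full file name; this matches a CWL secondaryFiles pattern written without a leading `^`. The function names are hypothetical.

```python
from pathlib import Path
from typing import List

# Suffixes taken from the reference_fasta note; bwa appends them to the index prefix.
BWA_INDEX_SUFFIXES = (".alt", ".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_bwa_index_files(reference_fasta: str) -> List[str]:
    """Index paths bwa would look for, assuming the default prefix (the FASTA path)."""
    return [reference_fasta + suffix for suffix in BWA_INDEX_SUFFIXES]


def missing_bwa_index_files(reference_fasta: str) -> List[str]:
    """Expected index files that are not present on local disk."""
    return [p for p in expected_bwa_index_files(reference_fasta) if not Path(p).exists()]
```

For example, `missing_bwa_index_files("Homo_sapiens_assembly38.fasta")` returns an empty list only when all six files sit next to the FASTA in the working directory.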
- sequence_grouping_tsv: generated by bin/CreateSequenceGroupingTSV.py
- example-inputs.json
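The sequence_grouping_tsv input can be illustrated with a sketch of contig grouping. It assumes bin/CreateSequenceGroupingTSV.py follows the Broad approach of packing consecutive contigs from the reference .dict into groups whose combined length does not exceed the longest contig, presumably so that scattered jobs receive similar amounts of sequence. The function names are hypothetical, and the actual script may format its output differently.

```python
from typing import List, Tuple


def parse_dict(dict_text: str) -> List[Tuple[str, int]]:
    """Extract (contig name, length) pairs from the @SQ lines of a .dict file."""
    contigs = []
    for line in dict_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields look like "SN:chr1"; split on the first colon only (UR: values contain colons).
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


def group_contigs(contigs: List[Tuple[str, int]]) -> List[List[str]]:
    """Greedily pack consecutive contigs into groups no longer than the longest contig."""
    longest = max(length for _, length in contigs)
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, length in contigs:
        if current and current_size + length > longest:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += length
    if current:
        groups.append(current)
    return groups


def to_tsv(groups: List[List[str]]) -> str:
    """One line per group, contig names separated by tabs."""
    return "\n".join("\t".join(group) for group in groups)
```

With contigs of length 100, 60, 30 and 10, the first contig fills a group on its own and the remaining three (60 + 30 + 10 = 100) share the second line.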
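Note: every VCF input may need an accompanying index file (for example .tbi for a bgzipped .vcf.gz, or .idx for a plain .vcf, as with dbsnp_vcf_index), supplied as a CWL "secondary file".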
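The index convention in the note above can be expressed as a small lookup. This is a sketch assuming the two common GATK-compatible conventions (a tabix .tbi next to a bgzipped VCF, a Tribble .idx next to a plain-text VCF); the function names are hypothetical.

```python
from typing import Dict, List


def expected_vcf_index(vcf_path: str) -> str:
    """Guess the index file expected next to a VCF (assumed .tbi/.idx conventions)."""
    if vcf_path.endswith(".vcf.gz"):
        return vcf_path + ".tbi"  # tabix index for bgzip-compressed VCF
    if vcf_path.endswith(".vcf"):
        return vcf_path + ".idx"  # Tribble index for plain-text VCF
    raise ValueError(f"not a VCF path: {vcf_path}")


def index_map(vcfs: List[str]) -> Dict[str, str]:
    """Map each VCF input to the index file it is assumed to need."""
    return {vcf: expected_vcf_index(vcf) for vcf in vcfs}
```

Applied to the dbsnp_vcf input, this yields Homo_sapiens_assembly38.dbsnp138.vcf.idx, the same name given for dbsnp_vcf_index.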