ClaireXinSun/kf-alignment-workflow

 
 

KFDRC Whole Genome Alignment Workflow

Kids First Data Resource Center Alignment and Haplotype Calling Workflow (bam-to-cram-to-gVCF). This pipeline follows the Broad best practices outlined in Data pre-processing for variant discovery. It takes a BAM as input and aligns or re-aligns it to a bwa-indexed hg38 reference FASTA. The resulting BAM is de-duplicated and its base quality scores are recalibrated. Contamination is then calculated, and a gVCF is created using GATK4 HaplotypeCaller. The outputs of this workflow can later serve as inputs to joint trio genotyping, subsequent refinement, and de novo variant analysis.

Basic Info

References:

Inputs:

  input_bam: input.bam
  # For proper bwa functionality, you also need to copy over all bwa index files
  # related to this reference, with suffixes .alt, .amb, .ann, .bwt, .pac, .sa.
  # These are known as "secondary files" in CWL.
  reference_fasta: Homo_sapiens_assembly38.fasta
  reference_fai: Homo_sapiens_assembly38.fai
  reference_dict: Homo_sapiens_assembly38.dict
  knownsites:
  - 1000G_omni2.5.hg38.vcf.gz
  - 1000G_phase1.snps.high_confidence.hg38.vcf.gz
  - Homo_sapiens_assembly38.known_indels.vcf.gz
  - Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
  dbsnp_vcf: Homo_sapiens_assembly38.dbsnp138.vcf
  dbsnp_vcf_index: Homo_sapiens_assembly38.dbsnp138.vcf.idx
  wgs_calling_interval_list: wgs_calling_regions.hg38.interval_list
  wgs_coverage_interval_list: wgs_coverage_regions.hg38.interval_list
  wgs_evaluation_interval_list: wgs_evaluation_regions.hg38.interval_list
  contamination_sites_bed: Homo_sapiens_assembly38.contam.bed
  contamination_sites_mu: Homo_sapiens_assembly38.contam.mu
  contamination_sites_ud: Homo_sapiens_assembly38.contam.UD

Note: every VCF file may also need its index file alongside it. CWL treats these indexes as "secondary files" as well.
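
Before launching a run, it can help to confirm that the secondary files sit next to their primary inputs. The following is a minimal sketch, not part of the workflow itself. The function names are hypothetical, and it rests on two assumptions: each bwa suffix listed above is appended to the full FASTA filename, and a `.vcf.gz` file is indexed with `.tbi` while a plain `.vcf` is indexed with `.idx` (as with the dbSNP input above). Adjust the naming if your reference bundle differs.

```python
from pathlib import Path

# bwa index suffixes named in the inputs section of this README.
BWA_INDEX_SUFFIXES = [".alt", ".amb", ".ann", ".bwt", ".pac", ".sa"]


def expected_secondary_files(reference_fasta, vcfs):
    """List the companion files expected next to the reference and the VCFs.

    Assumption: suffixes are appended to the full file name,
    e.g. ref.fasta -> ref.fasta.amb and sites.vcf.gz -> sites.vcf.gz.tbi.
    """
    expected = [Path(str(reference_fasta) + s) for s in BWA_INDEX_SUFFIXES]
    for vcf in vcfs:
        # Assumption: compressed VCFs use a .tbi index, plain VCFs use .idx.
        index_suffix = ".tbi" if str(vcf).endswith(".gz") else ".idx"
        expected.append(Path(str(vcf) + index_suffix))
    return expected


def missing_secondary_files(reference_fasta, vcfs):
    """Return the expected companion files that do not exist on disk."""
    return [p for p in expected_secondary_files(reference_fasta, vcfs)
            if not p.exists()]


if __name__ == "__main__":
    missing = missing_secondary_files(
        "Homo_sapiens_assembly38.fasta",
        [
            "1000G_omni2.5.hg38.vcf.gz",
            "Homo_sapiens_assembly38.dbsnp138.vcf",
        ],
    )
    for path in missing:
        print(f"missing secondary file: {path}")
```

Running the script from the directory that holds the inputs prints one line per missing companion file and prints nothing when everything is in place.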

WF Visualized
