A Nextflow pipeline for running genome-wide association studies (GWAS) on UK Biobank exome sequencing data using REGENIE on the Research Analysis Platform (RAP).
This pipeline performs:
- Genotype QC - Merges chromosomal data and applies quality control filters
- Association Testing - Runs the two-step REGENIE analysis on exome variants
- Post-processing - Prepares results for downstream analysis including BHR and visualization
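REGENIE's two-step design fits a whole-genome regression model in step 1, then tests individual variants in step 2 using the step 1 predictions. The sketch below assembles the two command lines in Python to show how the steps connect through the `_pred.list` file that step 1 writes. The file prefixes and block sizes are placeholders, not values taken from this pipeline, and the actual invocation in `exome_was.nf` may use different inputs and options.

```python
def regenie_step1(bed_prefix, pheno_file, covar_file, out_prefix, bsize=1000):
    """Build the step 1 command: fit the whole-genome regression model."""
    return [
        "regenie", "--step", "1",
        "--bed", bed_prefix,          # PLINK bed/bim/fam prefix (placeholder)
        "--phenoFile", pheno_file,
        "--covarFile", covar_file,
        "--bsize", str(bsize),        # block size; 1000 is only an example value
        "--out", out_prefix,
    ]


def regenie_step2(bed_prefix, pheno_file, covar_file, step1_prefix, out_prefix, bsize=400):
    """Build the step 2 command: test variants using the step 1 predictions."""
    return [
        "regenie", "--step", "2",
        "--bed", bed_prefix,          # variants to test (placeholder prefix)
        "--phenoFile", pheno_file,
        "--covarFile", covar_file,
        "--pred", f"{step1_prefix}_pred.list",  # list file written by step 1
        "--bsize", str(bsize),
        "--out", out_prefix,
    ]
```

Calling `" ".join(regenie_step1(...))` gives a command line that can be inspected before anything is run.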
Edit config.sh to set your data paths and output directories.
Run genotype QC:
nextflow run genotype_pipeline.nf
Run the association analysis:
nextflow run exomWAS_pipeline/exome_was.nf \
--pheno_desc_file your_phenotypes.txt \
--output_dir results/
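If you want to launch both stages from one script, a minimal Python wrapper could look like the sketch below. It assumes the association stage depends on the genotype QC output, as the step order in the overview suggests. The function names are hypothetical and not part of this repository.

```python
import subprocess


def pipeline_commands(pheno_desc_file, output_dir):
    """Return the two Nextflow invocations in the order they should run."""
    return [
        ["nextflow", "run", "genotype_pipeline.nf"],
        [
            "nextflow", "run", "exomWAS_pipeline/exome_was.nf",
            "--pheno_desc_file", pheno_desc_file,
            "--output_dir", output_dir,
        ],
    ]


def run_pipeline(pheno_desc_file, output_dir):
    """Run genotype QC, then the association analysis."""
    for cmd in pipeline_commands(pheno_desc_file, output_dir):
        # check=True raises CalledProcessError when a stage fails,
        # so the second stage never starts after a failed first stage.
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("your_phenotypes.txt", "results/")` runs the same two commands shown above.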
- genotype_pipeline.nf - Chromosome merging and SNP QC
- exomWAS_pipeline/ - Main REGENIE association analysis
- postanalysis/ - BHR preprocessing and result processing
- scripts/ - Utility scripts for data management
Requirements:
- Nextflow
- Access to UK Biobank RAP
- Python 3.x with pandas, numpy
- R with required packages
- REGENIE software
The pipeline generates:
- Association statistics per chromosome
- QC metrics and logs
- BHR-ready formatted files
- Manhattan plots and visualizations
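As a quick check on the per-chromosome association statistics, the sketch below collects variants above a significance threshold. It assumes the files keep REGENIE's default whitespace-delimited layout with `CHROM`, `GENPOS`, `ID`, and `LOG10P` columns; the post-processing in this pipeline may rename or reformat them. The default threshold is the conventional genome-wide 5e-8, which is only a starting point, since an exome-wide analysis may call for a different cutoff.

```python
import math

# -log10 of the conventional genome-wide threshold 5e-8 (about 7.3)
GENOME_WIDE_LOG10P = -math.log10(5e-8)


def read_regenie(handle):
    """Yield one dict per variant from a whitespace-delimited results file."""
    header = handle.readline().split()
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            yield dict(zip(header, fields))


def significant_hits(handles, min_log10p=GENOME_WIDE_LOG10P):
    """Collect (chrom, pos, id, log10p) tuples at or above the threshold."""
    hits = []
    for handle in handles:
        for row in read_regenie(handle):
            if row["LOG10P"] == "NA":  # test not computed for this variant
                continue
            log10p = float(row["LOG10P"])
            if log10p >= min_log10p:
                hits.append((row["CHROM"], int(row["GENPOS"]), row["ID"], log10p))
    # Strongest associations first
    hits.sort(key=lambda hit: hit[3], reverse=True)
    return hits
```

Pass one open file handle per chromosome; the result is a single list sorted by significance across all chromosomes.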
For issues or questions about this pipeline, please refer to the individual README files in each subdirectory for detailed usage instructions.