methylhead · Panel‑WGMS Analysis Pipeline

License: MIT

methylhead is a modular Nextflow workflow that turns raw targeted methyl-seq FASTQ files into QC‑checked methylation matrices, cell‑composition estimates and model‑based risk scores—ready for statistics or reporting.


🌟 Why methylhead? — Feature highlights

| Feature | Description |
| --- | --- |
| End‑to‑end panel‑WGBS | From raw FASTQ to sample‑level risk scores with a single command |
| Cell‑composition inference | Blood‑cell deconvolution using bundled reference libraries |
| Model‑based predictions | Runs arbitrary EWAS/age/risk models defined in a CSV |
| Reproducible & portable | Fully containerised (Apptainer); no system installation |
| Modular Nextflow core | Parallel execution, -resume, profile support |
| Rich QC out‑of‑the‑box | Per‑sample & per‑locus thresholds, MultiQC and Quarto HTML/PDF reports |


0 · Prerequisites

| Requirement | Tested version | Check with |
| --- | --- | --- |
| Apptainer | ≥ 1.1.0 | apptainer --version |
| Conda | ≥ 23.x | conda -V |
| Internet | outbound HTTPS | |
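
The table above can be turned into a quick preflight check. The following is only a sketch: the helper names `version_ge` and `check_tool` are made up for illustration and are not part of methylhead, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/usr/bin/env bash
# Preflight sketch: report whether the prerequisite tools are on PATH.

# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME: print "NAME: ok" if the command exists, "NAME: missing" otherwise.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: missing"
  fi
}

check_tool apptainer
check_tool conda
```

To compare against the tested minimum, something like `version_ge "$(apptainer --version | awk '{print $NF}')" 1.1.0` should work, assuming the version number is the last word of the `apptainer --version` output.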

1 · Clone the repository

# Pick any folder you like
git clone git@github.com:MRCIEU/methylhead.git
cd methylhead

2 · Quick start (≈ 5-10 min)

# Install & activate Nextflow if you haven’t yet
conda create -y -n methylhead nextflow -c bioconda
conda activate methylhead

# Run the built‑in demo (downloads containers on first run)
nextflow -C nextflow-test.config run main.nf 

3 · (One‑off) Build the reference genome (≈ 2 h)

bash scripts/create-reference.sh -N you@example.com

Creates reference/hg19/ with all bwameth indices. Skip this step if you already have an indexed hg19 reference.


4 · Run on your own samples

nextflow run main.nf \
  --data            'path/to/fastqs/*.fastq.gz' \
  --genome_folder   path/to/hg19.fa \
  --cell_reference  path/to/cell-reference.csv \
  --panel           path/to/panel.csv \
  --phenotype       path/to/phenotype.csv \
  --models          path/to/models.csv \
  --outdir          results/ \
  -N you@example.com \
  -resume

  • Quote the --data glob so that the pattern reaches Nextflow intact instead of being expanded by the shell.
  • Leave out -N if you do not want an email summary.
  • -resume (single dash, a Nextflow option) lets Nextflow continue from where a previous run left off, skipping any steps that already finished successfully. More: Nextflow docs › resume
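
Before launching, it can help to confirm that the FASTQ glob matches anything at all. The sketch below is an illustration only: `count_fastqs` is a hypothetical helper, not part of the pipeline, and it assumes paths without spaces.

```shell
# count_fastqs PATTERN: print how many existing files match a quoted glob.
# Assumes paths without spaces, since the pattern is expanded unquoted.
count_fastqs() {
  set -- $1            # intentionally unquoted: let the shell expand the glob
  if [ -e "$1" ]; then
    echo "$#"
  else
    echo 0             # an unmatched glob stays literal, so no such file exists
  fi
}

# Example: warn before launching when the pattern matches nothing.
pattern='path/to/fastqs/*.fastq.gz'
if [ "$(count_fastqs "$pattern")" -eq 0 ]; then
  echo "No FASTQ files match $pattern" >&2
fi
```

The pattern is kept in single quotes until the helper expands it, which mirrors how a glob is usually passed to a Nextflow parameter.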

Mandatory parameters

| Flag | Description | Example |
| --- | --- | --- |
| --data | Glob of gz‑compressed FASTQ files | mydata/*.fastq.gz |
| --genome_folder | Indexed hg19 FASTA (.fa + .bwt/.amb/...) | reference/hg19.fa |
| --cell_reference | Cell-type-specific reference for cell-count estimation | data/blood-cell-type-reference.csv.gz |
| --panel | CSV with per‑locus QC thresholds | panel.csv |
| --phenotype | Sample‑level metadata | pheno.csv |
| --models | EWAS / risk‑prediction model definitions | models.csv |

See input/readme.md for file formats & examples.
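
As an alternative to a long command line, Nextflow can read pipeline parameters from a YAML file through its standard `-params-file` option. The sketch below writes such a file. The file name `params.yml` is arbitrary, the paths are the same placeholders used above, and it assumes the pipeline reads these flags as ordinary `params` values, which the `--flag` usage suggests but this README does not state.

```shell
# Write the mandatory parameters (plus --outdir) to a YAML file.
# Keys mirror the flag names without the leading dashes.
cat > params.yml <<'EOF'
data: "path/to/fastqs/*.fastq.gz"
genome_folder: "path/to/hg19.fa"
cell_reference: "path/to/cell-reference.csv"
panel: "path/to/panel.csv"
phenotype: "path/to/phenotype.csv"
models: "path/to/models.csv"
outdir: "results/"
EOF

# Then launch with:
#   nextflow run main.nf -params-file params.yml -resume
```

Keeping the parameters in a file also makes a run easier to repeat and to record alongside its results.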

Optional flags:

| Flag | Purpose | Default |
| --- | --- | --- |
| --outdir | Where results go | results/ |
| -N | Email run summary | off |
| --wgbs_image etc. | Override container URIs | built‑ins |

5 · Outputs at a glance

results/
├── alignments/          # deduplicated BAM + stats
├── methylation_calls/   # BedGraphs per sample
├── matrices/            # CpG, coverage & 450k matrices
├── qc/                  # MultiQC + HTML/PDF report
└── predictions/         # Risk scores & association tests
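
A finished run can be sanity-checked against the layout above. This is a minimal sketch: `missing_outputs` is a hypothetical helper, and it only checks the five top-level directory names listed here, not their contents.

```shell
# missing_outputs DIR: print each expected result subdirectory that is absent from DIR.
missing_outputs() {
  for sub in alignments methylation_calls matrices qc predictions; do
    [ -d "$1/$sub" ] || echo "$sub"
  done
}

# Example: report anything missing from the default output directory.
missing=$(missing_outputs results)
if [ -n "$missing" ]; then
  echo "Missing output directories:" >&2
  echo "$missing" >&2
fi
```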

6 · Workflow overview

The flowchart/ directory contains a single file:

| File | Description |
| --- | --- |
| workflow.png | Auto-generated Nextflow DAG |

The workflow.png file visualizes the task-level dependencies in the pipeline, as produced by Nextflow's DAG rendering (for example via the -with-dag option).

See /flowchart/readme.md for a step-by-step walkthrough of the workflow and its file formats.


7 · Containers in use

| Flag | Default URI | Includes |
| --- | --- | --- |
| wgbs_image | oras://docker.io/onuroztornaci/methylhead-pipeline:wgbs_analysis | WGBS aligners & QC |
| meth_image | oras://docker.io/onuroztornaci/methylhead-pipeline:meth_analysis | R 4.4.3, Python 3, Bioconductor |
| qc_image | oras://docker.io/onuroztornaci/methylhead-pipeline:qc_container | R 4.4.1, Quarto |

Build your own images → see /container-def-files.
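
The default images can also be fetched ahead of time, for instance before working on a node with restricted network access. The sketch below uses the standard `apptainer pull` command. The helpers `image_uri` and `prefetch_images` are hypothetical, and whether the `--*_image` flags accept a local .sif path is an assumption to verify against the pipeline configuration.

```shell
# image_uri TAG: print the default registry URI for one of the image tags listed above.
image_uri() {
  echo "oras://docker.io/onuroztornaci/methylhead-pipeline:$1"
}

# prefetch_images DIR: pull all three default images into DIR.
# Needs Apptainer and outbound HTTPS; defined here but not called.
prefetch_images() {
  mkdir -p "$1"
  for tag in wgbs_analysis meth_analysis qc_container; do
    apptainer pull "$1/$tag.sif" "$(image_uri "$tag")"
  done
}
```

Calling `prefetch_images images` would leave one .sif file per tag in images/.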


8 · Bundled panel and target files

  • data/blood-cell-type-reference.csv.gz — Cell-type-specific reference for cell-count estimation
  • input/panel.csv — Targeted CpG coordinates

Override with --cell_reference and --panel if you have a different panel.


9 · Utilities


10 · Troubleshooting cheatsheet

| Symptom | Likely cause & fix |
| --- | --- |
| ERROR: Apptainer not found | Install Apptainer ≥ 1.1 and add it to $PATH. |
| Java <11 warning | The conda environment is not active; run conda activate methylhead. |
| No FASTQ files | Check your --data glob; it must end in .fastq.gz. |
| Index not found for hg19.fa | Run step 3 (reference build) or point --genome_folder to an indexed reference. |
| Path not mounted: data/reference outside $HOME | Move the data and reference folders inside $HOME, or start Apptainer with -B /abs/path:/abs/path to bind-mount them. |
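
For the bind-mount case, the `-B` options can be generated rather than typed by hand. This is a sketch under assumptions: `bind_args` is a hypothetical helper, and it treats anything under $HOME as already visible in the container, which matches Apptainer's default configuration but may differ on a given site.

```shell
# bind_args PATH...: print one "-B path:path" option per absolute path outside $HOME.
bind_args() {
  for path in "$@"; do
    case "$path" in
      "$HOME"/*) ;;                               # under $HOME: no bind needed by default
      /*) printf '%s %s:%s\n' -B "$path" "$path" ;;
    esac
  done
}

# Example: only the first path needs a bind option.
bind_args /data/fastqs "$HOME/reference"
```

In a Nextflow run, the resulting options would typically go into the `apptainer.runOptions` setting of a config file rather than on the command line.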

Happy methylating 🧬🚀
