Sequeduct Methyl

Sequeduct Methyl is an extension to Sequeduct, provided as a stand-alone Nextflow analysis pipeline that validates cytosine methylations (5mC, 5hmC, or 4mC) or adenine methylations (6mA) in plasmids and DNA constructs.

A detailed demonstration is available at demo.

Usage

Setup

Sequeduct Methyl was developed on Ubuntu 22.04 LTS and tested on a workstation with an x86_64 CPU and an NVIDIA RTX A4500 GPU.

Install the following software:

Nextflow to run the pipeline
Dorado for basecalling
SAMtools (version ≥ 1.16) for indexing
Modkit for creating a summary table of methylations

Make sure these tools are available on your PATH. To do so, add the directory containing each executable to the PATH variable. Taking Dorado as an example:

export PATH="$PATH:/path/to/dorado-1.0.0-linux-x64/bin"
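
To confirm that the tools are reachable after updating PATH, each one can be looked up with `command -v`. The following is a minimal sketch: the function name `check_tools` is illustrative, and the tool names assume the default executable names (nextflow, dorado, samtools, modkit).

```shell
# Report which of the given tools can be found on PATH.
# Prints one line per tool and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names for the four required tools.
check_tools nextflow dorado samtools modkit \
    || echo "Add the missing tools to PATH before running the pipeline."
```

This only checks that the executables are found. It does not check versions, so the SAMtools version still needs to be confirmed separately.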

Please be aware that the basecaller, Dorado, requires specific hardware (GPU) to run. This is detailed in the 'Platforms' section on their website.

Subsequently, download a selected Dorado basecalling model. The available models are listed on their website, under section 'DNA models'. For example:

dorado download --model [email protected]

The model is saved as a directory containing several files, in your current working directory.

Additionally, install the required Python packages. We recommend using a separate Python environment (e.g. Anaconda) for this work. Please find the required Python packages in the requirements.txt file. The specified package versions, using Python 3.12, were confirmed to work together.

Pull the Sequeduct Methyl Nextflow pipeline:

nextflow pull edinburgh-genome-foundry/Sequeduct_Methyl -r v0.1.5

Run

Analyse methylation readouts

Create a working directory for your analysis. Copy (or link) the raw read POD5 directory (pod5_pass) from the Oxford Nanopore sequencing run to the working directory. This directory should contain one POD5 subdirectory per sample (e.g. per barcode). Specify the path to the POD5 directory with the --pod5_dir parameter. Also provide the path to the directory containing the reference GenBank-format files using --genbank_dir, the sample sheet using --sample_sheet and the parameter sheet using --param_sheet. The full path to the Dorado model (e.g. [email protected]) should be specified with --model_path. The project name can be set using --projectname.
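
The directory preparation described above might look like the following sketch. The function name `prepare_workdir` and the paths in the example call are hypothetical placeholders, and the sketch links the POD5 directory rather than copying it.

```shell
# Create an analysis directory and link the raw POD5 directory into it.
# Arguments: <analysis directory to create> <pod5_pass directory of the run>
prepare_workdir() {
    workdir=$1
    pod5_source=$2

    if [ ! -d "$pod5_source" ]; then
        echo "POD5 directory not found: $pod5_source" >&2
        return 1
    fi
    mkdir -p "$workdir" || return 1
    # Resolve to an absolute path so the symlink stays valid from any location.
    pod5_abs=$(cd "$pod5_source" && pwd) || return 1
    ln -s "$pod5_abs" "$workdir/pod5_pass"
}

# Example call (placeholder paths):
# prepare_workdir "$HOME/methyl_analysis" /data/my_run/pod5_pass
```

Linking avoids duplicating large raw data. Copying with `cp -r` is the alternative if the original location may be moved or deleted.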

Example command:

nextflow run edinburgh-genome-foundry/Sequeduct_Methyl -r v0.1.5 -entry analysis \
    --pod5_dir='path/to/pod5_pass' \
    --genbank_dir='path/to/genbank_ref/dir' \
    --sample_sheet='path/to/sample_sheet.csv' \
    --param_sheet='path/to/parameter_sheet.csv' \
    --model_path='/full/path/to/dorado/model/directory' \
    --projectname='Methylation Project'

This command creates a new directory named output in the current working directory, containing the results. One final PDF report is created, summarising the methylation analysis of all samples run in the pipeline. Additionally, Nextflow automatically creates a work directory for the workflow. Ensure that you do not already have a directory named work in this location.
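
A small pre-flight check can catch a leftover work directory before launching. This is a sketch only: the function name `check_clean_launch_dir` is illustrative, and also checking for an existing output directory is an extra precaution assumed here, not a requirement stated by the pipeline.

```shell
# Warn if the launch directory already contains "work" or "output".
# Argument: directory to check (defaults to the current directory).
check_clean_launch_dir() {
    dir=${1:-.}
    found=0
    for name in work output; do
        if [ -e "$dir/$name" ]; then
            echo "warning: $dir/$name already exists" >&2
            found=1
        fi
    done
    return "$found"
}

check_clean_launch_dir . || echo "Move or remove the directories listed above, or launch from another location."
```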

Examples of both the sample sheet and the parameter sheet are available at demo/sheets. The parameter sheet sets the thresholds for % methylation: the percentage of modified reads required for a position to be deemed methylated, and the percentage for it to be deemed unmethylated. Any position whose percentage of modified reads falls between these two cutoffs is considered undetermined.

The parameter sheet also specifies the methylases whose patterns are used to identify methylated positions. The methylation pattern associated with each methylase is identified automatically. Multiple methylases can be specified, separated by a space.

The available methylases are:

(i) for cytosine methylation: AluI, BamHI, CpG, EcoKDcm, GpC, HaeIII, HhaI, HpaII and MspI, or 'MetC' to investigate all C positions;
(ii) for adenine methylation: EcoBI, EcoKDam, EcoKI, EcoRI and TaqI, or EcoGII to investigate all A positions.

For more detailed information, please consult EpiJinn.
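
To illustrate how the two percentage cutoffs partition positions into three classes, here is a minimal sketch. It is not the pipeline's own code: the function name `classify_position`, the cutoff values of 20 and 80, and the inclusive treatment of the boundaries are all assumptions made for illustration.

```shell
# Classify a position from its percentage of modified reads.
# Arguments: <percent modified> <unmethylated cutoff> <methylated cutoff>
# Assumption: values exactly on a cutoff count as unmethylated or methylated.
classify_position() {
    awk -v pct="$1" -v low="$2" -v high="$3" 'BEGIN {
        if (pct >= high)      print "methylated"
        else if (pct <= low)  print "unmethylated"
        else                  print "undetermined"
    }'
}

# Hypothetical cutoffs: 20 % (unmethylated) and 80 % (methylated).
for pct in 5 50 92.5; do
    printf '%s%% modified reads: %s\n' "$pct" "$(classify_position "$pct" 20 80)"
done
```

With these example cutoffs, a position with 50 % modified reads falls between the two values and is reported as undetermined.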

Details

The methylation modifications to check are selected with the --model parameter, which accepts 5mC_5hmC, 4mC_5mC, or 6mA. The default is 5mC_5hmC. Optional methylation level threshold parameters can also be specified: --mod_5mC_threshold for the 5mC threshold, --mod_5hmC_threshold for the 5hmC threshold, --mod_4mC_threshold for the 4mC threshold and --mod_6mA_threshold for the 6mA threshold. If not specified, these methylation confidence thresholds default to the optimised values set in the nextflow.config file.

In addition to the final PDF report with the detailed analysis output, the HTML version of the report, the aligned BAM file and the bedMethyl files are saved automatically in the output directory. To skip saving any of these extra files, set the corresponding parameter (--html_file, --aligned_bam or --bedMethyl, respectively) to 'false' when running the analysis command. To also save the FASTA reference file or the sorted and indexed BAM files, set the corresponding parameter (--fasta_ref or --indexed_bam, respectively) to 'true'.
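
As an illustration of combining these options, the sketch below builds an analysis command for the 6mA model with a custom threshold, no HTML report and the FASTA reference saved. The wrapper function `run_6mA_analysis`, the `DRY_RUN` switch and the threshold value 0.9 are hypothetical. Check nextflow.config for the scale and defaults the thresholds actually use.

```shell
# With DRY_RUN set to a non-empty value the command is printed, not executed.
# Set DRY_RUN= (empty) to launch Nextflow for real.
DRY_RUN=1

# Argument: threshold passed to --mod_6mA_threshold.
run_6mA_analysis() {
    ${DRY_RUN:+echo} nextflow run edinburgh-genome-foundry/Sequeduct_Methyl \
        -r v0.1.5 -entry analysis \
        --pod5_dir='path/to/pod5_pass' \
        --genbank_dir='path/to/genbank_ref/dir' \
        --sample_sheet='path/to/sample_sheet.csv' \
        --param_sheet='path/to/parameter_sheet.csv' \
        --model_path='/full/path/to/dorado/model/directory' \
        --model='6mA' \
        --mod_6mA_threshold="$1" \
        --html_file='false' \
        --fasta_ref='true'
}

run_6mA_analysis 0.9   # 0.9 is a placeholder value, not a recommendation
```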

We recommend pulling the newest version of Sequeduct Methyl before analysis, and downloading the latest versions of Dorado, Modkit and EpiJinn.

Convert FAST5 files to POD5

An additional pipeline is provided to convert the old FAST5 file format to the new POD5 format.

First, install Docker and clone the repository:

git clone https://github.com/Edinburgh-Genome-Foundry/Sequeduct_Methyl.git

Then, build the Docker container:

docker build -f Sequeduct_Methyl/containers/Dockerfile --tag converter_docker .

Alternatively, those with access to EGF's container repository, such as EGF staff, can pull the Docker image with:

docker pull ghcr.io/edinburgh-genome-foundry/sequeduct_methyl:v0.1.5

Run the command below to convert FAST5 to POD5. Specify the path to the sample sheet using --sample_sheet and the full path to the directory that contains the FAST5 subdirectories using --fast5_dir:

nextflow run edinburgh-genome-foundry/Sequeduct_Methyl -r v0.1.5 -entry converter \
    --sample_sheet='path/to/sample_sheet.csv' \
    --fast5_dir='/full/path/to/fast5_pass' \
    -with-docker converter_docker

A pod5_pass directory will be created in the directory given to --fast5_dir. It contains the POD5 output files, each in a subdirectory named after its sample. Use this pod5_pass directory as the input for --pod5_dir when running the analysis as described above.
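
Before starting the analysis, it can help to confirm that every sample subdirectory received POD5 files. The following is a sketch under the assumption that the output files carry the .pod5 extension. The function name `count_pod5_per_sample` and the example path are illustrative.

```shell
# Print "<sample directory><TAB><number of .pod5 files>" for each
# subdirectory of the given pod5_pass directory.
count_pod5_per_sample() {
    pod5_dir=$1
    for sample_dir in "$pod5_dir"/*/; do
        [ -d "$sample_dir" ] || continue
        n=$(find "$sample_dir" -type f -name '*.pod5' | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$(basename "$sample_dir")" "$n"
    done
}

# Example call (placeholder path):
# count_pod5_per_sample /full/path/to/fast5_pass/pod5_pass
```

A sample listed with a count of 0 suggests the conversion did not produce output for that sample.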

Demonstration

Additional documentation, explanation of parameters and demonstration with example data is available at the demo repo.

License = GPLv3+
