Sequeduct Methyl is an extension to Sequeduct as a stand-alone Nextflow analysis pipeline to validate cytosine methylations (5mC, 5hmC, or 4mC) or adenine methylations (6mA) in plasmids and DNA constructs.
A detailed demonstration is available at demo.
Sequeduct Methyl was developed on Ubuntu 22.04 LTS and tested on a workstation with x86_64 CPU and NVIDIA RTX A4500 GPU.
Install the following software:
- Nextflow to run the pipeline
- Dorado for basecalling
- SAMtools (any version ≥ 1.16) for indexing
- Modkit for creating a summary table of methylations
Make sure these tools are available on your PATH. This can be done by running the command below for each tool to add its directory to the PATH variable, taking Dorado as an example:
export PATH="$PATH:/path/to/dorado-1.0.0-linux-x64/bin"
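As a quick sanity check after editing PATH, the sketch below reports which of the required tools cannot be found. It assumes the executables use their usual command names (nextflow, dorado, samtools, modkit); the helper name find_missing_tools is hypothetical and is not part of Sequeduct Methyl.

```python
import shutil

# Assumed executable names for the tools listed above.
REQUIRED_TOOLS = ["nextflow", "dorado", "samtools", "modkit"]


def find_missing_tools(tools, which=shutil.which):
    """Return the subset of `tools` that cannot be located on PATH.

    `which` is injectable so the check can be exercised without the
    real tools installed.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Run this in the same shell session in which PATH was exported, since the change does not carry over to new sessions unless it is added to your shell profile.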
Please be aware that the basecaller, Dorado, requires specific hardware (GPU) to run. This is detailed in the 'Platforms' section on their website.
Subsequently, download a selected Dorado basecalling model. The available models are listed on their website, under section 'DNA models'. For example:
dorado download --model [email protected]
The model is saved as a directory containing several files, in your current working directory.
Additionally, install the required Python packages. We recommend using a separate Python environment (e.g. Anaconda) for this work. Please find the required Python packages in the requirements.txt file. The specified package versions, using Python 3.12, were confirmed to work together.
Pull the Sequeduct Methyl Nextflow pipeline:
nextflow pull edinburgh-genome-foundry/Sequeduct_Methyl -r v0.1.5
Create a working directory for your analysis. Copy (or link) the raw read POD5 directory (pod5_pass) from Oxford Nanopore sequencing runs to the working directory.
This directory should contain POD5 subdirectories for each sample (e.g. barcode).
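Before launching the pipeline, it can help to confirm that the POD5 directory has the expected layout. The sketch below counts the POD5 files in each sample subdirectory. It assumes the files carry the .pod5 extension; the function name summarise_pod5_dir and the barcode01-style directory names are illustrative only.

```python
from pathlib import Path


def summarise_pod5_dir(pod5_dir):
    """Map each sample subdirectory name to the number of .pod5 files in it.

    Only immediate subdirectories are inspected; loose files directly
    inside `pod5_dir` are ignored.
    """
    root = Path(pod5_dir)
    return {
        sub.name: len(list(sub.glob("*.pod5")))
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


# Example usage (the path is a placeholder):
# for sample, n_files in summarise_pod5_dir("path/to/pod5_pass").items():
#     print(f"{sample}: {n_files} POD5 file(s)")
```

A sample that reports zero files, or a sample missing from the output altogether, is worth checking against the sample sheet before starting a run.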
Specify the path to the POD5 directory with the --pod5_dir parameter.
Also include the paths to the directory containing the reference GenBank-format files using --genbank_dir, the sample sheet using --sample_sheet and the parameter sheet using --param_sheet.
The full path to the Dorado model (e.g. [email protected]) should also be specified with --model_path.
The project name can be set using --projectname.
Example command:
nextflow run edinburgh-genome-foundry/Sequeduct_Methyl -r v0.1.5 -entry analysis \
--pod5_dir='path/to/pod5_pass' \
--genbank_dir='path/to/genbank_ref/dir' \
--sample_sheet='path/to/sample_sheet.csv' \
--param_sheet='path/to/parameter_sheet.csv' \
--model_path='/full/path/to/dorado/model/directory' \
--projectname='Methylation Project'
This command will create a new directory named output in the current working directory to hold the results. One final PDF report will be created, summarising the methylation analysis of all samples run in the pipeline. Additionally, Nextflow automatically creates a work directory for the workflow. Ensure that you do not already have a directory named work in this location.
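When the analysis is launched from a script, the example command can be assembled as an argument list, which avoids shell-quoting problems with values that contain spaces (such as the project name). This is a minimal sketch that uses only the options shown in the example command; the function name build_analysis_command is hypothetical.

```python
def build_analysis_command(pod5_dir, genbank_dir, sample_sheet, param_sheet,
                           model_path, projectname, revision="v0.1.5"):
    """Return the analysis command as a list of arguments.

    Each pipeline parameter is passed as a single --name=value argument,
    so no extra quoting is needed when the list is handed to subprocess.
    """
    return [
        "nextflow", "run", "edinburgh-genome-foundry/Sequeduct_Methyl",
        "-r", revision,
        "-entry", "analysis",
        f"--pod5_dir={pod5_dir}",
        f"--genbank_dir={genbank_dir}",
        f"--sample_sheet={sample_sheet}",
        f"--param_sheet={param_sheet}",
        f"--model_path={model_path}",
        f"--projectname={projectname}",
    ]


# Example usage (paths are placeholders):
# import subprocess
# cmd = build_analysis_command(
#     "path/to/pod5_pass", "path/to/genbank_ref/dir",
#     "path/to/sample_sheet.csv", "path/to/parameter_sheet.csv",
#     "/full/path/to/dorado/model/directory", "Methylation Project",
# )
# subprocess.run(cmd, check=True)
```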
Examples of both the sample sheet and parameter sheet are available at demo/sheets.
The parameter sheet sets the % methylation thresholds: the two cutoffs, expressed as the % of reads modified at a position, that decide whether the position is deemed methylated or unmethylated. Any position whose % of modified reads falls between the two cutoffs is considered undetermined.
The parameter sheet also lists the methylases whose patterns are used to identify methylated positions. The methylation pattern associated with each methylase is identified automatically. Multiple methylases can be specified, separated by a space. The available methylases are:
- (i) for cytosine methylation: AluI, BamHI, CpG, EcoKDcm, GpC, HaeIII, HhaI, HpaII, MspI, or 'MetC' to investigate all C positions
- (ii) for adenine methylation: EcoBI, EcoKDam, EcoKI, EcoRI, TaqI, or EcoGII to investigate all A positions
For more detailed information, please consult EpiJinn.
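The two-cutoff rule can be illustrated with a short sketch. This is not the pipeline's own code: the function name classify_position, the example cutoffs of 70 and 30, and the treatment of values exactly equal to a cutoff (counted here as methylated or unmethylated respectively) are all assumptions made for illustration.

```python
def classify_position(percent_modified, methylated_cutoff, unmethylated_cutoff):
    """Classify one position from the % of reads that are modified.

    Assumption: a value equal to a cutoff falls on the methylated or
    unmethylated side; the pipeline's boundary handling may differ.
    """
    if unmethylated_cutoff > methylated_cutoff:
        raise ValueError("unmethylated cutoff must not exceed methylated cutoff")
    if percent_modified >= methylated_cutoff:
        return "methylated"
    if percent_modified <= unmethylated_cutoff:
        return "unmethylated"
    return "undetermined"


# Example with hypothetical cutoffs of 70 % (methylated) and 30 % (unmethylated):
# classify_position(85, 70, 30)  -> "methylated"
# classify_position(12, 70, 30)  -> "unmethylated"
# classify_position(50, 70, 30)  -> "undetermined"
```

The wider the gap between the two cutoffs, the more positions end up as undetermined, so the cutoffs trade sensitivity against confidence in each call.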
The methylation modifications to check are selected with the --model parameter when running the pipeline, from the models 5mC_5hmC, 4mC_5mC, or 6mA. The default model is 5mC_5hmC. Optional methylation confidence thresholds can also be specified, using --mod_5mC_threshold for the 5mC threshold, --mod_5hmC_threshold for the 5hmC threshold, --mod_4mC_threshold for the 4mC threshold and --mod_6mA_threshold for the 6mA threshold. If not specified, these thresholds default to the optimised values set in the nextflow.config file.
Alongside the final PDF report, the HTML version of the report, the aligned BAM file and the bedMethyl files are also saved automatically in the output directory. To skip any of these extra files, set the corresponding parameter (--html_file, --aligned_bam or --bedMethyl respectively) to 'false' when running the pipeline. If the FASTA reference file or the sorted and indexed BAM files are also wanted, set the corresponding parameter (--fasta_ref or --indexed_bam respectively) to 'true' when running the pipeline.
It is advised to pull the newest version of Sequeduct Methyl before analysis, and to download the latest versions of the Dorado, Modkit, and EpiJinn software.
An additional pipeline is provided to convert the old FAST5 file format to the new POD5 format.
First, install Docker and clone the repository:
git clone https://github.com/Edinburgh-Genome-Foundry/Sequeduct_Methyl.git
Then, build the Docker container:
docker build -f Sequeduct_Methyl/containers/Dockerfile --tag converter_docker .
Alternatively, those with access to EGF's container repository, such as EGF staff, can pull the Docker image using the following:
docker pull ghcr.io/edinburgh-genome-foundry/sequeduct_methyl:v0.1.5
Run the command below to convert FAST5 to POD5. Specify the path to the sample sheet using --sample_sheet and the full path to the directory that contains FAST5 subdirectories using --fast5_dir:
nextflow run edinburgh-genome-foundry/Sequeduct_Methyl -r v0.1.5 -entry converter \
--sample_sheet='path/to/sample_sheet.csv' \
--fast5_dir='/full/path/to/fast5_pass' \
-with-docker converter_docker
A pod5_pass directory will be created in the directory used for --fast5_dir. It contains the POD5 output files, each in a subdirectory named after its sample.
This pod5_pass directory should be used as input for --pod5_dir when running the analysis as stated above.
Additional documentation, an explanation of the parameters and a demonstration with example data are available at the demo repo.
Copyright 2024 Edinburgh Genome Foundry, University of Edinburgh. Sequeduct Methyl was designed by Jennifer Claire Muscat and Peter Vegh. It's implemented in Nextflow by Jennifer Claire Muscat.