README GENERAL

[in development]

2. Enterovirus Mutation Analysis Pipeline

This pipeline analyzes mutations in enterovirus sequences, offering two modes of analysis: consensus sequence and viral population.

Prerequisites

Ensure that the following programs/packages are installed on your system before running the pipeline:

nextflow v24.04.3
mafft v7.520
seqtk v1.4-r122
libgcc-ng >= v12
trimmomatic v0.39
minimap2 v2.26
lofreq v2.1.5
bcftools >= v1.17
samtools >= v1.17
minMutFinder >= v1
python3 with the following modules:
- sys, re, os, csv, gzip, shutil (standard library)
- SeqIO from Bio (Biopython)
- pandas
- matplotlib
- seaborn
- plotly
- numpy

Ensure that the paths in nextflow.config point to the correct folders.
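
Before launching the pipeline, it can help to confirm that the tools and Python modules listed above are reachable. The following is a minimal sketch, not part of the pipeline itself. It assumes each tool is on PATH under the same name used in the list (this may not hold for every installation, for example when trimmomatic is only available as a jar file), and it skips minMutFinder because its executable name is not stated here. It does not check versions.

```python
import importlib.util
import shutil

# Executable names are assumptions: each tool is assumed to be on PATH
# under the same name used in the prerequisites list.
TOOLS = ["nextflow", "mafft", "seqtk", "trimmomatic", "minimap2",
         "lofreq", "bcftools", "samtools"]

# Import names of the third-party Python modules ("Bio" is Biopython).
MODULES = ["Bio", "pandas", "matplotlib", "seaborn", "plotly", "numpy"]


def missing_tools(names=TOOLS):
    """Return the executables that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names=MODULES):
    """Return the top-level Python modules that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("Missing modules:", missing_modules() or "none")
```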

Input

Input file: samples-mutations.csv
- This file is expected to be generated by the enterovirus-genotyping.nf script.
- It should contain the following columns (without headers):
  - Sample or User directory
  - Protein name (VP1)
  - Path to consensus FASTA file
  - Path to reference FASTA file
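
The four-column, header-less layout described above can be checked before a run. This is a minimal sketch under stated assumptions: the file is assumed to be comma-delimited, and the function name check_samples_csv and the example paths are hypothetical. It checks only the shape of each row, not whether the FASTA paths exist.

```python
import csv
import io

# Sample/User directory, protein name, consensus FASTA path, reference FASTA path.
EXPECTED_COLUMNS = 4


def check_samples_csv(handle):
    """Return a list of problems found in a header-less samples CSV."""
    problems = []
    for line_no, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != EXPECTED_COLUMNS:
            problems.append(
                f"line {line_no}: expected {EXPECTED_COLUMNS} columns, found {len(row)}"
            )
            continue
        if any(not field.strip() for field in row):
            problems.append(f"line {line_no}: empty field")
    return problems


if __name__ == "__main__":
    # Hypothetical row, for illustration only.
    example = io.StringIO("sampleA,VP1,sampleA/consensus.fasta,refs/reference.fasta\n")
    print(check_samples_csv(example) or "no problems found")
```

For a real file, pass an open handle instead: `with open("samples-mutations.csv", newline="") as fh: check_samples_csv(fh)`.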

Execution

The pipeline supports two types of mutation analysis:

Consensus Sequence Analysis:
- This mode compares the consensus sequence of the sample to a reference sequence.
- Command: nextflow run mutations-nf/main.nf --file <samples-mutations.csv>

Viral Population Analysis:
- This mode analyzes mutations within the viral population, considering mutation frequency and depth.
- Command: nextflow run mutations-nf/main.nf --file <samples-mutations.csv> --viral_population "yes"
- This mode requires the FASTQ files that were used to create the consensus sequence.
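
When the pipeline is launched from a script, the two modes differ only in the --viral_population flag. The sketch below assembles the same commands shown above; build_command and run_pipeline are hypothetical helper names, and the script is assumed to run from the directory that contains mutations-nf.

```python
import subprocess


def build_command(samples_csv, viral_population=False):
    """Assemble the nextflow command for either analysis mode."""
    cmd = ["nextflow", "run", "mutations-nf/main.nf", "--file", samples_csv]
    if viral_population:
        # Switches from consensus sequence to viral population analysis.
        cmd += ["--viral_population", "yes"]
    return cmd


def run_pipeline(samples_csv, viral_population=False):
    """Run the pipeline and raise CalledProcessError if nextflow fails."""
    return subprocess.run(build_command(samples_csv, viral_population), check=True)


if __name__ == "__main__":
    # Print the commands rather than executing them.
    print(" ".join(build_command("samples-mutations.csv")))
    print(" ".join(build_command("samples-mutations.csv", viral_population=True)))
```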

Output

Outputs are stored in the results directory within each sample/user's directory.

Consensus Sequence Analysis Output (mutations_VP1.csv):
- Columns:
  - Sample/User ID
  - Protein
  - Mutation type
  - Amino acid change
  - Amino acid property change
  - Nucleotide mutation

Viral Population Analysis Output (mutations_VP1.csv):
- Columns:
  - Sample/User ID
  - Protein
  - Mutation type
  - Amino acid change
  - Amino acid property change
  - Nucleotide mutation
  - Mutation frequency
  - Mutation depth

Annotated Mutations Output (Annotated_mutations.csv):
- If mutations are annotated, this separate file is generated in the results directory.
- It contains all the columns of the mutations_VP1.csv file plus two additional columns:
  - Annotated
  - Annotated Mutation
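
The frequency and depth columns of the viral population output lend themselves to simple filtering. The sketch below is illustrative only. It assumes the file is comma-delimited, that the columns appear in the order listed above, and that frequency and depth are stored as plain numbers; none of this is confirmed here. The function name filter_mutations and any thresholds passed to it are hypothetical.

```python
import csv

# 0-based positions, assuming the column order documented above.
FREQ_COL = 6   # Mutation frequency
DEPTH_COL = 7  # Mutation depth


def filter_mutations(handle, min_freq, min_depth):
    """Keep rows whose frequency and depth both reach the given thresholds."""
    kept = []
    for row in csv.reader(handle):
        if len(row) <= DEPTH_COL:
            continue  # blank line or consensus-mode row without these columns
        try:
            freq = float(row[FREQ_COL])
            depth = float(row[DEPTH_COL])
        except ValueError:
            continue  # header row or non-numeric value
        if freq >= min_freq and depth >= min_depth:
            kept.append(row)
    return kept
```

A call might look like `with open("results/mutations_VP1.csv", newline="") as fh: rows = filter_mutations(fh, min_freq=0.05, min_depth=100)`, where the path and both thresholds are placeholders to adapt.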

Example Commands

Consensus Sequence Analysis:

nextflow run mutations-nf/main.nf --file samples-mutations.csv

Viral Population Analysis:

nextflow run mutations-nf/main.nf --file samples-mutations.csv --viral_population "yes"

Notes

Ensure the samples-mutations.csv file is correctly formatted.
The Annotated_mutations.csv file will only be generated if mutation annotation is performed.
The --viral_population "yes" option requires that the fastq files used to create the consensus sequence are available.
