
NBDsoftware/enterovirus

 
 


General README

[under development]

2. Enterovirus Mutation Analysis Pipeline

This pipeline analyzes mutations in enterovirus sequences, offering two modes of analysis: consensus sequence and viral population.

Prerequisites

Ensure that the following programs/packages are installed on your system before running the pipeline:

  • nextflow v24.04.3
  • mafft v7.520
  • seqtk v1.4-r122
  • libgcc-ng >= v12
  • trimmomatic v0.39
  • minimap2 v2.26
  • lofreq v2.1.5
  • bcftools >= v1.17
  • samtools >= v1.17
  • minMutFinder >= v1
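  • python3 with the following modules (sys, re, os, csv, gzip, and shutil are part of the standard library; the others must be installed separately):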
    • sys
    • re
    • os
    • SeqIO from Bio (Biopython)
    • csv
    • pandas
    • gzip
    • shutil
    • matplotlib
    • seaborn
    • plotly
    • numpy

Ensure that the paths in nextflow.config point to the correct folders.
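The prerequisites above can be checked before launching the pipeline. The following is a minimal sketch, not part of the repository: it assumes each tool is installed under an executable named exactly as in the list (this is a guess for minMutFinder in particular), and it only checks presence, not versions.

```python
import importlib.util
import shutil

# Executable names are assumed to match the tool names in the list above;
# minMutFinder in particular may be installed under a different name.
TOOLS = ["nextflow", "mafft", "seqtk", "trimmomatic", "minimap2",
         "lofreq", "bcftools", "samtools", "minMutFinder"]

# Import names of the third-party Python packages in the list above.
MODULES = ["Bio", "pandas", "matplotlib", "seaborn", "plotly", "numpy"]


def missing_tools(tools=TOOLS):
    """Return the executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_modules(modules=MODULES):
    """Return the Python modules that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, missing in (("tools", missing_tools()),
                           ("modules", missing_modules())):
        print(f"Missing {label}: {', '.join(missing) or 'none'}")
```

Anything reported as missing should be installed, or its name adjusted in the lists, before running the pipeline.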

Input

  • Input file: samples-mutations.csv
    • This file is expected to be generated by the enterovirus-genotyping.nf script.
    • It should contain the following columns (without headers):
      • Sample or User directory
      • Protein name (VP1)
      • Path to consensus FASTA file
      • Path to reference FASTA file
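A quick sanity check of the input file can catch formatting problems early. The sketch below is illustrative only: it assumes the file is comma-separated, and the function name check_samples_file is hypothetical rather than something the pipeline provides.

```python
import csv
from pathlib import Path

# sample/user directory, protein name, consensus FASTA, reference FASTA
EXPECTED_COLUMNS = 4


def check_samples_file(csv_path):
    """Return a list of problems found in a header-less samples CSV."""
    problems = []
    with open(csv_path, newline="") as handle:
        for line_no, row in enumerate(csv.reader(handle), start=1):
            if not row:
                continue  # ignore blank lines
            if len(row) != EXPECTED_COLUMNS:
                problems.append(
                    f"line {line_no}: expected {EXPECTED_COLUMNS} columns, "
                    f"found {len(row)}"
                )
                continue
            _sample_dir, _protein, consensus, reference = (f.strip() for f in row)
            # Both FASTA paths must point at existing files.
            for label, path in (("consensus", consensus), ("reference", reference)):
                if not Path(path).is_file():
                    problems.append(f"line {line_no}: {label} FASTA not found: {path}")
    return problems
```

An empty list means every row has four fields and both FASTA paths exist; the check does not inspect the FASTA contents.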

Execution

The pipeline supports two types of mutation analysis:

  1. Consensus Sequence Analysis:

    • This mode compares the consensus sequence of the sample to a reference sequence.
    • Command: nextflow run mutations-nf/main.nf --file <samples-mutations.csv>
  2. Viral Population Analysis:

    • This mode analyzes mutations within the viral population, considering mutation frequency and depth.
    • Command: nextflow run mutations-nf/main.nf --file <samples-mutations.csv> --viral_population "yes"
    • This mode requires the FASTQ files that were used to create the consensus sequence.
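When the pipeline is launched from a script, the two modes differ only in the extra --viral_population flag. A minimal sketch of a wrapper, assuming nextflow is on PATH and the script runs from the repository root; build_command and run_pipeline are hypothetical helper names:

```python
import subprocess


def build_command(samples_csv, viral_population=False):
    """Assemble the nextflow command line for either analysis mode."""
    cmd = ["nextflow", "run", "mutations-nf/main.nf", "--file", samples_csv]
    if viral_population:
        # Switches from consensus analysis to viral population analysis.
        cmd += ["--viral_population", "yes"]
    return cmd


def run_pipeline(samples_csv, viral_population=False):
    """Run the pipeline and raise CalledProcessError if nextflow fails."""
    subprocess.run(build_command(samples_csv, viral_population), check=True)
```

Passing the arguments as a list avoids shell quoting issues when the CSV path contains spaces.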

Output

Outputs are stored in the results directory within each sample/user's directory.

  1. Consensus Sequence Analysis Output (mutations_VP1.csv):

    • Columns:
      • Sample/User ID
      • Protein
      • Mutation type
      • Amino acid change
      • Amino acid property change
      • Nucleotide mutation
  2. Viral Population Analysis Output (mutations_VP1.csv):

    • Columns:
      • Sample/User ID
      • Protein
      • Mutation type
      • Amino acid change
      • Amino acid property change
      • Nucleotide mutation
      • Mutation frequency
      • Mutation depth
  3. Annotated Mutations Output (Annotated_mutations.csv):

    • If mutations are annotated, this separate file is generated in the results directory.
    • It contains all the columns from mutations_VP1.csv plus two additional columns:
      • Annotated
      • Annotated Mutation
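For downstream processing, the mutations_VP1.csv columns listed above can be mapped to named fields. The sketch below rests on several assumptions not stated in this README: that the file is comma-separated, that the columns appear in the listed order, and that a header row may or may not be present (controlled by has_header). The field names are hypothetical labels chosen for illustration.

```python
import csv

# Hypothetical field names, in the column order listed above.
BASE_FIELDS = ["sample_id", "protein", "mutation_type",
               "aa_change", "aa_property_change", "nt_mutation"]
# Viral population mode appends frequency and depth.
POPULATION_FIELDS = BASE_FIELDS + ["frequency", "depth"]


def read_mutations(path, viral_population=False, has_header=True):
    """Read a mutations CSV into a list of dicts keyed by field name."""
    fields = POPULATION_FIELDS if viral_population else BASE_FIELDS
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    if has_header and rows:
        rows = rows[1:]  # drop the header row (assumed present by default)
    return [dict(zip(fields, row)) for row in rows]
```

All values are returned as strings; convert frequency and depth to numbers once their format in the real output is confirmed.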

Example Commands

  • Consensus Sequence Analysis:

    nextflow run mutations-nf/main.nf --file samples-mutations.csv
  • Viral Population Analysis:

    nextflow run mutations-nf/main.nf --file samples-mutations.csv --viral_population "yes"

Notes

  • Ensure the samples-mutations.csv file is correctly formatted.
  • The Annotated_mutations.csv file will only be generated if mutation annotation is performed.
  • The --viral_population "yes" option requires that the fastq files used to create the consensus sequence are available.
