[in development]
This pipeline analyzes mutations in enterovirus sequences, offering two modes of analysis: consensus sequence and viral population.
Ensure that the following programs/packages are installed on your system before running the pipeline:
- nextflow v24.04.3
- mafft v7.520
- seqtk v1.4-r122
- libgcc-ng >= v12
- trimmomatic v0.39
- minimap2 v2.26
- lofreq v2.1.5
- bcftools >= v1.17
- samtools >= v1.17
- minMutFinder >= v1
- python3 and the following modules:
  - sys
  - re
  - os
  - SeqIO from Bio
  - csv
  - pandas
  - gzip
  - shutil
  - matplotlib
  - seaborn
  - plotly
  - numpy
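A quick pre-flight check can catch a missing program before a run starts. The following is a minimal sketch, not part of the pipeline: it assumes each tool is exposed on PATH under the lowercase name used in the list above (an assumption; a conda or module-based install may use wrappers with other names), and it does not check versions. The names `REQUIRED_TOOLS` and `missing_tools` are hypothetical.

```python
import shutil

# Assumed executable names, taken from the dependency list; adjust as needed.
REQUIRED_TOOLS = [
    "nextflow", "mafft", "seqtk", "trimmomatic",
    "minimap2", "lofreq", "bcftools", "samtools",
]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

This only confirms that an executable with that name exists; version requirements such as bcftools >= v1.17 still need to be verified separately.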
Ensure that the paths in nextflow.config point to the correct folders.
- Input file: samples-mutations.csv
  - This file is expected to be generated by the enterovirus-genotyping.nf script.
  - It should contain the following columns (without headers):
    - Sample or User directory
    - Protein name (VP1)
    - Path to consensus FASTA file
    - Path to reference FASTA file
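The column layout described above can be checked before launching a run. The sketch below is illustrative only and assumes the file is comma-delimited (the delimiter is not stated here); `validate_samples_file` is a hypothetical helper, not something the pipeline provides.

```python
import csv
from pathlib import Path

# sample/user directory, protein name, consensus FASTA, reference FASTA
EXPECTED_COLUMNS = 4


def validate_samples_file(csv_path):
    """Return a list of problems found in the input file (empty if none)."""
    problems = []
    with open(csv_path, newline="") as handle:
        for line_number, row in enumerate(csv.reader(handle), start=1):
            if not row:
                continue  # ignore blank lines
            if len(row) != EXPECTED_COLUMNS:
                problems.append(
                    f"line {line_number}: expected {EXPECTED_COLUMNS} "
                    f"columns, found {len(row)}"
                )
                continue
            _sample_dir, _protein, consensus, reference = (f.strip() for f in row)
            # Both FASTA paths must point at existing files.
            for label, path in (("consensus", consensus), ("reference", reference)):
                if not Path(path).is_file():
                    problems.append(
                        f"line {line_number}: {label} FASTA not found: {path}"
                    )
    return problems
```

Calling `validate_samples_file("samples-mutations.csv")` returns an empty list when every row has four fields and both FASTA files exist.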
The pipeline supports two types of mutation analysis:
- Consensus Sequence Analysis:
  - This mode compares the consensus sequence of the sample to a reference sequence.
  - Command: nextflow run mutations-nf/main.nf --file <samples-mutations.csv>
- Viral Population Analysis:
  - This mode analyzes mutations within the viral population, considering mutation frequency and depth.
  - Command: nextflow run mutations-nf/main.nf --file <samples-mutations.csv> --viral_population "yes"
  - This mode requires that the input fastq files are provided.
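When the pipeline is launched from a script, the two commands above differ only by the `--viral_population "yes"` option. A minimal sketch of building either command as an argument list follows; `build_command` is a hypothetical helper, and only the flags shown in the commands above are used.

```python
import shlex


def build_command(samples_csv, viral_population=False):
    """Build the nextflow argument list for either analysis mode."""
    command = ["nextflow", "run", "mutations-nf/main.nf", "--file", samples_csv]
    if viral_population:
        # Switches from consensus sequence to viral population analysis.
        command += ["--viral_population", "yes"]
    return command


if __name__ == "__main__":
    # Print both command lines instead of running them.
    print(shlex.join(build_command("samples-mutations.csv")))
    print(shlex.join(build_command("samples-mutations.csv", viral_population=True)))
```

The returned list can be passed to `subprocess.run(command, check=True)` to start the run.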
Outputs are stored in the results directory within each sample/user's directory.
- Consensus Sequence Analysis Output (mutations_VP1.csv):
  - Columns:
    - Sample/User ID
    - Protein
    - Mutation type
    - Amino acid change
    - Amino acid property change
    - Nucleotide mutation
- Viral Population Analysis Output (mutations_VP1.csv):
  - Columns:
    - Sample/User ID
    - Protein
    - Mutation type
    - Amino acid change
    - Amino acid property change
    - Nucleotide mutation
    - Mutation frequency
    - Mutation depth
- Annotated Mutations Output (Annotated_mutations.csv):
  - If mutations are annotated, this separate file will be generated in the results directory.
  - This file contains all the columns from the mutations_VP1.csv file, plus two additional columns:
    - Annotated
    - Annotated Mutation
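In viral population mode, the frequency and depth columns make it possible to keep only well-supported mutations. The sketch below is a hedged illustration, not pipeline behavior. It assumes the columns appear in the order listed above, so frequency and depth are the seventh and eighth fields, and that the file is comma-delimited. Whether the file has a header row, and whether frequency is a fraction or a percentage, is not stated here, so rows that fail to parse are skipped and the thresholds are left to the caller. `read_rows` and `filter_mutations` are hypothetical names.

```python
import csv

# Assumed zero-based positions, following the column order listed above.
FREQUENCY_COLUMN = 6
DEPTH_COLUMN = 7


def read_rows(csv_path):
    """Read every row of a CSV file as a list of strings."""
    with open(csv_path, newline="") as handle:
        return list(csv.reader(handle))


def filter_mutations(rows, min_frequency, min_depth):
    """Keep rows whose frequency and depth both meet the given minimums."""
    kept = []
    for row in rows:
        try:
            frequency = float(row[FREQUENCY_COLUMN])
            depth = float(row[DEPTH_COLUMN])
        except (IndexError, ValueError):
            continue  # skips a header row or a malformed line
        if frequency >= min_frequency and depth >= min_depth:
            kept.append(row)
    return kept
```

A call such as `filter_mutations(read_rows(path), min_frequency, min_depth)` returns the surviving rows unchanged, so they can be written back out with `csv.writer`.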
Example usage:
- Consensus Sequence Analysis: nextflow run mutations-nf/main.nf --file samples-mutations.csv
- Viral Population Analysis: nextflow run mutations-nf/main.nf --file samples-mutations.csv --viral_population "yes"
Notes:
- Ensure the samples-mutations.csv file is correctly formatted.
- The Annotated_mutations.csv file will only be generated if mutation annotation is performed.
- The --viral_population "yes" option requires that the fastq files used to create the consensus sequence are available.
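Because Annotated_mutations.csv is only written when annotation is performed, downstream scripts should treat it as optional. A minimal sketch of locating both output files for one sample follows. It assumes the layout described earlier (a results directory inside each sample/user directory); `find_outputs` is a hypothetical helper.

```python
from pathlib import Path

OUTPUT_NAMES = ["mutations_VP1.csv", "Annotated_mutations.csv"]


def find_outputs(sample_dir):
    """Map each expected output file name to its path, or None if absent."""
    results_dir = Path(sample_dir) / "results"
    found = {}
    for name in OUTPUT_NAMES:
        candidate = results_dir / name
        found[name] = candidate if candidate.is_file() else None
    return found
```

A `None` value for Annotated_mutations.csv then simply means no annotation was produced for that sample.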