The functional annotation workflow takes a draft assembly (parameter: genome) and
predicted gene coordinates (e.g., from Maker; parameter: gff_annotation), and assigns
functional annotation based on similarity to existing protein databases
(parameter: blast_db_fasta).
Run the workflow using the singularity profile:

params.yml:
subworkflow: 'functional_annotation'
genome: '/path/to/genome/assembly.fasta'
gff_annotation: '/path/to/annotation.gff3'
blast_db_fasta: '/path/to/protein/database.fasta'
outdir: '/path/to/save/results'
Command line:
nextflow run NBISweden/pipelines-nextflow \
    -profile singularity \
    -params-file params.yml
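Before launching, it can be useful to confirm that the three input files named in params.yml exist and are not empty. The following is a minimal sketch of such a check. The function name check_inputs is hypothetical and not part of the pipeline.

```shell
# Pre-flight check (hypothetical helper, not part of the pipeline):
# returns 0 if every path given is an existing, non-empty file,
# and 1 otherwise, printing each problem path to stderr.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be chained in front of the launch command, for example: check_inputs assembly.fasta annotation.gff3 database.fasta && nextflow run ...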
- General:
  - gff_annotation: Path to GFF genome annotation.
  - genome: Path to the genome assembly.
  - outdir: Path to the results folder.
  - records_per_file: Number of fasta records per file to distribute to blast and interproscan (default: 1000).
  - codon_table: (default: 1).
  - blast_db_fasta: Path to blast protein database fasta.
  - merge_annotation_identifier: The identifier to use for labeling genes (default: NBIS).
  - use_pcds: If true, enables the pcds flag when merging annotation.
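To make the effect of records_per_file concrete, the sketch below splits a protein fasta into chunks of at most N records each. It only illustrates the chunking idea and is not how the pipeline itself performs the split. The function name split_fasta and the output naming scheme are assumptions.

```shell
# Split a fasta file into chunks of at most N records (illustrative only).
# Usage: split_fasta <input.fasta> <records_per_file> <output_prefix>
# Writes <output_prefix>_0.fasta, <output_prefix>_1.fasta, ...
split_fasta() {
    input=$1
    records=$2
    prefix=$3
    awk -v n="$records" -v prefix="$prefix" '
        /^>/ {
            # Start a new chunk file every n header lines.
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s_%d.fasta", prefix, count / n)
            }
            count++
        }
        # Lines before the first header are skipped.
        out != "" { print > out }
    ' "$input"
}
```

With the default of 1000, a fasta holding 2500 proteins would yield three chunks under this scheme, the last one containing 500 records.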
In these workflows, the Nextflow process directive ext.args is used to inject
command line tool parameters directly into the shell script. These command line
tool parameters can be changed by overriding the ext.args variable for the
respective process in a configuration file.
nextflow.config:

process {
    withName: 'INTERPROSCAN' {
        ext.args = '--iprlookup --goterms -pa -t p'
    }
}
See Functional annotation modules config for the default tool configuration.
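Nextflow automatically loads a file named nextflow.config from the launch directory. If the override is kept in a file with a different name, it can be passed at launch with the -c option. The sketch below writes such a file and shows the launch command as a comment. The file name custom.config and the reduced InterProScan argument list are illustrative assumptions.

```shell
# Write the ext.args override to a separately named config file.
# The file name and the argument list are illustrative only.
config_file="custom.config"
cat > "$config_file" <<'EOF'
process {
    withName: 'INTERPROSCAN' {
        ext.args = '--iprlookup --goterms -t p'
    }
}
EOF

# Then pass the file to Nextflow with -c at launch:
# nextflow run NBISweden/pipelines-nextflow \
#     -profile singularity \
#     -params-file params.yml \
#     -c "$config_file"
```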
- Extract protein sequences based on GFF coordinates.
- Blast protein sequences against protein database.
- Query protein sequences against interproscan databases.
- Merge functional annotations.