Use Bakta for speedy and consistent genome annotation

FischbachLab/nf-bakta

NF-BAKTA

Bakta version = 1.9.4
Bakta database version = 5.1
PyHmmer version = 0.10.15
PFamA version = 36

Python Environment

conda create -n bakta -c conda-forge python=3.11 cloudpathlib-s3 pandas notebook fsspec s3fs

Seedfile format

The helper script, create_seedfile.py, creates a properly formatted seedfile from an S3 path that contains your genome files.

cd nf-bakta
python bin/create_seedfile.py \
    -g s3://maf-users/Nathan_Johns/DBs/Segata_Genomes/Fastas/ \
    -project UHGG_Annotation \
    -prefix 20221221 \
    --extension .fasta

This helper script will also recommend a job submission command that you can use to launch your job using the seedfile that was just created.
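The general idea behind such a helper can be sketched in a few lines of Python. This is only an illustration, not the actual create_seedfile.py: the two column names (genome_id, genome_path) and the rule for deriving the genome ID from the file name are assumptions, and the real seedfile layout may differ. The sketch takes an already-listed set of S3 URIs, so it runs without AWS access.

```python
import csv
import io
from pathlib import PurePosixPath


def build_seedfile_rows(s3_uris, extension=".fasta"):
    """Return (genome_id, genome_path) rows for URIs ending in `extension`."""
    rows = []
    for uri in sorted(s3_uris):
        if not uri.endswith(extension):
            continue  # skip anything that is not a genome file
        # Assumption: the genome ID is the file name without its extension.
        name = PurePosixPath(uri).name
        genome_id = name[: -len(extension)]
        rows.append((genome_id, uri))
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with a hypothetical two-column header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["genome_id", "genome_path"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling rows_to_csv(build_seedfile_rows(uris)) yields CSV text that could be written to a .seedfile.csv file; check the output of the real helper script before relying on this layout.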

Test

Local

nextflow run main.nf \
    --seedfile test/test_20221220_0.seedfile.csv \
    --project 00_Test \
    --prefix 20241010-pfam

Remote

aws batch submit-job \
    --job-name nf-bakta-pfam-test-1 \
    --job-queue priority-maf-pipelines \
    --job-definition nextflow-production \
    --container-overrides command=FischbachLab/nf-bakta,\
"--seedfile","s3://genomics-workflow-core/Results/Bakta/00_Test/seedfiles/test_20221220_0.seedfile.csv",\
"--project","00_Test",\
"--prefix","20241010-pfam"

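The same remote submission can be scripted. The sketch below assembles the arguments for boto3's Batch submit_job call from the values used in the command above; the function names build_submit_job_kwargs and submit are hypothetical, and actually submitting requires boto3 plus valid AWS credentials, so that part is kept in a separate function.

```python
def build_submit_job_kwargs(job_name, seedfile, project, prefix):
    """Assemble keyword arguments for boto3's Batch `submit_job` call."""
    return {
        "jobName": job_name,
        "jobQueue": "priority-maf-pipelines",
        "jobDefinition": "nextflow-production",
        "containerOverrides": {
            # Mirrors the comma-separated command in the CLI example.
            "command": [
                "FischbachLab/nf-bakta",
                "--seedfile", seedfile,
                "--project", project,
                "--prefix", prefix,
            ]
        },
    }


def submit(kwargs):
    """Submit the job; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("batch").submit_job(**kwargs)
```

Building the arguments as a list avoids the shell quoting and line-continuation pitfalls of the comma-separated CLI form.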
Bakta Database

v4.0 = ???
v5.0, type=full, 2023-02-20, DOI: 10.5281/zenodo.7669534
v5.1, type=full, 2024-01-19, DOI: 10.5281/zenodo.10522951

The database for this pipeline is stored on our EFS at /mnt/efs/databases/Bakta/db/v5.0 and is supplied through the bakta_db parameter. Pass this path as a plain value; do not stage it within the pipeline. Every container already has the EFS path mounted, so the database is available to the container without staging.
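Because the database is passed as a plain path rather than a staged input, nothing checks it before the tasks start. A minimal pre-flight check might look like the sketch below. It assumes the database directory contains a version.json file with major, minor, and type keys; verify this against your own database directory. The function names are hypothetical and not part of the pipeline.

```python
import json
from pathlib import Path


def read_db_version(db_dir):
    """Return (major, minor, type) from the database's version.json."""
    version_file = Path(db_dir) / "version.json"  # assumed file name and keys
    if not version_file.is_file():
        raise FileNotFoundError(f"no version.json under {db_dir}")
    info = json.loads(version_file.read_text())
    return int(info["major"]), int(info["minor"]), info.get("type", "unknown")


def check_db(db_dir, expected_major=5):
    """Raise if the database schema major version is not the expected one."""
    major, minor, db_type = read_db_version(db_dir)
    if major != expected_major:
        raise ValueError(
            f"database schema v{major}.{minor} found, expected v{expected_major}.x"
        )
    return f"v{major}.{minor} ({db_type})"
```

Running check_db on the path given as bakta_db before launching would catch a missing mount or a schema mismatch early, for example after a Bakta upgrade.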

Download new database

This was needed when Bakta moved from db schema v4.0 to v5.0.

cd /mnt/efs/databases/Bakta/db
mkdir v5
docker container run \
    --rm \
    -u $(id -u):$(id -g) \
    -v /mnt/efs/databases/Bakta/db/v5:/db \
    458432034220.dkr.ecr.us-west-2.amazonaws.com/bakta:1.9.3 \
    bakta_db download --output /db --type full

Update existing database

# Scratch directory on EFS, mounted into the container as /bakta_tmp
mkdir -p /mnt/efs/databases/Bakta/db/db_tmp
cd /mnt/efs/databases/Bakta/db
docker container run \
    -it \
    --rm \
    -v /mnt/efs/databases/Bakta/db/v5:/db \
    -v /mnt/efs/databases/Bakta/db/db_tmp:/bakta_tmp \
    458432034220.dkr.ecr.us-west-2.amazonaws.com/bakta:1.9.3 \
    bakta_db update --db /db --tmp-dir /bakta_tmp

PFam Annotation Parsing

References:
