Augur is a Python package to track (and eventually forecast) flu evolution. It currently:
- imports public sequence data
- subsamples, cleans and aligns sequences
- builds a phylogenetic tree from this data
The program is live on Amazon EC2 with results pushed to Amazon S3. The latest JSON-formatted flu tree is available as `tree_streamline.json`. This tree is visualized at blab.github.io/auspice/.
You can run augur across platforms using Docker. An image is available on Docker Hub as `trvrb/augur`. With this public image, you can immediately run augur with:

```
docker pull trvrb/augur
docker run -ti -e "GISAID_USER=$GISAID_USER" -e "GISAID_PASS=$GISAID_PASS" -e "S3_KEY=$S3_KEY" -e "S3_SECRET=$S3_SECRET" -e "S3_BUCKET=$S3_BUCKET" --privileged trvrb/augur
```
This starts up Supervisor to keep augur and its helper programs running, using `supervisord.conf` as the control file.
To run augur, you will need a GISAID account (to pull sequences) and an Amazon S3 account (to push results). Account information is stored in environment variables:
- `GISAID_USER`: GISAID user name
- `GISAID_PASS`: GISAID password
- `S3_KEY`: Amazon S3 key
- `S3_SECRET`: Amazon S3 secret
- `S3_BUCKET`: Amazon S3 bucket
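Inside Python these settings might be read with `os.environ`; a minimal sketch (the `load_credentials` helper and its error message are illustrative, not part of augur):

```python
import os

REQUIRED_VARS = ["GISAID_USER", "GISAID_PASS", "S3_KEY", "S3_SECRET", "S3_BUCKET"]

def load_credentials(env=os.environ):
    """Collect required account settings, failing fast if any are missing."""
    missing = [v for v in REQUIRED_VARS if v not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {v: env[v] for v in REQUIRED_VARS}
```

Failing fast on startup gives a clearer error than a login failure deep inside the pipeline.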
Full dependency information can be seen in the Dockerfile. To run locally, pull the Docker image with:

```
docker pull trvrb/augur
```

and start up a bash session with:

```
docker run -ti -e "GISAID_USER=$GISAID_USER" -e "GISAID_PASS=$GISAID_PASS" trvrb/augur /bin/bash
```

From here, the build pipeline can be run with:

```
python augur/run.py
```
Selenium is used to automate downloads from GISAID, which requires login access. User credentials are read from the environment variables `GISAID_USER` and `GISAID_PASS`.
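A sketch of what such an automated login might look like. The URL and form-field names below are placeholders (the real GISAID form differs), and the function simply drives whatever Selenium-style `driver` object it is given:

```python
LOGIN_URL = "https://platform.gisaid.org/"  # placeholder URL, not the real endpoint

def gisaid_login(driver, user, password):
    """Fill and submit a login form via a Selenium-style driver.

    The field names ("login", "password") are illustrative placeholders.
    """
    driver.get(LOGIN_URL)
    driver.find_element("name", "login").send_keys(user)
    driver.find_element("name", "password").send_keys(password)
    driver.find_element("name", "password").submit()
```

Taking the driver as a parameter keeps the login flow testable without launching a browser.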
Keeps viruses with full HA1 sequences, fully specified collection dates and cell passage, and retains only one sequence per strain name. Subsamples to at most 100 sequences per month for the 3 years before present.
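The dedup-and-subsample logic can be sketched as follows; the record format and the `subsample` helper are illustrative, assuming each virus carries a strain name and a (year, month) collection date:

```python
from collections import defaultdict

def subsample(viruses, per_month=100):
    """Keep one record per strain name, then cap each (year, month) bin."""
    seen = set()
    bins = defaultdict(list)
    for v in viruses:
        if v["strain"] in seen:
            continue  # only one sequence per strain name
        seen.add(v["strain"])
        bins[(v["year"], v["month"])].append(v)
    out = []
    for key in sorted(bins):
        out.extend(bins[key][:per_month])  # at most per_month per bin
    return out
```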
Align sequences with mafft. In testing, mafft showed a much lower memory footprint than muscle.
Keep only sequences that have the full 1701 bases of HA in the alignment.
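The length check can be expressed as a simple filter over the aligned HA region. This sketch assumes the alignment has been trimmed to the HA coding region, so a full sequence is one with 1701 non-gap characters:

```python
HA_LENGTH = 1701  # full HA coding region, in bases

def has_full_ha(aligned_seq, length=HA_LENGTH):
    """True if the aligned sequence has a real base (not a gap) at every HA column."""
    return sum(1 for c in aligned_seq if c != "-") == length
```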
FastTree is used to get a starting tree; it will build a tree for ~5000 sequences in a few minutes. RAxML is then used to refine this initial tree. A full RAxML run on a tree with ~5000 sequences could take days or weeks, so RAxML is instead run for a fixed 1 hour and the best tree found during this search is kept. This always improves on the FastTree starting tree.
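One way to implement the fixed-time refinement is to run RAxML under a subprocess timeout and keep whatever result it has written so far. The flags, binary name, and file names below are illustrative, not augur's exact invocation:

```python
import subprocess

def build_raxml_command(alignment, starting_tree, run_name="refine"):
    """Assemble an illustrative RAxML command line from a starting tree."""
    return [
        "raxml",
        "-f", "d",           # hill-climbing ML search
        "-s", alignment,      # input alignment
        "-t", starting_tree,  # FastTree result as the starting point
        "-n", run_name,
    ]

def refine_tree(alignment, starting_tree, time_limit=3600):
    """Run RAxML, killing it after time_limit seconds and keeping the best tree so far."""
    try:
        subprocess.run(build_raxml_command(alignment, starting_tree),
                       timeout=time_limit, check=True)
    except subprocess.TimeoutExpired:
        pass  # expected: keep the best tree written before the cutoff
    return "RAxML_result.refine"  # illustrative output file name
```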
Reroot the tree based on outgroup strain, collapse nodes with zero-length branches and ladderize the tree.
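The collapse and ladderize steps can be sketched on a minimal nested-dict tree. This toy structure, with `children` and `length` keys, is illustrative only; augur works on a real phylogenetics library's tree objects:

```python
def collapse_zero_branches(node):
    """Merge internal children attached by zero-length branches into their parent."""
    new_children = []
    for child in node["children"]:
        collapse_zero_branches(child)
        if child["children"] and child["length"] == 0:
            new_children.extend(child["children"])  # promote grandchildren
        else:
            new_children.append(child)
    node["children"] = new_children

def tip_count(node):
    """Number of tips (leaves) under this node."""
    if not node["children"]:
        return 1
    return sum(tip_count(c) for c in node["children"])

def ladderize(node):
    """Sort children so smaller subtrees come first, recursively."""
    for child in node["children"]:
        ladderize(child)
    node["children"].sort(key=tip_count)
```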