- Download sequences and meta information from GISAID
- In EPIFLU, select host as `human`, select `HA` as required segment, and select Submission Date >= last upload date to vdb
  - Ideally download about 5000 isolates at a time; you may have to split downloads by submission date
- Download Isolates as XLS with YYYY-MM-DD date format
- Download Isolates as "Sequences (DNA) as FASTA"
- Select all DNA
- Set Fasta Header to 0: DNA Accession no., 1: Isolate name, 2: Isolate ID, 3: Segment, 4: Passage details/history, 5: Submitting lab, i.e.
  - `DNA Accession no. | Isolate name | Isolate ID | Segment | Passage details/history | Submitting lab`
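With this header configuration, each FASTA description line is a pipe-delimited record. A minimal sketch of splitting one into named fields (the example isolate and the dict keys are illustrative, not fauna's internal schema):

```python
# Split a GISAID EpiFlu FASTA description line into its pipe-delimited
# fields, matching the header layout selected above. The dict keys are
# illustrative labels, not fauna's internal field names.
FIELDS = ["accession", "strain", "isolate_id", "segment", "passage", "submitting_lab"]

def parse_gisaid_header(description):
    values = [v.strip() for v in description.lstrip(">").split("|")]
    return dict(zip(FIELDS, values))

# Hypothetical example record for illustration only.
header = ">EPI1234567 | A/Perth/16/2009 | EPI_ISL_31553 | HA | E3/E2 | WHO CC Melbourne"
record = parse_gisaid_header(header)
print(record["strain"])   # A/Perth/16/2009
print(record["segment"])  # HA
```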
- Move files to `fauna/data` as `gisaid_epiflu.xls` and `gisaid_epiflu.fasta`
- Upload to vdb database
  ```
  python3 vdb/flu_upload.py -db vdb -v flu --source gisaid --fname gisaid_epiflu
  ```
- Recommend running with `--preview` to confirm strain names and locations are correctly parsed before uploading
- Can add to the geo_synonyms, flu_strain_name_fix and flu_fix_location_label files to fix some of the formatting
  - All of these functions are quite slow given they run over ~600k documents; use sparingly
- Update genetic grouping fields

  ```
  python3 vdb/flu_update.py -db vdb -v flu --update_groupings
  ```

  - Updates `vtype`, `subtype` and `lineage`
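These three fields are related: influenza A viruses carry subtype h3n2 or h1n1pdm, while B viruses split into the vic and yam lineages. A toy table showing that relationship (the real assignment logic in `vdb/flu_update.py` is more involved, and the exact field values here are an assumption):

```python
# Toy illustration of how vtype/subtype/lineage relate for seasonal flu.
# The actual update in vdb/flu_update.py derives these from isolate
# metadata; the lowercase values below are assumed for illustration.
LINEAGE_TO_GROUPS = {
    "seasonal_h3n2":    {"vtype": "a", "subtype": "h3n2"},
    "seasonal_h1n1pdm": {"vtype": "a", "subtype": "h1n1pdm"},
    "seasonal_vic":     {"vtype": "b", "subtype": "vic"},
    "seasonal_yam":     {"vtype": "b", "subtype": "yam"},
}

def groupings(lineage):
    groups = dict(LINEAGE_TO_GROUPS[lineage])
    groups["lineage"] = lineage
    return groups

print(groupings("seasonal_vic"))
```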
- Update locations

  ```
  python3 vdb/flu_update.py -db vdb -v flu --update_locations
  ```

  - Updates `division`, `country` and `region` from `location`
- Update passage_category fields

  ```
  python3 vdb/flu_update.py -db vdb -v flu --update_passage_categories
  ```

  - Updates `passage_category` based on the `passage` field
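Passage strings from GISAID are free text (e.g. "E3/E2", "MDCK2", "Original"), so categorization amounts to pattern-matching them into broad classes. A simplified sketch of the idea; the real patterns and category names in `vdb/flu_update.py` are far more extensive, and these few regexes are illustrative only:

```python
import re

# Simplified sketch of mapping a free-text passage string to a broad
# passage category. Rules are checked in order; the first match wins.
# Illustrative only -- not fauna's actual pattern set.
RULES = [
    (re.compile(r"^(e\d|egg)", re.IGNORECASE), "egg"),
    (re.compile(r"^(mdck|siat|s\d|c\d)", re.IGNORECASE), "cell"),
    (re.compile(r"^(original|clinical|direct)", re.IGNORECASE), "unpassaged"),
]

def passage_category(passage):
    for pattern, category in RULES:
        if pattern.search(passage):
            return category
    return "undetermined"

print(passage_category("E3/E2"))     # egg
print(passage_category("MDCK2"))     # cell
print(passage_category("Original"))  # unpassaged
```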
- Download HA sequences from vdb for each seasonal lineage

  ```
  python3 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_h3n2 --fstem h3n2
  python3 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_h1n1pdm --fstem h1n1pdm
  python3 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_vic --fstem vic
  python3 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_yam --fstem yam
  ```
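The four download commands differ only in lineage, so they are easy to script. A small sketch that builds each command as an argument list (mirroring the flags above; pass a list to `subprocess.run` to execute it):

```python
# Build the vdb download command for each seasonal lineage, mirroring
# the commands listed above.
LINEAGES = ["h3n2", "h1n1pdm", "vic", "yam"]

def download_command(lineage):
    return [
        "python3", "vdb/flu_download.py", "-db", "vdb", "-v", "flu",
        "--select", "locus:HA", f"lineage:seasonal_{lineage}",
        "--fstem", lineage,
    ]

for lineage in LINEAGES:
    # Print each command; swap print for subprocess.run(...) to run them.
    print(" ".join(download_command(lineage)))
```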
- Convert NIMR report pdfs to csv files
- Move csv files to the subtype directory in `fauna/data/`
- Upload to tdb database
  ```
  python3 tdb/upload.py -db tdb -v flu --subtype h3n2 --ftype flat --fstem h3n2_nimr_titers
  ```
- Recommend running with `--preview` to confirm strain names are correctly parsed before uploading
- Can add to the HI_ref_name_abbreviations and HI_flu_strain_name_fix files to fix some strain names
- Move line-list tsv files to `fauna/data/`
- Upload to tdb database with

  ```
  python3 tdb/upload.py -db tdb -v flu --subtype h3n2 --ftype flat --fstem H3N2_HI_titers_upload
  ```
- Move line-list tsv files to `fauna/data/`
- Upload HI titers to tdb database with

  ```
  python3 tdb/cdc_upload.py -db cdc_tdb -v flu --ftype flat --fstem HITest_Oct2019_to_Sep2020_titers
  ```

- Upload FRA titers to tdb database with

  ```
  python3 tdb/cdc_upload.py -db cdc_tdb -v flu --ftype flat --fstem FRA_Oct2019_to_Sep2020_titers
  ```
- Move Excel documents to `fauna/data/`
- Run

  ```
  python3 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem H3N2HIs
  python3 tdb/crick_upload.py -db crick_tdb --assay_type fra --fstem H3N2VNs
  python3 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem H1N1pdm09HIs
  python3 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem BVicHIs
  python3 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem BYamHIs
  ```
- Make sure `NIID-Tokyo-WHO-CC/` is a sister directory to `fauna/`
- Upload all titers with

  ```
  python3 tdb/upload_all.py --sources niid -db niid_tdb
  ```
- Make sure `VIDRL-Melbourne-WHO-CC/` is a sister directory to `fauna/`
- Upload all titers with

  ```
  python3 tdb/upload_all.py --sources vidrl -db vidrl_tdb
  ```
- These are flat CSV files that should replace the raw Excel tables.
- Upload with

  ```
  python3 tdb/vidrl_upload.py -db vidrl_tdb -v flu --subtype <subtype> --path <path> --fstem <fstem> --ftype flat
  ```
- Download titers from tdb for each subtype

  ```
  python3 tdb/download.py -db tdb -v flu --subtype h3n2
  python3 tdb/download.py -db tdb -v flu --subtype h1n1pdm
  python3 tdb/download.py -db tdb -v flu --subtype vic
  python3 tdb/download.py -db tdb -v flu --subtype yam
  ```