Flu Pipeline Notes

VDB

Upload documents to VDB

  1. Download sequences and meta information from GISAID
  • In EPIFLU, set host to human, set HA as the required segment, and set Submission Date >= the date of the last upload to vdb
  • Ideally download about 5000 isolates at a time; this may require splitting downloads by submission date
  • Download Isolates as XLS with YYYY-MM-DD date format
  • Download Isolates as "Sequences (DNA) as FASTA"
    • Select all DNA
    • Fasta Header as 0: DNA Accession no., 1: Isolate name, 2: Isolate ID, 3: Segment, 4: Passage details/history, 5: Submitting lab
    • The resulting header pattern is: DNA Accession no. | Isolate name | Isolate ID | Segment | Passage details/history | Submitting lab
  2. Move files to fauna/data as gisaid_epiflu.xls and gisaid_epiflu.fasta.
  3. Upload to vdb database
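
As a quick sanity check on a downloaded FASTA file, the header layout described above can be split into named fields. This is a minimal sketch, not fauna's own parser: the field names in `FIELDS`, the `parse_header` helper, and the example header are all hypothetical and only illustrate the six-field, pipe-delimited pattern.

```python
# Hypothetical field names matching the six header positions listed above.
FIELDS = [
    "accession",       # 0: DNA Accession no.
    "isolate_name",    # 1: Isolate name
    "isolate_id",      # 2: Isolate ID
    "segment",         # 3: Segment
    "passage",         # 4: Passage details/history
    "submitting_lab",  # 5: Submitting lab
]


def parse_header(header):
    """Split one pipe-delimited FASTA header line into a dict of fields."""
    parts = [p.strip() for p in header.strip().lstrip(">").split("|")]
    if len(parts) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} fields, got {len(parts)}: {header!r}"
        )
    return dict(zip(FIELDS, parts))


# Made-up header, for illustration only.
example = ">EPI123456 | A/Example/1/2020 | EPI_ISL_000001 | HA | E3 | Example Lab"
record = parse_header(example)
print(record["isolate_name"], record["segment"])
```

A header with the wrong number of fields raises `ValueError`, which usually means the FASTA header option was set differently at download time.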

Update documents in VDB

All of these functions are slow because they run over ~600k documents. Use them sparingly.

  • Update genetic grouping fields

    • python3 vdb/flu_update.py -db vdb -v flu --update_groupings
    • updates vtype, subtype, lineage
  • Update locations

    • python3 vdb/flu_update.py -db vdb -v flu --update_locations
    • updates division, country and region from location
  • Update passage_category fields

    • python3 vdb/flu_update.py -db vdb -v flu --update_passage_categories
    • updates passage_category based on the passage field

Download documents from VDB

  • python3 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_h3n2 --fstem h3n2
  • python3 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_h1n1pdm --fstem h1n1pdm
  • python3 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_vic --fstem vic
  • python3 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_yam --fstem yam
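
The four downloads differ only in lineage and file stem, so they can be generated from a small table. The following is a sketch under that assumption; the `LINEAGES` mapping and `download_command` helper are hypothetical names, and the flags are copied from the commands above rather than from the script's own documentation.

```python
import shlex

# Lineage -> file stem, taken from the four commands listed above.
LINEAGES = {
    "seasonal_h3n2": "h3n2",
    "seasonal_h1n1pdm": "h1n1pdm",
    "seasonal_vic": "vic",
    "seasonal_yam": "yam",
}


def download_command(lineage, fstem):
    """Build the argument list for one HA download from vdb."""
    return [
        "python3", "vdb/flu_download.py",
        "-db", "vdb", "-v", "flu",
        "--select", "locus:HA", f"lineage:{lineage}",
        "--fstem", fstem,
    ]


commands = [download_command(lineage, fstem) for lineage, fstem in LINEAGES.items()]
for cmd in commands:
    # Print instead of executing, so the commands can be reviewed first.
    print(shlex.join(cmd))
```

To actually run them, each list could be passed to `subprocess.run(cmd, check=True)` from the fauna root directory.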

TDB

Upload documents to TDB

Raw tables from NIMR reports

  1. Convert NIMR report pdfs to csv files
  2. Move csv files to subtype directory in fauna/data/
  3. Upload to tdb database
  • python3 tdb/upload.py -db tdb -v flu --subtype h3n2 --ftype flat --fstem h3n2_nimr_titers
  • Recommend running with --preview to confirm strain names are correctly parsed before uploading

Flat files

  1. Move line-list tsv files to fauna/data/
  2. Upload to tdb database with python3 tdb/upload.py -db tdb -v flu --subtype h3n2 --ftype flat --fstem H3N2_HI_titers_upload

CDC files

  1. Move line-list tsv files to fauna/data/
  2. Upload HI titers to tdb database with python3 tdb/cdc_upload.py -db cdc_tdb -v flu --ftype flat --fstem HITest_Oct2019_to_Sep2020_titers
  3. Upload FRA titers to tdb database with python3 tdb/cdc_upload.py -db cdc_tdb -v flu --ftype flat --fstem FRA_Oct2019_to_Sep2020_titers

Crick files

  1. Move Excel documents to fauna/data/
  2. Run python3 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem H3N2HIs
  3. Run python3 tdb/crick_upload.py -db crick_tdb --assay_type fra --fstem H3N2VNs
  4. Run python3 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem H1N1pdm09HIs
  5. Run python3 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem BVicHIs
  6. Run python3 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem BYamHIs
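
The Crick uploads pair each file stem with an assay type, and the VN file is the only one uploaded as `fra`. A table-driven sketch makes that pairing explicit; `CRICK_FILES` and `crick_command` are hypothetical names, and the flags mirror the steps above.

```python
import shlex

# (file stem, assay type) pairs, in the order of the steps above.
CRICK_FILES = [
    ("H3N2HIs", "hi"),
    ("H3N2VNs", "fra"),  # VN documents are uploaded with --assay_type fra
    ("H1N1pdm09HIs", "hi"),
    ("BVicHIs", "hi"),
    ("BYamHIs", "hi"),
]


def crick_command(fstem, assay_type):
    """Build the argument list for one Crick titer upload."""
    return [
        "python3", "tdb/crick_upload.py",
        "-db", "crick_tdb",
        "--assay_type", assay_type,
        "--fstem", fstem,
    ]


crick_commands = [crick_command(fstem, assay) for fstem, assay in CRICK_FILES]
for cmd in crick_commands:
    # Print for review; nothing is uploaded by this sketch.
    print(shlex.join(cmd))
```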

NIID files

  1. Make sure NIID-Tokyo-WHO-CC/ is a sister directory to fauna/
  2. Upload all titers with python3 tdb/upload_all.py --sources niid -db niid_tdb

VIDRL files

  1. Make sure VIDRL-Melbourne-WHO-CC/ is a sister directory to fauna/
  2. Upload all titers with python3 tdb/upload_all.py --sources vidrl -db vidrl_tdb

VIDRL Flat files

  • These are flat CSV files that should replace the raw Excel tables.
  • Upload with python3 tdb/vidrl_upload.py -db vidrl_tdb -v flu --subtype <subtype> --path <path> --fstem <fstem> --ftype flat

Download documents from TDB

  • python3 tdb/download.py -db tdb -v flu --subtype h3n2
  • python3 tdb/download.py -db tdb -v flu --subtype h1n1pdm
  • python3 tdb/download.py -db tdb -v flu --subtype vic
  • python3 tdb/download.py -db tdb -v flu --subtype yam