git clone https://github.com/HKU-BAL/ncov19_cytosine_depletion.git
cd ncov19_cytosine_depletion/
conda env create -f environment.yml
conda activate ncov19-ca
npm install
Nextstrain is also required; please follow the installation guide on its page.
- Use `crawler.ts` to download fasta from GISAID.
- Run `align_fasta.py`, `cleanup.py`, `count_byday.py`, `plot.py`, `update_intro.py` and `concat_fasta.py`, in that order, to process the downloaded fasta.
- All scripts modify files under the `base_folder`, which is defined in both `params.py` and `crawler.ts`; if it needs to change, modify it in both files. The default is `download_data/` under this directory.
- The repository of Nextstrain/ncov needs to be referenced in `params.py` via the `ncov_folder` variable.
- `base_folder/fasta/` stores the `.fasta` and `.info` files retrieved directly from GISAID.
- `base_folder/aligned_fasta/` stores the fasta aligned against the reference.
- `base_folder/processed_fasta/` stores the raw sequences from `fasta` that qualify (sequence length > 29k, N < 5%), while `base_folder/backup_fasta/` stores those that do not.
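
The qualification rule (sequence length > 29k, N < 5%) and the two destination folders can be illustrated with a short Python sketch. This is not the code from `cleanup.py`; the names `BASE_FOLDER`, `MIN_LENGTH`, `MAX_N_FRACTION`, `is_qualified` and `destination_folder` are hypothetical, and the sketch assumes that "29k" means 29,000 bases, that the N fraction is measured over the full sequence length, and that both comparisons are strict.

```python
from pathlib import Path

# Hypothetical values for illustration only; the real settings live in params.py.
BASE_FOLDER = Path("download_data")
MIN_LENGTH = 29_000      # assumed reading of "sequence length > 29k"
MAX_N_FRACTION = 0.05    # assumed reading of "N < 5%"


def is_qualified(sequence: str) -> bool:
    """Return True if the sequence passes both the length and the N-content check."""
    # Drop line breaks and other whitespace so multi-line FASTA bodies work too.
    seq = "".join(sequence.split()).upper()
    if len(seq) <= MIN_LENGTH:
        return False
    return seq.count("N") / len(seq) < MAX_N_FRACTION


def destination_folder(sequence: str) -> Path:
    """Pick processed_fasta/ for qualified sequences and backup_fasta/ otherwise."""
    name = "processed_fasta" if is_qualified(sequence) else "backup_fasta"
    return BASE_FOLDER / name
```

For example, under these assumptions a 30,000-base sequence containing 500 `N` characters maps to `download_data/processed_fasta`, while one containing 2,000 `N` characters maps to `download_data/backup_fasta`.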