Fixed handling of unknown barcode combinations (in dual asymmetric mode); thanks to Alice Retter for reporting
Refactored and optimized the tag-jump removal step
Fixed a bug with duplicated sequences in the tag-jump removal step; thanks to Valentin Étienne for reporting
Implemented an MMseqs2-based chunking option that splits the dataset into smaller parts prior to clustering in Step-2 (pre-clustering, clustering, and denoising were moved to a separate sub-workflow)
Added the option to disable reference-based and/or de novo chimera removal, as well as tag-jump removal
New parameters added:
lima_remove_unknown (default: false; if true, unknown barcode combinations are removed from the demultiplexed data)
chunking_n (number of chunks to split the dataset into prior to clustering)
chunking_id (minimum sequence identity used for splitting the dataset into chunks)
chimera_methods (specifies which chimera removal methods to use: "ref" for reference-based, "denovo" for de novo, or "ref,denovo" for both; can also be set to "none" or null to disable chimera removal)
tj (specifies whether to run tag-jump removal - "true" or "false")
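As an illustration of the values accepted by chimera_methods, the following Python sketch shows one way such a setting could be interpreted. It is not the pipeline's own code: the function name `parse_chimera_methods` is hypothetical, and the case-insensitive matching and the error raised on unrecognized values are assumptions made for the example.

```python
def parse_chimera_methods(value):
    """Return the set of chimera removal methods selected by `value`.

    Hypothetical helper, not part of the pipeline.
    An empty set means that chimera removal is disabled.
    """
    allowed = {"ref", "denovo"}

    # "none" or null (None in Python) disables chimera removal
    if value is None or value.strip().lower() == "none":
        return set()

    # Split a comma-separated list such as "ref,denovo"
    # (case-insensitive matching is an assumption of this sketch)
    methods = {m.strip().lower() for m in value.split(",") if m.strip()}

    # Rejecting unrecognized values is also an assumption of this sketch
    unknown = methods - allowed
    if unknown:
        raise ValueError(f"Unknown chimera removal method(s): {sorted(unknown)}")
    return methods
```

With this sketch, "ref,denovo" selects both methods, "ref" or "denovo" selects one, and "none" or null selects neither.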
Added DADA2 denoising (--preclustering dada2; also works with --clustering none)
Implemented automated documentation for analysis procedures (generates README_Step1_Methods.txt and README_Step2_Methods.txt in the pipeline_info directory)
Refactored the runtime parameter summary and help message
Added test profiles (test, test1, test2)
Improved run summary for Step-1
Default parameters changed:
ITSx now checks only a single strand (option ITSx_complement set to F). This should be safe for most cases, as amplicons are re-oriented using primers during the pipeline run. However, we recommend checking the results carefully (e.g., columns ITSx_Extracted_Reads and ITSx_Yield_Percent in the run summary)
Prior to tag-jump removal, sequences are now dereplicated at 100% identity (option tj_id set to 1). It is possible to pre-cluster sequences at a lower similarity threshold (e.g., --tj_id 0.99), but this will take much longer. This change should also be safe for most cases, as amplicons undergo homopolymer correction
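To make the idea of dereplication at 100% identity concrete, here is a minimal Python sketch that groups identical sequences and records their abundance per sample, which is the kind of table a tag-jump filter would then operate on. This is only an illustration under stated assumptions, not the pipeline's implementation: the function name `dereplicate`, the input format (tuples of sample ID, sequence, abundance), and the case-insensitive comparison are all hypothetical.

```python
from collections import defaultdict


def dereplicate(reads):
    """Group identical sequences and sum their abundance per sample.

    Hypothetical helper, not part of the pipeline.
    `reads` is an iterable of (sample_id, sequence, abundance) tuples.
    Returns {sequence: {sample_id: total_abundance}}.
    """
    table = defaultdict(lambda: defaultdict(int))
    for sample_id, sequence, abundance in reads:
        # Sequences are merged only if they match exactly (100% identity);
        # ignoring letter case is an assumption of this sketch
        table[sequence.upper()][sample_id] += abundance
    # Convert nested defaultdicts to plain dicts for easier inspection
    return {seq: dict(samples) for seq, samples in table.items()}
```

Exact matching needs only a single pass over the data, whereas clustering at a lower threshold (e.g., 0.99) requires sequence comparisons, which is consistent with the longer run time noted above.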
Fixed a minor bug in extraction of sample IDs at the ref-based chimera rescue step; thanks to Valentin Étienne for reporting