Description
Hi
Thank you for this tool; I am excited to use it.
When I run human fecal shotgun metagenomics data through MEDI, nearly all of the reads come back unclassified.
(medi) [rr@cn0116 tmpRR]$ kraken2 --db MEDI/medi/data/medi_db --confidence 0.3 --threads 24 --gzip-compressed --paired --output MF0035p.k2 --report MF0035p.tsv --memory-mapping test/work/e9/b9096a31dff7539598f895006c94b7/MF0035p_filtered_R1.fastq.gz test/work/e9/b9096a31dff7539598f895006c94b7/MF0035p_filtered_R2.fastq.gz
Loading database information... done.
21380521 sequences (5942.19 Mbp) processed in 60.246s (21293.4 Kseq/m, 5917.98 Mbp/m).
913 sequences classified (0.00%)
21379608 sequences unclassified (100.00%)
The following step returns an empty file, and the pipeline then fails because empty files are passed to the downstream commands.
architeuthis mapping filter MF0073p.k2 --data-dir MEDI/medi/data/medi_db/taxonomy --min-consistency 0.95 --max-entropy 0.1 --max-multiplicity 4 --out MF0073p_filtered.k2
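To catch this before the filter step, a check along these lines might help. It is only a minimal sketch, assuming the standard Kraken2 per-read output in which the first tab-separated column is C (classified) or U (unclassified). The function name `summarize_k2` is hypothetical and not part of MEDI, Kraken2, or architeuthis.

```python
from collections import Counter


def summarize_k2(path):
    """Tally classified (C) and unclassified (U) reads in a Kraken2 --output file.

    Assumes the standard Kraken2 per-read format, where the first
    tab-separated field of every line is either "C" or "U".
    """
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            status = line.split("\t", 1)[0]
            counts[status] += 1

    total = counts["C"] + counts["U"]
    # Avoid dividing by zero when the file is empty.
    fraction = counts["C"] / total if total else 0.0
    return {
        "classified": counts["C"],
        "unclassified": counts["U"],
        "fraction": fraction,
    }
```

Calling `summarize_k2("MF0035p.k2")` and inspecting `fraction` would show whether a sample has enough classified reads to be worth passing on, instead of letting an empty file reach the next command.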
Could you please help me figure out why the number of classified sequences is so low?
Is this an issue with my medi_db? Installation, database building, and hashing all appeared to complete correctly, but are there any diagnostics I can run to test it?
Do you have any sample paired-end fastq files and the resulting tables that you could provide for testing? Even if the results don't exactly match my version of the database, they would help me check whether the pipeline is installed correctly and what results to expect.
Thanks.
-Rich