How to prepare a BAM file #17

kuangzhuoran · 2023-01-05T13:07:07Z

"It requires as input the BAM file of the sample to be genotyped."
In this step: "SVDSS smooth --bam sample.bam --workdir $PWD --reference GRCh38.fa --threads 16"
My understanding is that this BAM file is obtained by mapping the HiFi reads (CCS data) to the reference genome.

If I have genomes and HiFi data (CCS data) for multiple species and need to make inter- and intra-species comparisons, should all the HiFi data be mapped to the same genome, or should each dataset be mapped to the genome of its own species?

Thanks a lot !

Parsoa · 2023-01-05T20:13:38Z

Hi,

To do smoothing, you need to map your input CCS reads to the reference genome and then run SVDSS smooth on the resulting BAM file.
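The steps above can be sketched as a small Python helper that prints the command lines. This is only a sketch under assumptions: the thread does not say which mapper to use, so minimap2 (with its `map-hifi` preset) and samtools are choices made here for illustration, and `sample.fastq.gz` is a hypothetical input name. Only the `SVDSS smooth` command is taken from the thread.

```python
import shlex


def mapping_commands(reference, reads, bam, threads=16):
    """Return shell command lines for mapping, indexing, and smoothing.

    Assumption: minimap2 and samtools are installed and used for mapping;
    the thread itself only requires "a BAM of CCS reads mapped to the reference".
    """
    ref, fq, out = (shlex.quote(p) for p in (reference, reads, bam))
    return [
        # Map HiFi/CCS reads and write a coordinate-sorted BAM.
        f"minimap2 -ax map-hifi -t {threads} {ref} {fq} "
        f"| samtools sort -@ {threads} -o {out} -",
        # Index the sorted BAM.
        f"samtools index {out}",
        # Smoothing step, as quoted in the question.
        f'SVDSS smooth --bam {out} --workdir "$PWD" '
        f"--reference {ref} --threads {threads}",
    ]


if __name__ == "__main__":
    for command in mapping_commands("GRCh38.fa", "sample.fastq.gz", "sample.bam"):
        print(command)
```

The reference passed to the mapper and to `SVDSS smooth` should be the same FASTA file, so that the sequence names in the BAM match those in the reference.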

What sort of comparison are you trying to perform? SVDSS is not directly meant for comparative analysis. You can however genotype each of your samples individually and then compare the variants.

If you have several samples of the same species, one option is to map all of your samples to the same reference genome, genotype them with SVDSS against that reference, and then compare the genotypes using other tools for analysis.

You may find our earlier method PingPong useful for comparative analysis. SVDSS is based on PingPong.

kmustyxj · 2023-01-06T12:22:40Z

In this step: "SVDSS smooth --bam sample.bam --workdir $PWD --reference GRCh38.fa --threads 16"
smoothed_reads.txt and ignored_reads.txt in workdir are empty. The log shows:
[I] Processed batch 1. Reads so far 20000. Reads per second: 20000. Time: 1
[I] Processed bases: 1, num mismatch: 0, mismatch rate: 0, ignored reads: 0
[I] Processed batch 2. Reads so far 30000. Reads per second: 3750. Time: 8
[I] Processed bases: 1, num mismatch: 0, mismatch rate: 0, ignored reads: 0

ldenti · 2023-01-07T13:06:03Z

Hmm, that "[I] Processed bases: 1" is quite strange. If I'm not wrong, that number should be the total number of bases processed (it is initialized at 1). It seems that all reads have been filtered out (smoothed_reads.txt should be non-empty).

Some reasons why this could happen:

there is no primary alignment (but I don't think this is the case)
the .bam is corrupted (as above)

every read is aligned to a chromosome not present in the input reference

SVDSS/smoother.cpp

Lines 265 to 268 in dc0333e

    string chrom(bam_header->target_name[alignment->core.tid]);
    if (chromosome_seqs.find(chrom) == chromosome_seqs.end()) {
      continue;
    }

alignments are too dirty (and then are skipped)
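The chromosome-name reason in the list above can be checked without running SVDSS. The sketch below compares the `@SQ` sequence names in the BAM header (the text printed by `samtools view -H sample.bam`) with the sequence names in the reference FASTA. It assumes that the reference is keyed by the first word of each FASTA header line, which is the usual convention but is not confirmed in this thread; the function names are hypothetical.

```python
def sam_header_contigs(header_text):
    """Collect the SN: names from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_contigs(fasta_text):
    """Collect sequence names (first word after '>') from FASTA text."""
    return [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].split()
    ]


def missing_from_reference(header_text, fasta_text):
    """Return BAM contig names that do not appear in the reference FASTA."""
    reference_names = set(fasta_contigs(fasta_text))
    return [n for n in sam_header_contigs(header_text) if n not in reference_names]


if __name__ == "__main__":
    # Toy inputs: the BAM uses "chr1" while the FASTA calls the same sequence "1".
    header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:500\n"
    fasta = ">1 primary assembly\nACGT\n>chr2\nACGT\n"
    print(missing_from_reference(header, fasta))  # prints ['chr1']
```

If every contig in the BAM header is reported as missing, the reads were probably mapped against a different reference (or a differently named copy of it) than the one passed with `--reference`.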

How did you map the reads? If needed, would it be possible for you to share the .bam?
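Before sharing the file, the "no primary alignment" reason can also be checked from the FLAG column of the alignments. The sketch below classifies SAM records (for example, lines from `samtools view sample.bam`) using the standard SAM flag bits 0x4 (unmapped), 0x100 (secondary), and 0x800 (supplementary). How SVDSS treats each class is not spelled out in this thread, so this only summarizes what is in the BAM; the function names are hypothetical.

```python
from collections import Counter


def classify_flag(flag):
    """Classify a SAM FLAG value using the standard flag bits."""
    if flag & 0x4:
        return "unmapped"
    if flag & 0x100:
        return "secondary"
    if flag & 0x800:
        return "supplementary"
    return "primary"


def summarize_alignments(sam_lines):
    """Count alignment classes over SAM text lines, skipping header lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        counts[classify_flag(flag)] += 1
    return counts


if __name__ == "__main__":
    # Toy records with only the leading columns filled in.
    records = [
        "@HD\tVN:1.6",
        "read1\t0\tchr1\t100\t60\t10M",
        "read2\t4\t*\t0\t0\t*",
        "read3\t2048\tchr1\t500\t60\t5M5H",
    ]
    print(dict(summarize_alignments(records)))
```

A BAM in which the "primary" count is zero, or in which nearly all reads are unmapped, would be consistent with the empty outputs reported above.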

Best,

ldenti added the question Further information is requested label Jan 6, 2023

ldenti added v1 This refers to SVDSS (v1.*.*) and removed v1 This refers to SVDSS (v1.*.*) labels Aug 10, 2023
