How to prepare a BAM file #17

Open
kuangzhuoran opened this issue Jan 5, 2023 · 3 comments
Labels
question Further information is requested

Comments

@kuangzhuoran

"It requires as input the BAM file of the sample to be genotyped."
In this step: "SVDSS smooth --bam sample.bam --workdir $PWD --reference GRCh38.fa --threads 16",
my understanding is that this BAM file is obtained by mapping the HiFi reads (CCS data) to the reference genome.

If I have genomes and HiFi data (CCS data) for multiple species and need to make inter- and intra-species comparisons, should all HiFi data be mapped to the same genome, or should each dataset be mapped to its own genome?

Thanks a lot!

@Parsoa
Owner

Parsoa commented Jan 5, 2023

Hi,

To do smoothing, you need to map your input CCS reads to the reference genome and then run SVDSS smooth on the resulting BAM file.
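As a rough illustration of the map-then-smooth workflow described above, the sketch below assembles one possible mapping pipeline as a shell string. It assumes minimap2 (with its `map-hifi` preset) and samtools are the tools used; the thread does not say which mapper was intended, the file name `sample.ccs.fastq.gz` is hypothetical, and any extra alignment options SVDSS may expect should be taken from its own documentation.

```python
import shlex


def mapping_commands(reference, reads, out_bam, threads=16):
    """Return argument lists for a map -> sort -> index run.

    Assumption: minimap2 and samtools are the chosen tools; this is not
    taken from the SVDSS documentation.
    """
    # -a: SAM output, -x map-hifi: preset for PacBio HiFi/CCS reads
    minimap2 = ["minimap2", "-a", "-x", "map-hifi", "-t", str(threads), reference, reads]
    # "-" makes samtools sort read the SAM stream from standard input
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    index = ["samtools", "index", out_bam]
    return minimap2, sort, index


def as_pipeline(reference, reads, out_bam, threads=16):
    """Join the three commands into a single shell pipeline string."""
    minimap2, sort, index = mapping_commands(reference, reads, out_bam, threads)
    return f"{shlex.join(minimap2)} | {shlex.join(sort)} && {shlex.join(index)}"


# Hypothetical file names, matching the ones used in the SVDSS command above.
print(as_pipeline("GRCh38.fa", "sample.ccs.fastq.gz", "sample.bam"))
```

The resulting `sample.bam` is what would then be passed to `SVDSS smooth --bam`, together with the same reference FASTA that was used for mapping.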

What sort of comparison are you trying to perform? SVDSS is not directly meant for comparative analysis. You can however genotype each of your samples individually and then compare the variants.

If you have several samples of the same species, one option is to map all of them to the same reference genome, genotype them with SVDSS against that reference, and then compare the genotypes using other tools.

You may find our earlier method PingPong useful for comparative analysis. SVDSS is based on PingPong.

@ldenti ldenti added the question Further information is requested label Jan 6, 2023
@kmustyxj

kmustyxj commented Jan 6, 2023

In this step: "SVDSS smooth --bam sample.bam --workdir $PWD --reference GRCh38.fa --threads 16"
smoothed_reads.txt and ignored_reads.txt in workdir are empty.
[I] Processed batch 1. Reads so far 20000. Reads per second: 20000. Time: 1
[I] Processed bases: 1, num mismatch: 0, mismatch rate: 0, ignored reads: 0
[I] Processed batch 2. Reads so far 30000. Reads per second: 3750. Time: 8
[I] Processed bases: 1, num mismatch: 0, mismatch rate: 0, ignored reads: 0

@ldenti
Collaborator

ldenti commented Jan 7, 2023

Hmm, that [I] Processed bases: 1 is quite strange. If I'm not wrong, that number should be the total number of bases processed (it is initialized at 1). It seems that all reads have been filtered out (normally smoothed_reads.txt should be non-empty).

Some reasons why this could happen:

  • there is no primary alignment (but I don't think this is the case)
  • the .bam is corrupted (as above)
  • every read is aligned to a chromosome not present in the input reference

    SVDSS/smoother.cpp

    Lines 265 to 268 in dc0333e

    string chrom(bam_header->target_name[alignment->core.tid]);
    if (chromosome_seqs.find(chrom) == chromosome_seqs.end()) {
        continue;
    }
  • alignments are too dirty (and then are skipped)
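
One way to test the third possibility (reads aligned to chromosomes missing from the input reference) is to compare the contig names in the BAM with those in the reference index. The sketch below is not part of SVDSS; it assumes you have saved the output of `samtools idxstats sample.bam` and the contents of the reference `.fai` file (from `samtools faidx`) as text, and the helper names are made up for illustration. Note that idxstats counts mapped read segments, which can include secondary and supplementary alignments.

```python
def contigs_with_reads(idxstats_text):
    """Parse `samtools idxstats` output (name, length, mapped, unmapped; tab-separated).

    Returns {contig name: mapped count} for contigs with at least one mapped segment.
    """
    counts = {}
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        # The "*" row collects reads that have no reference at all.
        if name != "*" and int(mapped) > 0:
            counts[name] = int(mapped)
    return counts


def reference_contigs(fai_text):
    """Return the set of sequence names listed in a FASTA index (.fai)."""
    return {line.split("\t")[0] for line in fai_text.splitlines() if line.strip()}


def reads_on_missing_contigs(idxstats_text, fai_text):
    """Contigs that carry mapped reads in the BAM but are absent from the reference."""
    ref = reference_contigs(fai_text)
    mapped = contigs_with_reads(idxstats_text)
    return {name: n for name, n in mapped.items() if name not in ref}


# Made-up example: the BAM uses "1"/"2" while the reference uses "chr1"/"chr2".
idxstats = "1\t248956422\t1500\t0\n2\t242193529\t900\t0\n*\t0\t0\t12\n"
fai = "chr1\t248956422\t6\t60\t61\nchr2\t242193529\t253105708\t60\t61\n"
print(reads_on_missing_contigs(idxstats, fai))
```

If every contig with mapped reads shows up in the result, a naming mismatch such as `1` versus `chr1` between the BAM and the reference FASTA would be a likely explanation for the empty output.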

How did you map the reads? If needed, would it be possible for you to share the .bam?

Best,

@ldenti ldenti added v1 This refers to SVDSS (v1.*.*) and removed v1 This refers to SVDSS (v1.*.*) labels Aug 10, 2023