Poor inversion recall performance in BAM files with a high amount of variants #114

@Rapsssito

I am working with a large number of genomes (BAM files). I have noticed that BRASS's recall on inversions drops sharply once a sample passes a certain "threshold" of variants. This only happens with BAMs that contain a noticeably higher number of variants.

I have taken a closer look at the code, and I believe it is related to this section in metropolis_hastings_inversions.R:

# Write out results
output_file = sub(".inversions.pdf", ".is_fb_artefact.txt", pdf_file)
if (bad_groups_count >= 50) {
    write.table(
        data.frame(
            d[,7], # ID
            mcmc_res[["artefact_prob"]][,1], # Probability of being true
            rank(1 - mcmc_res[["artefact_prob"]][,1]) < threshold_idx # Whether the rearrangement is to be kept
        ),
        output_file,
        row.names = F,
        col.names = F,
        sep = "\t",
        quote = F
    )
} else {
    write.table(
        data.frame(
            d[,7], # ID
            rep(1, nrow(d)),
            rep(TRUE, nrow(d))
        ),
        output_file,
        row.names = F,
        col.names = F,
        sep = "\t",
        quote = F
    )
}

In these BAMs with a higher number of variants, the bad_groups_count variable is key. When it reaches 50 or more, the script switches to a more precision-oriented path and discards almost all inversions:

rank(1 - mcmc_res[["artefact_prob"]][,1]) < threshold_idx  # Whether the rearrangement is to be kept
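To make the effect of that expression concrete, here is a minimal sketch with toy values. The probabilities and the threshold_idx value of 3 are assumptions for illustration only; the script computes threshold_idx elsewhere, and I have not reproduced that logic here.

```r
# Toy values, not BRASS output.
prob_true <- c(0.90, 0.20, 0.75, 0.60, 0.10)
threshold_idx <- 3  # hypothetical cut-off; the real script derives its own value

# rank() is ascending, so the highest probability gets rank 1.
ranks <- rank(1 - prob_true)

# Same comparison as in the script: keep only ranks below the cut-off.
keep <- ranks < threshold_idx

print(data.frame(prob_true, ranks, keep))
```

With untied values, only the threshold_idx - 1 highest-ranked rearrangements are kept, however many candidates there are. If threshold_idx is small relative to the number of real inversions in the sample, most of them are dropped.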

From reading the rest of the code, it looks like bad_groups_count is just the count of all variants that have 4 tumor reads and are less than 1e5 in length. From the outside, this looks like a very arbitrary choice that unnecessarily ties BRASS's recall performance to the number of variants in the genome (there might be more to it, but I could not find it).
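As a rough sketch of my reading of that count, and of how a single extra group flips the behaviour, consider the following. The column names tumour_reads and length and both helper functions are hypothetical (the real script indexes columns by position), and the criteria are only my interpretation of the code.

```r
# Hypothetical column names and helpers, based on my reading of the script.
count_bad_groups <- function(d) {
  sum(d$tumour_reads == 4 & d$length < 1e5)
}

# Build a toy table with n_bad qualifying groups plus two that do not qualify.
make_groups <- function(n_bad) {
  data.frame(
    tumour_reads = c(rep(4, n_bad), 12, 4),
    length = c(rep(5e3, n_bad), 2e3, 5e6)
  )
}

for (n_bad in c(49, 50)) {
  bad_groups_count <- count_bad_groups(make_groups(n_bad))
  cat(n_bad, "bad groups -> rank filter applied:", bad_groups_count >= 50, "\n")
}
```

Under these assumptions, a sample with 49 such groups keeps every inversion, while one with 50 goes through the rank-based filter, so the switch depends only on how many variants the sample carries.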

We are developing a benchmarking platform for somatic variant calling, and this automatic penalty greatly affects BRASS's position in the ranking. Before publishing the results, we wanted to let you know, as it looks like a very easy fix.
