Description
I am working with a large number of genomes (BAM files). I have noticed that BRASS's recall on inversions drops sharply once a sample passes a certain "threshold" number of variants. This only happens with BAMs that contain a noticeably higher number of variants.
I have taken a closer look at the code, and I believe it is related to this section in metropolis_hastings_inversions.R:
BRASS/perl/share/Rscripts/metropolis_hastings_inversions.R
Lines 324 to 352 in dd0e1c1
# Write out results
output_file = sub(".inversions.pdf", ".is_fb_artefact.txt", pdf_file)
if (bad_groups_count >= 50) {
  write.table(
    data.frame(
      d[,7],  # ID
      mcmc_res[["artefact_prob"]][,1],  # Probability of being true
      rank(1 - mcmc_res[["artefact_prob"]][,1]) < threshold_idx  # Whether the rearrangement is to be kept
    ),
    output_file,
    row.names = F,
    col.names = F,
    sep = "\t",
    quote = F
  )
} else {
  write.table(
    data.frame(
      d[,7],  # ID
      rep(1, nrow(d)),
      rep(TRUE, nrow(d))
    ),
    output_file,
    row.names = F,
    col.names = F,
    sep = "\t",
    quote = F
  )
}
In these BAMs with a higher number of variants, the bad_groups_count variable is key. When it is 50 or higher, the script switches to a more precision-oriented approach and discards almost all inversions:
rank(1 - mcmc_res[["artefact_prob"]][,1]) < threshold_idx # Whether the rearrangement is to be kept
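To make the effect of that expression concrete, here is a minimal sketch with made-up numbers. The helper `keep_flags`, the probabilities, and the value of `threshold_idx` are all hypothetical and are not taken from BRASS. Only the `>= 50` test and the `rank(1 - p) < threshold_idx` expression mirror the quoted script.

```r
# Toy illustration only: keep_flags(), the probabilities and threshold_idx
# are assumptions for this example, not values used by BRASS.
keep_flags <- function(prob_true, bad_groups_count, threshold_idx) {
  if (bad_groups_count >= 50) {
    # Same expression as the quoted script: the highest probability gets
    # rank 1, and only ranks below threshold_idx are kept.
    rank(1 - prob_true) < threshold_idx
  } else {
    # Below the cutoff every rearrangement is kept.
    rep(TRUE, length(prob_true))
  }
}

p <- c(0.95, 0.10, 0.80, 0.40, 0.60)

keep_flags(p, bad_groups_count = 10, threshold_idx = 3)
# TRUE TRUE TRUE TRUE TRUE

keep_flags(p, bad_groups_count = 50, threshold_idx = 3)
# TRUE FALSE TRUE FALSE FALSE
```

With the same five calls, crossing the cutoff changes the outcome from keeping everything to keeping only the `threshold_idx - 1` highest-ranked rearrangements.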
From reading the rest of the code, it looks like bad_groups_count is just the count of all variants that have 4 tumor reads and are less than 1e5 in length. From the outside, this looks like a very arbitrary choice that unnecessarily links BRASS's recall performance to the number of variants in the genome (there might be more to it, but I could not find it).
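As a rough sketch of why this couples the filter to the overall call count, consider the toy example below. It is based on my reading of the script, not on the actual BRASS code: the column names `tumour_reads` and `length`, the helper `count_bad_groups`, and the literal reading of "4 tumor reads" as an equality test are all assumptions.

```r
# Hypothetical reconstruction: column names and the exact comparison are
# assumptions; the real script's data layout may differ.
count_bad_groups <- function(calls, reads = 4, max_len = 1e5) {
  sum(calls$tumour_reads == reads & calls$length < max_len)
}

calls <- data.frame(
  tumour_reads = c(4, 4, 12, 4, 7),
  length       = c(5e3, 2e5, 8e4, 9e4, 1e3)
)
count_bad_groups(calls)      # 2, well below the cutoff of 50

# The same mix of calls, 30 times over: the proportion of such variants is
# unchanged, but the absolute count now crosses the cutoff.
calls_big <- calls[rep(seq_len(nrow(calls)), 30), ]
count_bad_groups(calls_big)  # 60
```

Because the cutoff is an absolute count rather than a fraction, a genome with more variants overall can trip it even if its proportion of such variants is no higher.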
We are developing a benchmarking platform for somatic variant calling and this auto-penalty greatly affects BRASS's position in the ranking. Before publishing the results, we wanted to let you know as it looks like a very easy fix.