Description
I am working with a large number of genomes (BAM files). I have noticed that BRASS's recall on inversions drops sharply once the number of variants passes a certain "threshold". This only happens with certain BAMs that contain a noticeably higher number of variants.
I have taken a closer look at the code, and I believe it is related to this section in metropolis_hastings_inversions.R:
`BRASS/perl/share/Rscripts/metropolis_hastings_inversions.R`, lines 324 to 352 in dd0e1c1:
```r
# Write out results
output_file = sub(".inversions.pdf", ".is_fb_artefact.txt", pdf_file)
if (bad_groups_count >= 50) {
  write.table(
    data.frame(
      d[,7], # ID
      mcmc_res[["artefact_prob"]][,1], # Probability of being true
      rank(1 - mcmc_res[["artefact_prob"]][,1]) < threshold_idx # Whether the rearrangement is to be kept
    ),
    output_file,
    row.names = F,
    col.names = F,
    sep = "\t",
    quote = F
  )
} else {
  write.table(
    data.frame(
      d[,7], # ID
      rep(1, nrow(d)),
      rep(TRUE, nrow(d))
    ),
    output_file,
    row.names = F,
    col.names = F,
    sep = "\t",
    quote = F
  )
}
```
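To make the effect of the two branches concrete, here is a minimal sketch that extracts only the keep/discard decision from the block above. The helper `keep_flags`, the probabilities and `threshold_idx = 3` are made-up illustrations, not BRASS code or defaults. The sketch does not model how the script actually derives `threshold_idx`.

```r
# Hypothetical helper (not part of BRASS): mirrors the keep/discard decision
# of the quoted block in metropolis_hastings_inversions.R, without writing a file.
keep_flags <- function(prob_true, bad_groups_count, threshold_idx) {
  if (bad_groups_count >= 50) {
    # Precision branch: keep only rearrangements whose rank of (1 - p)
    # is below threshold_idx, whatever their absolute probability
    rank(1 - prob_true) < threshold_idx
  } else {
    # Default branch: every rearrangement is kept
    rep(TRUE, length(prob_true))
  }
}

# Toy probabilities of being true, all of them high (made-up values)
p <- c(0.99, 0.97, 0.95, 0.90, 0.85)

keep_flags(p, bad_groups_count = 49, threshold_idx = 3)
# [1] TRUE TRUE TRUE TRUE TRUE
keep_flags(p, bad_groups_count = 50, threshold_idx = 3)
# [1]  TRUE  TRUE FALSE FALSE FALSE
```

In this sketch the probabilities are identical in both calls. Moving `bad_groups_count` from 49 to 50 still discards three of the five calls, because in that branch the decision depends only on each call's rank, not on its probability.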
In these BAMs with a higher number of variants, the `bad_groups_count` variable is key. When it reaches 50 or more, the script switches to a more precision-oriented approach and drops almost all inversions through this line:

```r
rank(1 - mcmc_res[["artefact_prob"]][,1]) < threshold_idx # Whether the rearrangement is to be kept
```

From reading the rest of the code, it looks like `bad_groups_count` is just the count of all variants that have 4 tumor reads and are shorter than 1e5. From the outside, this looks like a very arbitrary choice that unnecessarily ties BRASS's recall to the number of variants in the genome (there might be more to it, but I could not find it).
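The reading of `bad_groups_count` above can be illustrated with a toy call set. This sketch assumes the count is the number of calls with exactly 4 tumor reads and a length below 1e5, as described. The names `count_bad_groups`, `make_calls`, `tumor_reads` and `length` are hypothetical, and the real script may compute the count differently.

```r
# Hypothetical reconstruction of bad_groups_count as described above:
# calls with exactly 4 tumor reads and length below 1e5 (column names made up).
count_bad_groups <- function(calls) {
  sum(calls$tumor_reads == 4 & calls$length < 1e5)
}

# Toy call set in which exactly 1 call in 10 has 4 tumor reads;
# all calls are 5e4 long, so that 1 in 10 meets both criteria.
make_calls <- function(n) {
  data.frame(
    tumor_reads = rep(c(4, rep(10, 9)), length.out = n),
    length      = rep(5e4, n)
  )
}

for (n in c(200, 1000)) {
  bad <- count_bad_groups(make_calls(n))
  cat(n, "calls ->", bad, "bad groups; precision branch:", bad >= 50, "\n")
}
# 200 calls -> 20 bad groups; precision branch: FALSE
# 1000 calls -> 100 bad groups; precision branch: TRUE
```

If a roughly constant fraction of calls meets the criteria, the count grows linearly with the size of the call set. Larger call sets therefore cross the cutoff of 50 no matter how confident each individual call is.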
We are developing a benchmarking platform for somatic variant calling, and this automatic penalty greatly affects BRASS's position in the ranking. Before publishing the results, we wanted to let you know, as it looks like a very easy fix.