
Poor inversion recall performance in BAM files with a high amount of variants #114

Open
Rapsssito opened this issue Feb 10, 2023 · 3 comments

@Rapsssito

I am working with a large number of genomes (BAM files). I have noticed that BRASS's recall on inversions drops sharply once a sample passes a certain "threshold" of variants. This only happens with certain BAMs that contain a noticeably higher number of variants.

I have taken a closer look at the code, and I believe it is related to this section in metropolis_hastings_inversions.R:

# Write out results
output_file = sub(".inversions.pdf", ".is_fb_artefact.txt", pdf_file)
if (bad_groups_count >= 50) {
    write.table(
        data.frame(
            d[, 7],                                                     # ID
            mcmc_res[["artefact_prob"]][, 1],                           # Probability of being true
            rank(1 - mcmc_res[["artefact_prob"]][, 1]) < threshold_idx # Whether the rearrangement is to be kept
        ),
        output_file,
        row.names = F,
        col.names = F,
        sep = "\t",
        quote = F
    )
} else {
    write.table(
        data.frame(
            d[, 7],            # ID
            rep(1, nrow(d)),   # All assigned probability 1
            rep(TRUE, nrow(d)) # All kept
        ),
        output_file,
        row.names = F,
        col.names = F,
        sep = "\t",
        quote = F
    )
}

In these BAMs with a higher number of variants, the bad_groups_count variable is key. When it reaches 50 or more, the script switches to a precision-oriented approach and discards almost all inversions:

rank(1 - mcmc_res[["artefact_prob"]][,1]) < threshold_idx  # Whether the rearrangement is to be kept

From reading the rest of the code, it looks like bad_groups_count is simply the count of variants that have 4 tumor reads and are shorter than 1e5 bp. From the outside, this looks like a very arbitrary choice that unnecessarily ties BRASS's recall performance to the number of variants in the genome (there might be more to it, but I could not find it).
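For illustration, here is a minimal self-contained R sketch of the filter's behaviour above the threshold. The data and the `threshold_idx` value are made up; only the `rank(...)` expression mirrors the script:

```r
# Sketch (made-up data) of the rank-based filter in metropolis_hastings_inversions.R.
set.seed(42)
n <- 200                    # number of candidate fold-back inversions (hypothetical)
artefact_prob <- runif(n)   # stand-in for mcmc_res[["artefact_prob"]][, 1]
threshold_idx <- 10         # hypothetical cut-off; the real value is computed by the script

# Only the (threshold_idx - 1) candidates with the highest probability of being
# true survive -- a fixed count, independent of how many candidates there are.
keep <- rank(1 - artefact_prob) < threshold_idx
sum(keep)  # 9, no matter how large n is
```

This is what makes the behaviour feel like an auto-penalty: the number of kept events is bounded by `threshold_idx`, not proportional to the number of real inversions.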

We are developing a benchmarking platform for somatic variant calling, and this auto-penalty greatly affects BRASS's position in the ranking. Before publishing the results, we wanted to let you know, as it looks like a very easy fix.

@yl3
Contributor

yl3 commented Mar 26, 2023

This script was written to address a small proportion of samples that are affected by library construction artefacts that manifest as locally inverted read pairs. When the rate of such artefactual fold-back read pairs exceeds a certain level, they can generate apparent clusters of fold-back read pairs, which Brass then calls as hundreds or even thousands of fold-back inversions (I have seen a sample with as many as ≥100,000 in the PCAWG data).

When a sample contains more than 50 fold-back inversions, as defined by the thresholds you referred to, the script assumes the sample is affected by a "significant number" of fold-back inversion artefacts and attempts to model and remove them. This is why you see a sudden change in the number of called fold-back inversions at 50. You are right that the threshold of 50 is arbitrary...

Just out of curiosity, how do you ascertain the biological presence/absence of a fold-back inversion in your benchmarking data?

@Rapsssito
Author

@yl3, thanks for the answer!

(...) You are right that the threshold of 50 is arbitrary...

Is there a possibility to add an extra parameter to avoid the activation of this "hard mode"?
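As a rough illustration of what such a parameter could look like (purely hypothetical — neither the flag name nor this parsing exists in BRASS today):

```r
# Hypothetical sketch only: exposing the hard-coded 50 as a command-line option.
args <- commandArgs(trailingOnly = TRUE)
bad_groups_threshold <- 50  # current hard-coded default in the script

opt <- grep("^--bad-groups-threshold=", args, value = TRUE)
if (length(opt) == 1) {
    bad_groups_threshold <- as.numeric(sub("^--bad-groups-threshold=", "", opt))
}

# The script would then test against the parameter instead of the literal 50:
#   if (bad_groups_count >= bad_groups_threshold) { ... }
# Passing --bad-groups-threshold=Inf would disable the "hard mode" entirely.
```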

Just out of curiosity, how do you ascertain the biological presence/absence of a fold-back inversion in your benchmarking data?

We work with previously validated datasets (such as PCAWG), and we also create our own genomes, either by introducing the variants manually or by copying them from another genome. We use these genomes to "compress" the benchmarking dataset, so we don't have to run all the variant callers on 50+ genomes.

@davidrajones

@Rapsssito I will put this request into our prioritisation queue. However, if and when it gets done is entirely down to other priorities set by the scientists, I'm afraid.

We are of course open to receiving pull requests with the requested changes in the meantime.
