Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix custom annotation of breakends and non-supported SVs (exact mode) #1498

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 0 additions & 12 deletions modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm
Original file line number Diff line number Diff line change
Expand Up @@ -93,8 +93,6 @@ BEGIN {
}
}

use Bio::EnsEMBL::VEP::Parser qw(get_SO_term);

my %FORMAT_MAP = (
'vcf' => 'VCF',
'gff' => 'GFF',
Expand Down Expand Up @@ -458,19 +456,9 @@ sub _record_overlaps_VF {
my $type = $self->type();
my $overlap_cutoff = $self->{overlap_cutoff};
my $distance = $self->{distance};
my $same_type = $self->{same_type};
my $reciprocal = $self->{reciprocal};
my ($ref_start, $ref_end) = ($parser->get_start, $parser->get_end);

# match on variant class (if enabled)
# confounded by different descriptions for the same event
if ($same_type) {
my $vf_class = $vf->class_SO_term;
my $ref_class = get_SO_term($parser);

return 0 if defined $ref_class && defined $vf_class && $ref_class ne $vf_class;
}

nakib103 marked this conversation as resolved.
Show resolved Hide resolved
if($type eq 'overlap' || $type eq 'within' || $type eq 'surrounding') {
# account for insertions in Ensembl world where s = e+1
my ($vs, $ve) = ($vf->{start}, $vf->{end});
Expand Down
23 changes: 20 additions & 3 deletions modules/Bio/EnsEMBL/VEP/AnnotationSource/File/VCF.pm
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,7 @@ use Bio::EnsEMBL::Utils::Exception qw(throw warning);
use Bio::EnsEMBL::Utils::Sequence qw(reverse_comp);
use Bio::EnsEMBL::Variation::Utils::VariationEffect qw(overlap);
use Bio::EnsEMBL::Variation::Utils::Sequence qw(get_matched_variant_alleles);
use Bio::EnsEMBL::VEP::Parser qw(get_SO_term);
use Bio::EnsEMBL::IO::Parser::VCF4Tabix;

use base qw(Bio::EnsEMBL::VEP::AnnotationSource::File);
Expand Down Expand Up @@ -328,12 +329,28 @@ sub _record_overlaps_VF {
my $self = shift;
my ($vf) = @_;

my $same_type = $self->{same_type};
my $type = $self->type();
my $parser = $self->parser;

# match records based on variant class (if enabled)
my $ref_class = get_SO_term($parser);

# avoid matching breakpoints (start == end) with SNPs in exact mode
my $vf_class = $vf->class_SO_term;
my $is_exact_breakpoint = $type eq 'exact' && defined $vf_class && $vf_class =~ /breakpoint/;

if ($same_type || $is_exact_breakpoint) {
# do not match if only one of the types is defined
return 0 if defined $ref_class xor defined $vf_class;
Comment on lines +344 to +345
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant with the next check as it makes sure both are defined.

Suggested change
# do not match if only one of the types is defined
return 0 if defined $ref_class xor defined $vf_class;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not redundant:

  • The other check only returns 0 if both variables are defined and not equal.
  • If one is defined but the other is not, then they are not equal, and should also return 0. This check is required for this condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, sorry - I missed that it was an if, you can modify the next condition to have unless -

return 0 unless defined $ref_class && defined $vf_class && $ref_class eq $vf_class;

the condition can only be true if both are defined and equal.


# do not match if both types are not the same
return 0 if defined $ref_class && defined $vf_class && $ref_class ne $vf_class;
}

# we can use the superclass method if overlap type
return $self->SUPER::_record_overlaps_VF(@_)
if ref($vf) eq 'Bio::EnsEMBL::Variation::StructuralVariationFeature';

my $type = $self->type();
my $parser = $self->parser;

if ($type eq 'overlap') {
my $parser_start = $parser->get_raw_start;
Expand Down
3 changes: 1 addition & 2 deletions modules/Bio/EnsEMBL/VEP/BaseVEP.pm
Original file line number Diff line number Diff line change
Expand Up @@ -787,15 +787,14 @@ sub skipped_variant_msg {
my $line = $self->{parser}->{current_block} if defined $self->{parser};
if (defined $line) {
my $maxChar = 50;
$line =~ s/\t/ /g; # replace tabs with spaces to improve readability
$line =~ s/\s+/ /g; # trim tabs and extra spaces to improve readability
$line = substr($line, 0, $maxChar - 4) . "..." if length($line) > $maxChar;
$msg = "(" . $line . "): " . $msg;
}

$msg = (defined $line_number and not defined $self->param('input_data')) ?
"line $line_number skipped " . $msg : "variant skipped " . $msg;
$self->warning_msg("WARNING: " . $msg . "\n");
$self->{skip_line} = 1;

$self->skipped_variants({
line => $line,
Expand Down
10 changes: 6 additions & 4 deletions modules/Bio/EnsEMBL/VEP/Parser.pm
Original file line number Diff line number Diff line change
Expand Up @@ -683,9 +683,6 @@ sub get_SO_term {
$abbrev = $type;
}

## unsupported SV types
$self->skipped_variant_msg("$abbrev type is not supported") if $abbrev eq "CPX";

my %terms = (
INS => 'insertion',
DEL => 'deletion',
Expand All @@ -696,7 +693,12 @@ sub get_SO_term {
BND => 'chromosome_breakpoint'
);

return $terms{$abbrev};
my $res = $terms{$abbrev};
## unsupported SV types
if ($self->isa('Bio::EnsEMBL::VEP::Parser')) {
$self->skipped_variant_msg("$abbrev type is not supported") unless $res;
}
Comment on lines +697 to +700
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not also report skipped variant even if it is ensembl-io parser?

Suggested change
## unsupported SV types
if ($self->isa('Bio::EnsEMBL::VEP::Parser')) {
$self->skipped_variant_msg("$abbrev type is not supported") unless $res;
}
## unsupported SV types - always use vep parser
Bio::EnsEMBL::VEP::Parser->skipped_variant_msg("$abbrev type is not supported") unless $res;

Copy link
Contributor Author

@nuno-agostinho nuno-agostinho Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Nakib! We only use ensembl-io directly for reading custom annotation: this means we will warn the user about the multiple types of variants that are skipped in custom annotation (and these can be a lot of them in some cases). Is this information useful by default or just noise?

What about if it's only printed for custom annotation if using --verbose? Something like:

Suggested change
## unsupported SV types
if ($self->isa('Bio::EnsEMBL::VEP::Parser')) {
$self->skipped_variant_msg("$abbrev type is not supported") unless $res;
}
## unsupported SV types
if ($self->isa('Bio::EnsEMBL::VEP::Parser') || $self->param('verbose')) {
Bio::EnsEMBL::VEP::Parser->skipped_variant_msg("$abbrev type is not supported") unless $res;
}

return $res;
}


Expand Down
10 changes: 8 additions & 2 deletions modules/Bio/EnsEMBL/VEP/Parser/VCF.pm
Original file line number Diff line number Diff line change
Expand Up @@ -480,6 +480,7 @@ sub create_StructuralVariationFeatures {

my $parser = $self->parser();
my $record = $parser->{record};
my $skip_line;

# get relevant data
my ($chr, $start, $end, $alts, $info, $ids) = (
Expand All @@ -494,7 +495,11 @@ sub create_StructuralVariationFeatures {
## get structural variant type from SVTYPE tag (deprecated in VCF 4.4) or ALT
my $alt = join(",", @$alts);
my $type = $info->{SVTYPE} || $alt;
my $so_term = $self->get_SO_term($type) || $type;
my $so_term = $self->get_SO_term($type);
unless ($so_term) {
$skip_line = 1;
$so_term = $type;
}

# work out the end coord
if(defined($info->{END})) {
Expand Down Expand Up @@ -522,6 +527,7 @@ sub create_StructuralVariationFeatures {

if($start >= $end && $so_term =~ /del/i) {
$self->skipped_variant_msg("deletion looks incomplete");
$skip_line = 1;
}

my $svf = Bio::EnsEMBL::Variation::StructuralVariationFeature->new_fast({
Expand All @@ -540,7 +546,7 @@ sub create_StructuralVariationFeatures {
});
$svf->{allele_string} = $allele_string if defined $allele_string;
$svf->{_parsed_allele} = $parsed_allele if defined $parsed_allele;
$svf->{vep_skip} = $self->{skip_line} if defined $self->{skip_line};
$svf->{vep_skip} = $skip_line if defined $skip_line;

return $self->post_process_vfs([$svf]);
}
Expand Down