New script to move attributes from features of the same record (agat_sp_move_attributes_within_records) (#413)

* New script to move attributes from features of the same record; add test and documentation

* Fix create_or_append_tag in OmniscientTool to keep all values, not only the first one, when the attribute has several values

---------

Co-authored-by: Jacques Dainat <[email protected]>
Juke34 and Juke34 authored Jan 10, 2024
1 parent 33292e0 commit b8f36f6
Showing 8 changed files with 504 additions and 6 deletions.
407 changes: 407 additions & 0 deletions bin/agat_sp_move_attributes_within_records.pl


1 change: 1 addition & 0 deletions docs/index.rst
@@ -103,6 +103,7 @@ Contents
tools/agat_sp_manage_functional_annotation.md
tools/agat_sp_manage_introns.md
tools/agat_sp_merge_annotations.md
tools/agat_sp_move_attributes_within_records
tools/agat_sp_prokka_fix_fragmented_gene_annotations.md
tools/agat_sp_sensitivity_specificity.md
tools/agat_sp_separate_by_record_type.md
67 changes: 67 additions & 0 deletions docs/tools/agat_sp_move_attributes_within_records.md
@@ -0,0 +1,67 @@
# NAME

agat\_sp\_move\_attributes\_within\_records.pl

# DESCRIPTION

The script copies attributes between features of the same record (the source feature keeps its attributes), e.g. from Level1 to Level2 and/or Level3 features, from Level2 to Level2 or Level3 features, and/or from Level3 to Level3 features.
Example of a Level1 feature: gene.
Example of a Level2 feature: mRNA.
Example of a Level3 feature: CDS.
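
As an illustration of the copy step described above, here is a minimal Python sketch. AGAT itself is written in Perl and works on BioPerl feature objects, so this is not its implementation. The sketch assumes column 9 has already been split out of the GFF3 line, performs no percent-decoding, and never copies `ID` or `Parent`; the helpers `parse_attributes`, `format_attributes` and `copy_attributes` are hypothetical names. Values already present on the target are not duplicated, and new values are appended after the existing ones.

```python
from typing import Dict, Iterable, List, Optional

Attributes = Dict[str, List[str]]

# Assumption: identity tags are never copied between features.
SKIPPED_TAGS = {"id", "parent"}


def parse_attributes(col9: str) -> Attributes:
    """Split a GFF3 column 9 string into {tag: [values]} (no unescaping)."""
    attrs: Attributes = {}
    for pair in col9.strip().strip(";").split(";"):
        if not pair:
            continue
        tag, _, raw = pair.partition("=")
        attrs[tag] = [v for v in raw.split(",") if v]  # drop empty values
    return attrs


def format_attributes(attrs: Attributes) -> str:
    """Join {tag: [values]} back into a column 9 string."""
    return ";".join(f"{tag}={','.join(vals)}" for tag, vals in attrs.items())


def copy_attributes(src: Attributes, dst: Attributes,
                    wanted: Optional[Iterable[str]] = None) -> None:
    """Append src values to dst; wanted=None copies every attribute."""
    wanted_set = None if wanted is None else set(wanted)
    for tag, values in src.items():
        if tag.lower() in SKIPPED_TAGS:
            continue
        if wanted_set is not None and tag not in wanted_set:  # case sensitive
            continue
        current = dst.setdefault(tag, [])
        for value in values:
            if value not in current:  # do not duplicate existing values
                current.append(value)


mrna = parse_attributes("ID=NBISM00000000001;Parent=NBISG00000000001;"
                        "Dbxref=CDD:cd07067,InterPro:IPR013078;makerName=g1.t1")
cds = parse_attributes("ID=NBISC00000000001;Parent=NBISM00000000001;"
                       "makerName=g1.t1.CDS1")
copy_attributes(mrna, cds)
print(format_attributes(cds))
# ID=NBISC00000000001;Parent=NBISM00000000001;makerName=g1.t1.CDS1,g1.t1;Dbxref=CDD:cd07067,InterPro:IPR013078
```

With attribute values taken from this commit's test data, the CDS ends up with `makerName=g1.t1.CDS1,g1.t1`, the same value order as in the expected test output.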

# SYNOPSIS

```
agat_sp_move_attributes_within_records.pl --gff infile.gff --feature_copy mRNA --feature_paste CDS --attribute Dbxref,Ontology_term [ --output outfile ]
agat_sp_move_attributes_within_records.pl --help
```

# OPTIONS

- **-f**, **--reffile**, **--gff** or **-ref**

Input GFF3 file that will be read

- **--feature\_copy** or **--fc**

Primary tag (feature type) of the feature(s) from which the attributes will be copied; case insensitive.
You can specify a feature (or a comma-separated list) by giving its primary tag / feature type (column 3) value, e.g. cds, Gene, MrNa.
You can also specify all the features of a particular level at once:
level2=mRNA,ncRNA,tRNA,etc
level3=CDS,exon,UTR,etc
By default all level2 features are used.

- **--feature\_paste** or **--fp**

Primary tag (feature type) of the feature(s) to which the attributes will be pasted; case insensitive.
You can specify a feature (or a comma-separated list) by giving its primary tag / feature type (column 3) value, e.g. cds, Gene, MrNa.
You can also specify all the features of a particular level at once:
level2=mRNA,ncRNA,tRNA,etc
level3=CDS,exon,UTR,etc
By default all level3 features are used.

- **-a** or **--attribute**

Attribute(s) to copy and paste; case sensitive.
You can specify an attribute (or a comma-separated list) by giving its attribute tag (column 9), e.g. Ontology\_term, Dbxref.
Default: all\_attributes
/!\\ &lt;all\_attributes> is a special value meaning that all attributes will be used.

- **-o** or **--output**

Output GFF file. If no output file is specified, the output will be
written to STDOUT.

- **-v**

Verbose option for debugging purposes.

- **-c** or **--config**

String - Input agat config file. By default AGAT takes as input the agat\_config.yaml file from the working directory if any,
otherwise it takes the original agat\_config.yaml shipped with AGAT. To get the agat\_config.yaml locally, type: "agat config --expose".
The --config option lets you use your own AGAT config file (located elsewhere or named differently).

- **-h** or **--help**

Display this helpful text.
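
The option semantics above (comma-separated feature types, level-wide selection, case-insensitive feature types versus case-sensitive attribute tags, and the `all_attributes` default) can be sketched in Python as follows. This is a hypothetical illustration, not AGAT's option parser: `resolve_feature_types` and `resolve_attributes` are made-up names, the `LEVELS` table is a small assumed subset of what AGAT actually recognizes, and it assumes a level is selected by passing its name (e.g. `level3`). Both `--feature_copy` and `--feature_paste` are treated as case insensitive, as their examples (cds, Gene, MrNa) suggest.

```python
from typing import Optional, Set

# Hypothetical, deliberately small level table (lowercase feature types).
LEVELS = {
    "level1": {"gene"},
    "level2": {"mrna", "ncrna", "trna"},
    "level3": {"cds", "exon", "five_prime_utr", "three_prime_utr"},
}


def resolve_feature_types(option: Optional[str], default_level: str) -> Set[str]:
    """Turn a value such as 'exon,CDS' or 'level2' into lowercase feature types."""
    if not option:
        return set(LEVELS[default_level])  # option not given: use the default level
    types: Set[str] = set()
    for item in option.split(","):
        key = item.strip().lower()  # feature types are matched case insensitively
        if key:
            types |= LEVELS.get(key, {key})  # expand level names, keep plain types
    return types


def resolve_attributes(option: Optional[str]) -> Optional[Set[str]]:
    """Return the attribute tags to copy, or None meaning all attributes."""
    if not option or option == "all_attributes":
        return None
    return {tag.strip() for tag in option.split(",") if tag.strip()}  # case kept


print(sorted(resolve_feature_types("exon,CDS", "level3")))  # ['cds', 'exon']
print(sorted(resolve_feature_types(None, "level2")))  # ['mrna', 'ncrna', 'trna']
print(resolve_attributes(None))  # None
```

For the invocation used in the test suite (`--fp exon,CDS --fc mRNA`), this yields `{'cds', 'exon'}` as paste targets and `{'mrna'}` as the copy source.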

15 changes: 10 additions & 5 deletions lib/AGAT/OmniscientTool.pm
@@ -571,7 +571,8 @@ sub merge_overlap_loci{
my @list_tag_l2 = $omniscient->{'level1'}{$tag_l1}{$id2_l1}->get_all_tags();
foreach my $tag (@list_tag_l2){
if(lc($tag) ne "parent" and lc($tag) ne "id"){
create_or_append_tag($omniscient->{'level1'}{$tag_l1}{$id_l1}, $tag ,$omniscient->{'level1'}{$tag_l1}{$id2_l1}->get_tag_values($tag));
my @tag_values = $omniscient->{'level1'}{$tag_l1}{$id2_l1}->get_tag_values($tag);
create_or_append_tag($omniscient->{'level1'}{$tag_l1}{$id_l1}, $tag , \@tag_values);
}
}
# remove the level1 of the ovelaping one
@@ -611,7 +612,8 @@ sub merge_overlap_loci{
$resume_identic++;
my @list_tag_l2 = $common->get_all_tags();
foreach my $tag (@list_tag_l2){
create_or_append_tag($kept_l2, "merged_".$tag ,$common->get_tag_values($tag));
my @tag_values = $common->get_tag_values($tag);
create_or_append_tag($kept_l2, "merged_".$tag , \@tag_values);
}
}
}
@@ -1325,6 +1327,9 @@ sub create_or_replace_tag{

# INPUT: feature object, String tag, String or Array ref;
# Output: None
# /!\ If values are extracted using get_tag_values($tag) you should first save the result in an array and send the array ref to this function e.g
# my @tag_values = $feature->get_tag_values($tag);
# create_or_append_tag($other_feature, $tag , \@tag_values);
sub create_or_append_tag{
my ($feature, $tag, $value)=@_;

@@ -1333,15 +1338,15 @@ sub create_or_append_tag{
my @original_values = $feature->get_tag_values($tag);
foreach my $value (@{$value}){
if(! grep { $value eq $_ } @original_values){
$feature->add_tag_value($tag,@{$value});
$feature->add_tag_value($tag,$value);
}
}
}
else{
my @original_values = $feature->get_tag_values($tag);
my @original_values = $feature->get_tag_values($tag);
if(! grep { $value eq $_ } @original_values){
$feature->add_tag_value($tag,$value);
}
}
}
}
else{
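Why the array reference matters: in Perl, a call such as `create_or_append_tag($f, $tag, $other->get_tag_values($tag))` flattens the returned list into `@_`, so `my ($feature, $tag, $value)=@_;` keeps only the first value and silently drops the rest. Passing `\@tag_values` delivers all values in a single argument. The updated expected output in `t/gff_syntax/out/15_correct_output.gff` reflects this: `merged_Target` now carries three values instead of one. The Python sketch below mirrors only the append-if-absent logic of the Perl function; a dict of lists stands in for the BioPerl feature, and the function is an illustration, not AGAT code.

```python
from typing import Dict, List, Union


def create_or_append_tag(attrs: Dict[str, List[str]], tag: str,
                         value: Union[str, List[str]]) -> None:
    """Add value(s) to tag, creating the tag if needed and skipping duplicates.

    `attrs` stands in for a feature's tags; a str is one value, a list is
    several values (the Perl version takes a string or an array reference).
    """
    values = [value] if isinstance(value, str) else list(value)
    current = attrs.setdefault(tag, [])  # create the tag when it is missing
    for v in values:
        if v not in current:  # append only values not already present
            current.append(v)


target_values = ["CLUHART00000006147", "2125", "2533"]

after_fix = {"ID": ["agat-exon-6"]}
create_or_append_tag(after_fix, "merged_Target", target_values)

# Passing only the first value mimics what the old Perl call ended up with.
before_fix = {"ID": ["agat-exon-6"]}
create_or_append_tag(before_fix, "merged_Target", target_values[0])

print(after_fix["merged_Target"])   # ['CLUHART00000006147', '2125', '2533']
print(before_fix["merged_Target"])  # ['CLUHART00000006147']
```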
2 changes: 1 addition & 1 deletion t/gff_syntax/out/15_correct_output.gff
@@ -9,4 +9,4 @@ scaffold789 maker match_part 558184 560123 . + . ID=agat-exon-4;Parent=CLUHART00
scaffold789 maker match_part 561401 561519 . + . ID=agat-exon-5;Parent=CLUHART00000006146;Target=CLUHART00000006146 1941 2059;merged_ID=CLUHART00000006146:exon:996;merged_Parent=CLUHART00000006147;merged_Target=CLUHART00000006147 1941 2059
scaffold789 maker match_part 562057 562121 . + . ID=CLUHART00000006147:exon:999;Parent=CLUHART00000006146;Target=CLUHART00000006147 2060 2124
scaffold789 maker match_part 564171 564235 . + . ID=CLUHART00000006146:exon:997;Parent=CLUHART00000006146;Target=CLUHART00000006146 2060 2124
scaffold789 maker match_part 564372 564780 . + . ID=agat-exon-6;Parent=CLUHART00000006146;Target=CLUHART00000006146 2125 2533;merged_ID=CLUHART00000006146:exon:998;merged_Parent=CLUHART00000006147;merged_Target=CLUHART00000006147
scaffold789 maker match_part 564372 564780 . + . ID=agat-exon-6;Parent=CLUHART00000006146;Target=CLUHART00000006146 2125 2533;merged_ID=CLUHART00000006146:exon:998;merged_Parent=CLUHART00000006147;merged_Target=CLUHART00000006147,2125,2533
9 changes: 9 additions & 0 deletions t/scripts_output.t
@@ -603,6 +603,15 @@ system(" $script --gff $input_folder/agat_sp_merge_annotations/fileA.gff --gff
ok( system("diff $result $outtmp") == 0, "output $script");
unlink $outtmp;

# ------------------- check agat_sp_move_attributes_within_records script-------------------

$script = $script_prefix."bin/agat_sp_move_attributes_within_records.pl";
$result = "$output_folder/agat_sp_move_attributes_within_records.gff";
system(" $script --gff $input_folder/agat_sp_move_attributes_within_records.gff --fp exon,CDS --fc mRNA -o $outtmp 2>&1 1>/dev/null");
#run test
ok( system("diff $result $outtmp") == 0, "output $script");
unlink $outtmp;

# ------------------- check agat_sp_prokka_fragmented_gene_annotations script-------------------

$script = $script_prefix."bin/agat_sp_prokka_fix_fragmented_gene_annotations.pl";
@@ -0,0 +1,3 @@
ptg000002l AUGUSTUS mRNA 3255 4626 0.5 + . ID=NBISM00000000001;Parent=NBISG00000000001;Dbxref=CDD:cd07067,Gene3D:G3DSA:3.40.50.1240,InterPro:IPR013078,InterPro:IPR029033,;Name=ARB_03491;Ontology_term=-;makerName=g1.t1;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1
ptg000002l AUGUSTUS CDS 3255 3275 0.98 + 0 ID=NBISC00000000001;Parent=NBISM00000000001;makerName=g1.t1.CDS1

@@ -0,0 +1,6 @@
##gff-version 3
ptg000002l AGAT gene 3255 4626 . + . ID=NBISG00000000001;Dbxref=CDD:cd07067,Gene3D:G3DSA:3.40.50.1240,InterPro:IPR013078,InterPro:IPR029033;Name=ARB_03491;Ontology_term=-;makerName=g1.t1;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1
ptg000002l AUGUSTUS mRNA 3255 4626 0.5 + . ID=NBISM00000000001;Parent=NBISG00000000001;Dbxref=CDD:cd07067,Gene3D:G3DSA:3.40.50.1240,InterPro:IPR013078,InterPro:IPR029033;Name=ARB_03491;Ontology_term=-;makerName=g1.t1;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1
ptg000002l AGAT exon 3255 4626 . + . ID=agat-exon-1;Parent=NBISM00000000001;Dbxref=CDD:cd07067,Gene3D:G3DSA:3.40.50.1240,InterPro:IPR013078,InterPro:IPR029033;Name=ARB_03491;Ontology_term=-;makerName=g1.t1.CDS1,g1.t1;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1
ptg000002l AUGUSTUS CDS 3255 3275 0.98 + 0 ID=NBISC00000000001;Parent=NBISM00000000001;Dbxref=CDD:cd07067,Gene3D:G3DSA:3.40.50.1240,InterPro:IPR013078,InterPro:IPR029033;Name=ARB_03491;Ontology_term=-;makerName=g1.t1.CDS1,g1.t1;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1
ptg000002l AGAT three_prime_UTR 3276 4626 . + . ID=agat-three_prime_utr-1;Parent=NBISM00000000001;makerName=g1.t1.CDS1
