-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Moving ORF
as an AbstractGenomicInterval{T}
#34
Conversation
You don't really need the I'm not very good at performance chasing, but the concept seems like a good one to me. IntervalTrees make a lot of sense for genomic features, especially something like ORFs over a sequence that may overlap. Hopefully, someone else can pitch in suggestions for chasing down the regression. Or worst case, post on discourse saying that python is faster and get dozens of responses 😅 |
I did not get this suggestion, is this completely literal
Yeah, for me it was like a very nice abstraction to create a more specific
Good idea! I will wai a little bit to get it completely working with the remaining methods, but I agree I will get a storm of suggestions! |
Yeah
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I have now a more consistent and efficient (benches updated in the first comment) ORF type:
struct ORF{N,F} <: GenomicFeatures.AbstractGenomicInterval{F}
groupname::String
first::Int64
last::Int64
strand::Strand
frame::Int
seq::LongSubSeq{DNAAlphabet{N}}
features::Features
scheme::Union{Nothing,Function}
end
And as you suggested @kescobo, I removed the finder::F
from the main struct, and make an outer constructor that acually takes a ::Type{F}
so that it is now posssible to make new finder
such that they will be <: GeneFinderMethod
abstract and will populate the ORF{F}
.
Another fixed thing I have not seen before is that the ORF{N,F}
can handle LongSubSeq{DNAAlphabet{N}} where N=2
.
Also, I was trying to make ORF{N,F}
, so now it can be loaded with gene features since the Features
type is a flexible Features(fts::NamedTuple) = Features{typeof(fts)}(fts)
. The score and many other thigs we can get from an ORF can be saved in the type.
A couple of demonstrations of how it is now very neat:
phi = dna"GTGTGAGGTTATAACGCCGAAGCGGTAAAAATTTTAATTTTTGCCGCTGAGGGGTTGACCAAGCGAAGCGCGGTAGGTTTTCTGCTTAGGAGTTTAATCATGTTTCAGACTTTTATTTCTCGCCATAATTCAAACTTTTTTTCTGATAAGCTGGTTCTCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTTTACAGACACCTAAAGCTACATCGTCAACGTTATATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATGGATACATCTGTCAACGCCGCTAATCAGGTTGTTTCTGTTGGTGCTGATATTGCTTTTGATGCCGACCCTAAATTTTTTGCCTGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTGAATGGTCGCCATGATGGTGGTTATTATACCGTCAAGGACTGTGTGACTATTGACGTCCTTCCCCGTACGCCGGGCAATAACGTTTATGTTGGTTTCATGGTTTGGTCTAACTTTACCGCTACTAAATGCCGCGGATTGGTTTCGCTGAATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGAGGTGATTTATGTTTGGTGCTATTGCTGGCGGTATTGCTTCTGCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGAGGCGGTCAAAAAGCCGCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACCGATAACAATACTGTAGGCATGGGTGATGCTGGTATTAAATCTGCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAGGCCGCCCCTAGTTTTGTTTCTGGTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAGGTACGTTGCAGGCTGGCACTTCTGCCGTTTCTGATAAGTTGCTTGATTTGGTTGGACTTGGTGGCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTGATTATCTTGCTGCTGCATTTCCTGAGCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGGTTGACGCCGGATTTGAGAATCAAAAAGAGCTTACTAAAATGCAACTGGACAATCAGAAAGAGATTGCCGAGATGCAAAATGAGACTCAAAAAGAGATTGCTGGCATTCAGTCGGCGACTTCACGCCAGAATACGAAAGACCAGGTATATGCACAAAATGAGATGCTTGCTTATCAACAGAAGGAGTCTACTGCTCGCGTTGCGTCTATTATGGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAATGCTTACTCAAGCTCAAACGGCTGGTCAGTATTTTACCAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGACTTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCTCATATTGGCGCTACTGCAAAGGATATTTCTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGATACTTGGAACAATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCGTCAGGATTGACACCCTCCCAATTGTATGTTTTCATGCCTCCAAATCTTGGAGGCTTTTTTATGGTTCGTTCTTATTACCCTTCTGAATGTCACGCTGATTATTTTGACTTTGAGCGTATCGAGGCTCTTAAACCTGCTATTGAGGCTTGTGGCATTTCTACTCTTTCTCAATCCCCAATGCTTGGCTTCCATAAGCAGATGGATAACCGCATCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGGGCGTTGAGTTCGATAATGGTGATATGTATGTTGACGGCCATAAGGCTGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGAATTGGCACAATGCTACAATGTGCTCCCCCAACTTGATATTAATAACACTATAGACCACCGCCCCGAAGGGGACGAAAAATGGTTTTTAGAGAACGAGAAGACGGTTACGCAGTTTTGCCGCAAGCTGGCTGCTGAACGCCCTCTTAAGGATATTCGCGATGAGTATAATTACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCAAGATTGCTGGAGGCCTCCACTATGAAATCGCGTAGAGGCTTTGCTATTCAGCGTTTGATGAATGCAATGCGACAGGCTCATGCTGATGGTTGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTTATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGGTCGTATGGTTCTTGCTGCCGAGGGTCGCAAGGCTAATGATTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTGAGTATGGTACAGCTAATGGCCGTCTTCATTTCCATGCGGTGCACTTTATGCGGACACTTCCTACAGGTAGCGTTGACCCTAATTTTGGTCGTCGGGTACGCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTACAGTATGCCCATCGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAGCCGCTTAAAGCTACCAGTTATATGGCTGTTGGTTTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATATGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACTTCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGATATTGAAGCAGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACCTGTGACGACAAATCTGCTCAAATTTATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACCTGCAGAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTTGATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGAAAATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTGTCAAAAACTGACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTAGATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGATTACTATCTGAGTCCGATGCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAATCCGTACGTTTCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTAACCGAAGATGATTTCGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCGTCATTGCTTATTATGTTCATCCCGTCAACATTCAAACGGCCTGTCTCATCATGGAAGGCGCTGAATTTACGGAAAACATTATTAATGGCGTCGAGCGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTACGCGCAGGAAACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCGGAAGGAGTGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACTAAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGCCCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTCCCATCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGACTCCTTCGAGATGGACGCCGTTGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCTATTGACTCTACTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGTTTATGGTGAACAGTGGATTAAGTTCATGAAGGATGGTGTTAATGCCACTCCTCTCCCGACTGTTAACACTACTGGTTATATTGACCATGCCGCTTTTCTTGGCACGATTAACCCTGATACCAATAAAATCCCTAAGCATTTGTTTCAGGGTTATTTGAATATCTATAACAACTATTTTAAAGCGCCGTGGATGCCTGACCGTACCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGCTCGTTATGGTTTCCGTTGCTGCCATCTCAAAAACATTTGGACTGCTCCGCTTCCTCCTGAGACTGAGCTTTCTCGCCAAATGACGACTTCTACCACATCTATTGACATTATGGGTCTGCAAGCTGCTTATGCTAATTTGCATACTGACCAAGAACGTGATTACTTCATGCAGCGTTACCATGATGTTATTTCTTCATTTGGAGGTAAAACCTCTTATGACGCTGACAACCGTCCTTTACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCTATGATGTTGATGGAACTGACCAAACGTCGTTAGGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAACATTCTGTGCCGCGTTTCTTTGTTCCTGAGCATGGCACTATGTTTACTCTTGCGCTTGTTCGTTTTCCGCCTACTGCGACTAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGACTTATACCGATATTGCTGGCGACCCTGTTTTGTATGGCAACTTGCCGCCGCGTGAAATTTCTATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGTTTAAGATTGCTGAGGGTCAGTGGTATCGTTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGTACTTATTCGCCACCATGATTATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGACCGTTTATCGCAATCTGCCGACCACTCGCGATTCAATCATGACTTCGTGATAAAAGATTGA"
phiorfs = findorfs(phi, finder=NaiveFinder)
204-element Vector{ORF{4, NaiveFinder}}:
ORF{NaiveFinder}(9:101, '-', 3)
ORF{NaiveFinder}(94:126, '-', 1)
ORF{NaiveFinder}(100:627, '+', 1)
ORF{NaiveFinder}(209:214, '-', 2)
ORF{NaiveFinder}(223:273, '-', 1)
ORF{NaiveFinder}(223:447, '-', 1)
ORF{NaiveFinder}(245:292, '-', 2)
ORF{NaiveFinder}(248:436, '+', 2)
ORF{NaiveFinder}(257:436, '+', 2)
ORF{NaiveFinder}(270:278, '-', 3)
ORF{NaiveFinder}(283:627, '+', 1)
ORF{NaiveFinder}(344:436, '+', 2)
ORF{NaiveFinder}(419:436, '+', 2)
ORF{NaiveFinder}(422:436, '+', 2)
ORF{NaiveFinder}(437:481, '+', 2)
⋮
ORF{NaiveFinder}(4863:5258, '-', 3)
ORF{NaiveFinder}(4933:5019, '+', 1)
ORF{NaiveFinder}(4941:5375, '+', 3)
ORF{NaiveFinder}(5053:5085, '+', 1)
ORF{NaiveFinder}(5082:5375, '+', 3)
ORF{NaiveFinder}(5089:5325, '+', 1)
ORF{NaiveFinder}(5122:5202, '-', 1)
ORF{NaiveFinder}(5152:5325, '+', 1)
ORF{NaiveFinder}(5164:5325, '+', 1)
ORF{NaiveFinder}(5180:5206, '-', 2)
ORF{NaiveFinder}(5257:5325, '+', 1)
ORF{NaiveFinder}(5263:5325, '+', 1)
ORF{NaiveFinder}(5320:5325, '+', 1)
ORF{NaiveFinder}(5330:5365, '-', 2)
ORF{NaiveFinder}(5364:5375, '+', 3)
Since the ORF{N,F}
stores the sequence, we can make cool things out of the box:
sequence.(orfs)
204-element Vector{LongSubSeq{DNAAlphabet{4}}}:
ATGATTAAACTCCTAAGCAGAAAACCTACCGCGCTTCGC…GCGGCAAAAATTAAAATTTTTACCGCTTCGGCGTTATAA
ATGGCGAGAAATAAAAGTCTGAAACATGATTAA
ATGTTTCAGACTTTTATTTCTCGCCATAATTCAAACTTT…GTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGA
ATGTAG
ATGAAGAAAACCACCATTACCAGCATTAACCGTCAAACTATCAAAATATAA
ATGGCGACCATTCAAAGGATAAACATCATAGGCAGTCGG…ACCATTACCAGCATTAACCGTCAAACTATCAAAATATAA
ATGTATCCATCTGAATGCAATGAAGAAAACCACCATTACCAGCATTAA
ATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATGG…CGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTGA
ATGGTGGTTTTCTTCATTGCATTCAGATGGATACATCTG…CGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTGA
ATGCAATGA
ATGGATACATCTGTCAACGCCGCTAATCAGGTTGTTTCT…GTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGA
ATGCCGACCCTAAATTTTTTGCCTGTTTGGTTCGCTTTG…CGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTGA
ATGATGTTTATCCTTTGA
ATGTTTATCCTTTGA
ATGGTCGCCATGATGGTGGTTATTATACCGTCAAGGACTGTGTGA
⋮
ATGGTGGCGAATAAGTACGCGTTCTTGCAAATCACCAGA…TTTATAGGTCTGTTGAACACGACCAGAAAACTGGCCTAA
ATGGCACTATGTTTACTCTTGCGCTTGTTCGTTTTCCGC…CTAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGA
ATGTTTACTCTTGCGCTTGTTCGTTTTCCGCCTACTGCG…AATCTGCCGACCACTCGCGATTCAATCATGACTTCGTGA
ATGGCAACTTGCCGCCGCGTGAAATTTCTATGA
ATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAG…AATCTGCCGACCACTCGCGATTCAATCATGACTTCGTGA
ATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGTTTAAGA…AGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGA
ATGGGAAGCCTTCAAGAAGGTGATAAGCAGGAGAAACAT…AAGGCGCATAACGATACCACTGACCCTCAGCAATCTTAA
ATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTG…AGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGA
ATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTTCCCAT…AGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGA
ATGAATGGGAAGCCTTCAAGAAGGTGA
ATGATTATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGA
ATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGA
ATGTGA
ATGATTGAATCGCGAGTGGTCGGCAGATTGCGATAA
ATGACTTCGTGA
Exploring the new Features
field
We can now try to use a scoring scheme, for instance, the simple log_odds_ratio_score
and use minlen
keyword argument:
findorfs(phi, scheme=lors, minlen=64) .|> features
134-element Vector{@NamedTuple{score::Float64}}:
(score = -0.020953839808134023,)
(score = -0.0008346917390085617,)
(score = -0.005239249252611444,)
(score = -0.007200284900850932,)
(score = -0.007832152171979367,)
(score = -0.0018172720089990923,)
(score = -0.006404578260605308,)
(score = -0.003209054988434599,)
(score = -0.03100933596797633,)
(score = -0.0005844727783277565,)
(score = -0.0005442693515613672,)
(score = -0.02946967703459084,)
(score = -0.0006814154969336216,)
(score = -0.0009388058672044588,)
(score = -0.000946341251223099,)
⋮
(score = -0.0003438480564352011,)
(score = -0.00853670881288625,)
(score = -0.00802605474396142,)
(score = -0.01604090594572921,)
(score = -0.00037344639883859457,)
(score = -0.011635555210909566,)
(score = -0.0006769082407312228,)
(score = -0.014804098291466597,)
(score = -0.0006454376565976915,)
(score = -0.0014012196428411425,)
(score = -0.0024182335566292662,)
(score = -0.01466675827643835,)
(score = -0.00302236305890491,)
(score = -0.003913378336104148,)
(score = -0.030410038701528405,)
Very neet interface for filtering
And much more we can now filter with very nice fine grain and can close #33 :
filter(x -> iscoding(sequence(x), η=1e-10) && length(x) > 100, findorfs(phi))
25-element Vector{ORF{4, NaiveFinder}}:
ORF{NaiveFinder}(1021:1389, '+', 1)
ORF{NaiveFinder}(1447:1635, '+', 1)
ORF{NaiveFinder}(1489:1635, '+', 1)
ORF{NaiveFinder}(1501:1635, '+', 1)
ORF{NaiveFinder}(1531:1635, '+', 1)
ORF{NaiveFinder}(2697:3227, '+', 3)
ORF{NaiveFinder}(2745:3227, '+', 3)
ORF{NaiveFinder}(2780:3142, '+', 2)
ORF{NaiveFinder}(2841:3227, '+', 3)
ORF{NaiveFinder}(2859:3227, '+', 3)
ORF{NaiveFinder}(2865:3227, '+', 3)
ORF{NaiveFinder}(2874:3227, '+', 3)
ORF{NaiveFinder}(2973:3227, '+', 3)
ORF{NaiveFinder}(3142:3312, '+', 1)
ORF{NaiveFinder}(3481:3939, '+', 1)
ORF{NaiveFinder}(3659:3934, '+', 2)
ORF{NaiveFinder}(3734:3934, '+', 2)
ORF{NaiveFinder}(3772:3939, '+', 1)
ORF{NaiveFinder}(3806:3934, '+', 2)
ORF{NaiveFinder}(4129:4287, '+', 1)
ORF{NaiveFinder}(4160:4291, '-', 2)
ORF{NaiveFinder}(4540:4644, '+', 1)
ORF{NaiveFinder}(4690:4866, '+', 1)
ORF{NaiveFinder}(4741:4866, '+', 1)
ORF{NaiveFinder}(4744:4866, '+', 1)
Using GenomeFeature GenomeIntervalCollections
Since the ORF{N,F}
is a subtype of a AbstractGenomicInterval
we can use some nice methods and explore, for instance, the coverage of the ORF intersections:
orfcolleciton = GenomicIntervalCollection(phiorfs)
GenomicIntervalCollection{ORF{4, NaiveFinder}} with 134 intervals:
ORF{NaiveFinder}(9:101, '-', 3)
ORF{NaiveFinder}(100:627, '+', 1)
ORF{NaiveFinder}(223:447, '-', 1)
ORF{NaiveFinder}(248:436, '+', 2)
ORF{NaiveFinder}(257:436, '+', 2)
ORF{NaiveFinder}(283:627, '+', 1)
ORF{NaiveFinder}(344:436, '+', 2)
ORF{NaiveFinder}(532:627, '+', 1)
⋮
coverage(orfcollection)
GenomicIntervalCollection{GenomicInterval{UInt32}} with 178 intervals:
phi:9-99 . 1
phi:100-101 . 2
phi:102-222 . 1
phi:223-247 . 2
phi:248-256 . 3
phi:257-282 . 4
phi:283-343 . 5
phi:344-436 . 6
⋮
…uires a new version of BMC.
This looks fantastic :-) |
Thanks for all the suggestions! I still doubt whether to keep the |
Are SubSeqs implemented as views? To me, that would make the most sense, so they could be carried along with the object, but not be reallocated |
Yes, it is |
Some time ago, I discovered the GenomicFeatures.jl package. It has a generic abstract type
AbstractGenomicInterval{T}
that has some neat features as it is build on the even more generic data structure IntervalTrees.jl. Back in the beginning ofGeneFinder.jl
I did not understand several important aspects of Julia and the nature of having a robust data structure for an ORF.Now, watching the imp. of the more concrete
GenomicInterval
:It got me thinking that it is indeed a somehow generic
ORF
although simpler with only starting, ending, and strand fields that could be shared, it could expand later on the metadata or with more dedicated fields. However the cool stuff with thisAbstractGenomicInterval{T}
is that it inherits the nice properties of theIntervalTree{K,V}
data structure so that several operations of an interval (which is indeed an ORF location) can be not only conveniently implemented (as they are already in that package) but also fast. For instance, finding intersections of intervals (see details). This is something that can be important for different gene-finder algorithms that rely on filtering based on overlapping ORFs. I can also be missing some other details that this data structure could bring to theORF
to be more robust and generic to build other finder methods upon.This PR implements the
ORF
as a subtype of theAbstractGenomicInterval{F} where {F <: GeneFinderMethod}
:Current benchmarks do not look very promising compared to the previous simple implemented
ORF
:There are other imp. details on the
ORF{F}
like the addition of thefinder::F
field and the other fields below, that are also making more generic the implementations, but currently not necessarily affecting the benchmarks, and those could be added directly to oldORF
struct.