Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate interval tree from cgranges to superintervals #42
base: main
Are you sure you want to change the base?
Migrate interval tree from cgranges to superintervals #42
Changes from 3 commits
8ebed61
25201bd
641a51e
a63c492
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
This file was deleted.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't
.get()
already returnNone
as the default empty value?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I tend to prefer explicit
None
s in this kind of use case and changed it without thinking much. I can revert if this is an issue.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I don't really have an opinion. Explicit is nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new method isn't much different in behavior than
get_overlaps
so I don't think we need both publicly available. Although this one looks lazy, it isn't.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responding to @clintval and @msto in the same place since you made similar comments:
__hash__
is concernced).get_overlaps
without breaking the API.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd usually favor including those as options in
get_overlaps()
instead of having a separate public method.e.g.
I agree with Clint that returning an
Iterator
implies lazy evaluation. If the method can be lazy, I think our usual pattern is to return an iterator and allow the user to convert tolist
at the call site. Otherwise, the method should returnlist
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to
reversed
won't make this lazy despite its use in a generator.Are you doing this to preserve original order?
IMHO order wouldn't matter to me.
Or if it did, I'd want the same order as is forced in
get_overlaps()
further down.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the call to
reversed
to yield them in insertion order. TheIntervalSet
returns a list in reverse-insertion order, andreversed
iterates backwards through the list without copying, so basically no overhead. I don't feel strongly, the point of this method was to have a lightweight method for yielding all the intervals without concern for order.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious about the
set
here - how doesIntervalSet
handle duplicate intervals?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntervalSet
will only be storing indices into the intervals list that theOverlapDetector
stores. So it won't even be aware of any exact duplicates (i.e. even the index is the same). As for intervals with duplicate start and stop positions, it stores them correctly.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will depend on the hash of the object you place in the
OverlapDetector
, since the overlap detector is generic now.If you send in a dataclass with
frozen=True
, then all fields will be used as a part of the hash. However, if you make a custom interval-like object and define your own hash method, then that hash method will be used.Both
BedRecord
andInterval
hash on all their fields.Mypy should disallow any custom interval-like object that does not have a
__hash__()
dunder method:pybedlite/pybedlite/overlap_detector.py
Line 82 in a63c492
I don't think a call to
set
orsorted
should have been included in the original implementation (a user should have the choice of "unique-ifying" later, or sorting later) but because they are already a part of the implementation details, it makes sense to preserve behavior:pybedlite/pybedlite/overlap_detector.py
Lines 362 to 374 in 9c4990e