Description
I am using property-based testing with Hypothesis to ensure that pyranges and bioframe return exactly the same results.
This has led me to discover many trifling but annoying bugs.
- When no closest interval is found it throws:
import bioframe as bf
df = bf.from_any([['chr1', 100, 110]], name_col='chrom')
bf.closest(df, df.copy(), ignore_overlaps=True)
~/anaconda3/lib/python3.8/site-packages/bioframe/core/arrops.py in closest_intervals(starts1, ends1, starts2, ends2, k, tie_arr, ignore_overlaps, ignore_upstream, ignore_downstream, direction)
734 interval1_run_starts = interval1_run_borders[:-1]
735 interval1_run_ends = interval1_run_borders[1:]
--> 736 closest_ids = closest_ids[
737 arange_multi(
738 interval1_run_starts,
IndexError: index 0 is out of bounds for axis 0 with size 0
Suggested solution: keep the df1 row and fill the df2 columns with nulls (this is how you already handle the case where df2 has no chromosomes in common with df1):
chrom start end chrom_ start_ end_ distance
0 chr1 100 110 <NA> <NA> <NA> <NA>
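To make the requested output concrete, here is a minimal sketch in plain pandas of padding unmatched rows with nulls. The helper name `pad_unmatched` and the `_` suffix are my own assumptions for illustration, not bioframe internals.

```python
import pandas as pd


def pad_unmatched(df1, suffix="_"):
    """Return df1 with null-filled partner columns (hypothetical helper)."""
    out = df1.copy()
    # One null column per df1 column, named with the suffix seen in the output above.
    for col in df1.columns:
        out[f"{col}{suffix}"] = pd.NA
    # No partner interval means no distance either.
    out["distance"] = pd.NA
    return out


df = pd.DataFrame({"chrom": ["chr1"], "start": [100], "end": [110]})
result = pad_unmatched(df)
print(result)
```

The point is only that every df1 row survives, so the row count of the result stays predictable even when nothing is close.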
- bf.closest does not handle empty dataframes:
import pandas as pd
df2 = pd.DataFrame({c: pd.Series([], dtype=t) for c, t in df.dtypes.items()})
bf.closest(df2, df)
~/anaconda3/lib/python3.8/site-packages/bioframe/ops.py in _closest_intidxs(df1, df2, k, ignore_overlaps, ignore_upstream, ignore_downstream, direction_col, tie_breaking_col, cols1, cols2)
1020
1021 if len(closest_intidxs) == 0:
-> 1022 return np.ndarray(shape=(0, 2), dtype=np.int)
1023 closest_intidxs = np.vstack(closest_intidxs)
1024
~/anaconda3/lib/python3.8/site-packages/numpy/__init__.py in __getattr__(attr)
282 return Tester
283
--> 284 raise AttributeError("module {!r} has no attribute "
285 "{!r}".format(__name__, attr))
286
AttributeError: module 'numpy' has no attribute 'int'
Suggested solution: return an empty dataframe with the columns from df2 added.
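For reference, the `AttributeError` comes from the `np.int` alias, which was deprecated in NumPy 1.20 and removed in 1.24; the builtin `int` works as the dtype instead. Below is a rough sketch of both pieces: the empty index array and an empty result frame with the df2 columns added. The function names and the `_` suffix are assumptions of mine, not the actual bioframe code.

```python
import numpy as np
import pandas as pd


def empty_index_pairs():
    # Builtin int replaces the removed np.int alias.
    return np.empty((0, 2), dtype=int)


def empty_closest_result(df1, df2, suffix="_"):
    """Zero-row frame with df1 columns, suffixed df2 columns, and distance (hypothetical helper)."""
    out = pd.concat([df1.iloc[:0], df2.iloc[:0].add_suffix(suffix)], axis=1)
    out["distance"] = pd.Series([], dtype="Int64")
    return out


df = pd.DataFrame({"chrom": ["chr1"], "start": [100], "end": [110]})
df_empty = df.iloc[:0]

pairs = empty_index_pairs()
empty = empty_closest_result(df_empty, df)
print(pairs.shape, list(empty.columns))
```

Returning a zero-row frame with the full column set lets downstream code treat the empty case like any other result.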
This isn't critical, but it would be nice if you could fix these eventually. Hypothesis stops at the first error it finds, so these bugs prevent me from testing properly.
I made the title general because I might update the issue with more bugs as I find them.