Assorted bugs and possibly undefined behavior in closest #167

I am using property-based testing with [hypothesis](https://hypothesis.readthedocs.io/en/latest/) to ensure that [poranges](https://github.com/endrebak/poranges) and bioframe return exactly the same results.

This has led me to discover many trifling but annoying bugs.

1. When no closest interval is found, `bf.closest` raises an `IndexError`:

```
import bioframe as bf

df = bf.from_any([['chr1', 100, 110]], name_col='chrom')
bf.closest(df, df.copy(), ignore_overlaps=True)
~/anaconda3/lib/python3.8/site-packages/bioframe/core/arrops.py in closest_intervals(starts1, ends1, starts2, ends2, k, tie_arr, ignore_overlaps, ignore_upstream, ignore_downstream, direction)
    734     interval1_run_starts = interval1_run_borders[:-1]
    735     interval1_run_ends = interval1_run_borders[1:]
--> 736     closest_ids = closest_ids[
    737         arange_multi(
    738             interval1_run_starts,

IndexError: index 0 is out of bounds for axis 0 with size 0
```

*Suggested solution*: return a row of NA values, which is how you already handle the case where df2 has no chromosomes in common with df1:

```
  chrom  start  end chrom_  start_  end_  distance
0  chr1    100  110   <NA>    <NA>  <NA>      <NA>
```
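
To make the proposed output concrete, here is a rough sketch of how such a frame could be built with plain pandas. `na_closest_rows` is a hypothetical helper of mine, not something bioframe provides, and the choice of nullable `Int64` columns is an assumption about what the result should look like.

```python
import pandas as pd


def na_closest_rows(df1):
    """Hypothetical helper: pair every row of df1 with an all-NA match."""
    out = df1.reset_index(drop=True)
    n = len(out)
    # The "_" suffix mirrors the column names in the output shown above.
    out["chrom_"] = [pd.NA] * n
    for col in ("start_", "end_", "distance"):
        # Nullable integer dtype (an assumption) so missing values stay <NA>.
        out[col] = pd.array([pd.NA] * n, dtype="Int64")
    return out


df = pd.DataFrame({"chrom": ["chr1"], "start": [100], "end": [110]})
result = na_closest_rows(df)
print(result)
```

The same helper could serve as a fallback whenever the interval search comes back empty for a chromosome.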

2. `bf.closest` raises an `AttributeError` when the first dataframe is empty:

```
import pandas as pd

# Empty dataframe with the same columns and dtypes as df
df2 = pd.DataFrame({c: pd.Series([], dtype=t) for c, t in df.dtypes.items()})
bf.closest(df2, df)
~/anaconda3/lib/python3.8/site-packages/bioframe/ops.py in _closest_intidxs(df1, df2, k, ignore_overlaps, ignore_upstream, ignore_downstream, direction_col, tie_breaking_col, cols1, cols2)
   1020
   1021     if len(closest_intidxs) == 0:
-> 1022         return np.ndarray(shape=(0, 2), dtype=np.int)
   1023     closest_intidxs = np.vstack(closest_intidxs)
   1024

~/anaconda3/lib/python3.8/site-packages/numpy/__init__.py in __getattr__(attr)
    282             return Tester
    283
--> 284         raise AttributeError("module {!r} has no attribute "
    285                              "{!r}".format(__name__, attr))
    286

AttributeError: module 'numpy' has no attribute 'int'
```

*Suggested solution*: return an empty dataframe with the usual output columns, i.e. the columns of the first dataframe with those of the second dataframe added.
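
The traceback itself points at `np.int`, an alias for the builtin `int` that NumPy deprecated in 1.20 and removed in 1.24. A minimal sketch of a replacement for the failing line, assuming nothing else in `_closest_intidxs` needs to change (I use `np.empty` instead of calling `np.ndarray` directly, which is my own substitution):

```python
import numpy as np

# The builtin int replaces the removed np.int alias; NumPy maps it to its
# default integer dtype.
empty_intidxs = np.empty(shape=(0, 2), dtype=int)

print(empty_intidxs.shape)  # (0, 2)
```

This only removes the crash; whether the rest of `closest` then produces the empty dataframe described above is something I have not checked.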

-------

This isn't critical, but it would be nice if you could fix this eventually. Hypothesis stops at the first error it finds, so these bugs prevent me from testing properly.

I made the title general because I might update the issue with more bugs as I find them.
