Handle NaNs #360

jo-mueller · 2025-01-07T15:34:49Z

When using the algorithm widgets, there is no check for NaNs in the processed dataframes. To handle NaNs correctly, the rows that contain them should be removed before the data is passed on to the respective algorithm. Likewise, these rows should be added back after processing, or napari will complain that the number of features in the inputs and outputs doesn't match.
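A minimal sketch of the drop-and-reinsert approach described above. It assumes the algorithm is a callable that takes a NumPy array of shape (n_samples, n_features) and returns one output row per input row. The name `run_without_nan_rows` is hypothetical and not part of napari-clusters-plotter's API.

```python
import numpy as np
import pandas as pd


def run_without_nan_rows(features: pd.DataFrame, algorithm) -> pd.DataFrame:
    """Run `algorithm` only on NaN-free rows and restore the original row count.

    Hypothetical helper: `algorithm` maps an (n_samples, n_features) array
    to an (n_samples, n_outputs) array.
    """
    # Rows with at least one NaN are excluded from the computation
    valid = features.notna().all(axis=1)
    result = np.asarray(algorithm(features.loc[valid].to_numpy()))

    # Keep the index labels of the rows that were actually processed
    output = pd.DataFrame(result, index=features.index[valid])

    # Reindex to the full index: dropped rows come back filled with NaN,
    # so the output has one row per label, as napari expects
    return output.reindex(features.index)


# Usage: the middle row contains a NaN, so its output is NaN as well
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})
out = run_without_nan_rows(df, lambda x: x.sum(axis=1, keepdims=True))
```

Re-inserting through `reindex` rather than by position keeps the output aligned with the original labels even if the index is not a simple 0..n-1 range.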

zoccoler · 2025-03-03T14:18:28Z

How about we implement this in a minor release before v0.9.0? I think it would already be useful for the next workshop: I sometimes get errors with the dimensionality reduction algorithms because of this.

jo-mueller · 2025-03-03T21:17:36Z

It's a bit strange, because this problem came up before and was fixed by #70, so I'm not quite sure why the algorithms sometimes fail... 🤔

zoccoler · 2025-03-05T14:38:17Z

I am using version 0.8.1.

I managed to make a MWE:

Draw a label with a single pixel (and a couple of larger ones if you like) and measure all features with napari-skimage-regionprops. You will end up with a table in which aspect_ratio is np.nan and roundness and circularity are np.inf.

Then run UMAP with default parameters.

I got the errors below and napari shuts itself down after a few seconds.

```
napari_clusters_plotter\_dimensionality_reduction.py:489: UserWarning: These features contain inf values: ['roundness', 'circularity']. They will be excluded from the analysis.!
C:\Users\mazo260d\miniforge3\envs\tim25\lib\site-packages\umap\umap_.py:1952: UserWarning: n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism.!
C:\Users\mazo260d\miniforge3\envs\tim25\lib\site-packages\umap\umap_.py:2462: UserWarning: n_neighbors is larger than the dataset size; truncating to X.shape[0] - 1!
C:\Users\mazo260d\miniforge3\envs\tim25\lib\site-packages\umap\spectral.py:519: RuntimeWarning: k >= N for N * N square matrix. Attempting to use scipy.linalg.eigh instead.!
```

The inf part seems OK, so I am guessing the problem is with the NaN?
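For reference, a worked sketch of why a single-pixel label produces these values. It assumes the features are derived from scikit-image region properties as aspect_ratio = major_axis_length / minor_axis_length, roundness = 4 · area / (π · major_axis_length²) and circularity = 4π · area / perimeter². These definitions are an assumption and have not been checked against the napari-skimage-regionprops source. For a single pixel, area is 1 while both axis lengths and the perimeter are 0.

```python
import numpy as np

# Region properties of a single-pixel label (assumed values, see above)
area = np.float64(1.0)
major_axis_length = np.float64(0.0)
minor_axis_length = np.float64(0.0)
perimeter = np.float64(0.0)

# Suppress NumPy's divide-by-zero / invalid-value warnings for this demo
with np.errstate(divide="ignore", invalid="ignore"):
    aspect_ratio = major_axis_length / minor_axis_length    # 0 / 0 -> nan
    roundness = 4 * area / (np.pi * major_axis_length**2)   # 4 / 0 -> inf
    circularity = 4 * np.pi * area / perimeter**2           # 4*pi / 0 -> inf

print(aspect_ratio, roundness, circularity)  # nan inf inf
```

So the same degenerate label yields a NaN in one column and infs in two others, which is why both cases show up together in this MWE.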

jo-mueller · 2025-03-05T19:42:32Z

@zoccoler Cannot reproduce, unfortunately. This is my test code:

```python
import numpy as np
import napari
import pandas as pd

import napari_clusters_plotter as ncp

print(ncp.__version__)

# Label 1 is a single pixel; labels 2-7 are 5x5 squares
labels = np.zeros((100, 100), dtype=int)
labels[:1, :1] = 1
labels[10:15, 10:15] = 2
labels[20:25, 20:25] = 3
labels[30:35, 30:35] = 4
labels[40:45, 40:45] = 5
labels[50:55, 50:55] = 6
labels[60:65, 60:65] = 7

# One feature value per label; label 2 has a NaN in feature3
features = pd.DataFrame({
    'label': [1, 2, 3, 4, 5, 6, 7],
    'feature1': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    'feature2': [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    'feature3': [0.7, np.nan, 0.9, 1.0, 1.1, 1.2, 1.3]
})

viewer = napari.Viewer()
viewer.add_labels(labels, name='labels', features=features)
napari.run()  # keep the viewer open when run as a script
```

When I run UMAP on this data, the result looks just fine. If I add np.inf to the data, the workflow still works, which it shouldn't. My suspicion would be that the problem is that NaNs and infs are handled in two different places, namely here and here. If I put both infs and NaNs into the features dataframe, it still works, though, so I'm not sure what causes the error.

Edit: Version is also 0.8.1
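Regarding NaNs and infs being handled in two different places: a sketch of detecting both in a single pass over the table, assuming all feature columns are numeric. Following the current warning, inf values are reported per column and NaN values per row. The name `find_non_finite` is hypothetical.

```python
import numpy as np
import pandas as pd


def find_non_finite(features: pd.DataFrame):
    """Return (columns containing +/-inf, index labels of rows containing NaN).

    Hypothetical helper; assumes every column can be cast to float.
    """
    values = features.to_numpy(dtype=float)
    # np.isinf is False for NaN, and np.isnan is False for inf,
    # so the two masks never double-count the same entry
    inf_columns = features.columns[np.isinf(values).any(axis=0)].tolist()
    nan_rows = features.index[np.isnan(values).any(axis=1)].tolist()
    return inf_columns, nan_rows


# Usage: f2 has a NaN in row 1, f3 has an inf in row 1
df = pd.DataFrame({
    "f1": [0.1, 0.2, 0.3],
    "f2": [0.7, np.nan, 0.9],
    "f3": [1.0, np.inf, 2.0],
})
inf_columns, nan_rows = find_non_finite(df)
```

Keeping both checks in one function would make it easier to decide in one place whether a column is dropped (inf) or a row is dropped (NaN).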

zoccoler · 2025-03-06T09:56:09Z

I ran your code and it works, but if I get the measurements using napari-skimage-regionprops (no intensity and moments in this case) and then run UMAP on all features, I get some warnings and the UMAP columns never show up.

```
c:\Users\mazo260d\Documents\GitHub\napari-clusters-plotter\napari_clusters_plotter\_dimensionality_reduction.py:489: UserWarning: These features contain inf values: ['roundness', 'circularity']. They will be excluded from the analysis.!
C:\Users\mazo260d\miniforge3\envs\tim25\lib\site-packages\sklearn\utils\extmath.py:1101: RuntimeWarning: invalid value encountered in divide!
C:\Users\mazo260d\miniforge3\envs\tim25\lib\site-packages\sklearn\utils\extmath.py:1106: RuntimeWarning: invalid value encountered in divide!
C:\Users\mazo260d\miniforge3\envs\tim25\lib\site-packages\sklearn\utils\extmath.py:1126: RuntimeWarning: invalid value encountered in divide!
```

Or this:

```
c:\Users\mazo260d\Documents\GitHub\napari-clusters-plotter\napari_clusters_plotter\_dimensionality_reduction.py:489: UserWarning: These features contain inf values: ['roundness', 'circularity']. They will be excluded from the analysis.!
C:\Users\mazo260d\miniforge3\envs\tim25\lib\site-packages\umap\umap_.py:1952: UserWarning: n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism.!
C:\Users\mazo260d\miniforge3\envs\tim25\lib\site-packages\umap\umap_.py:2462: UserWarning: n_neighbors is larger than the dataset size; truncating to X.shape[0] - 1!
C:\Users\mazo260d\miniforge3\envs\tim25\lib\site-packages\umap\umap_.py:134: UserWarning: A large number of your vertices were disconnected from the manifold.
Disconnection_distance = inf has removed 0 edges.
It has fully disconnected 2 vertices.
You might consider using find_disconnected_points() to find and remove these points from your data.
Use umap.utils.disconnected_vertices() to identify them.!
```

It is somehow inconsistent though, it does not always fail. One problem could be running UMAP with way more columns/features than rows.
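UMAP already truncates n_neighbors on its own (see the warning above), but a pre-check could make the few-samples/many-features situation explicit to the user. A sketch under these assumptions: the function name `safe_n_neighbors` and its warning text are hypothetical; only the cap at n_samples - 1 mirrors the UMAP warning, and the default of 15 is used here only as an example value.

```python
import warnings


def safe_n_neighbors(n_samples: int, n_features: int, n_neighbors: int = 15) -> int:
    """Return an n_neighbors value that fits the dataset size (hypothetical pre-check)."""
    if n_samples < 2:
        # With fewer than 2 samples there is no neighbor at all
        raise ValueError(f"Need at least 2 samples, got {n_samples}.")
    if n_features > n_samples:
        warnings.warn(
            f"More features ({n_features}) than samples ({n_samples}); "
            "the embedding may be unreliable."
        )
    # Same cap that UMAP applies itself: at most n_samples - 1 neighbors
    return min(n_neighbors, n_samples - 1)


# Usage: 7 labels measured with many regionprops features
n_neighbors = safe_n_neighbors(n_samples=7, n_features=40)
```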

I can't fully identify the problem, so let's not change anything now and check whether this really becomes an issue in the near future.

I like the NaN-catching decorator approach; maybe we could handle infs the same way.
One problem I notice is when a whole column contains only NaNs.
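An all-NaN column is a problem for row-wise NaN removal: every row contains a NaN, so dropping NaN rows leaves nothing. A minimal sketch, assuming all-NaN columns should simply be excluded (with a warning) before NaN rows are removed. The helper name `drop_nan_columns_then_rows` is hypothetical.

```python
import warnings

import numpy as np
import pandas as pd


def drop_nan_columns_then_rows(features: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that are entirely NaN, then drop rows that still contain NaN."""
    all_nan = features.columns[features.isna().all(axis=0).to_numpy()].tolist()
    if all_nan:
        warnings.warn(f"These features contain only NaN values and will be excluded: {all_nan}")
    # Without this step, the row-wise dropna below would remove every row
    reduced = features.drop(columns=all_nan)
    return reduced.dropna(axis=0, how="any")


# Usage: column "b" is all NaN; row 2 has a NaN in "a"
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],
    "b": [np.nan] * 3,
    "c": [4.0, 5.0, 6.0],
})
result = drop_nan_columns_then_rows(df)  # keeps columns a, c and rows 0, 1
```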

jo-mueller · 2025-03-06T11:33:33Z

> I like the catching NaN decorator approach, maybe we could blend inf handling the same way.

I tried that, and I think we run into problems if we try to handle np.inf twice. If we put it into the decorator, then the decorator should also handle the StandardScaler, and the if-clause that currently does the checking should be removed.
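A sketch of the decorator idea with NaN handling, inf handling and scaling all in one place, so np.inf is only handled once. Assumptions: the wrapped function takes a NumPy array and returns one output row per input row; columns containing inf are excluded (mirroring the current warning); NaN rows are dropped and re-inserted as NaN afterwards; and standardization uses plain pandas arithmetic as a stand-in for scikit-learn's StandardScaler (population standard deviation, zero-variance columns left unscaled). The names `handle_non_finite` and `first_component` are hypothetical.

```python
import functools
import warnings

import numpy as np
import pandas as pd


def handle_non_finite(func):
    """Hypothetical decorator: drop inf columns and NaN rows, standardize,
    run `func`, then restore one output row per input row."""

    @functools.wraps(func)
    def wrapper(features: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
        values = features.astype(float)

        # 1. Exclude columns containing +/-inf (the only place inf is handled)
        inf_cols = values.columns[np.isinf(values.to_numpy()).any(axis=0)].tolist()
        if inf_cols:
            warnings.warn(f"These features contain inf values: {inf_cols}. They will be excluded.")
        values = values.drop(columns=inf_cols)

        # 2. Exclude rows containing NaN
        valid = values.notna().all(axis=1)
        clean = values.loc[valid]

        # 3. Standardize (stand-in for StandardScaler: ddof=0, zero std -> 1)
        std = clean.std(ddof=0).replace(0, 1)
        scaled = (clean - clean.mean()) / std

        # 4. Run the algorithm and re-insert the dropped rows as NaN
        result = np.asarray(func(scaled.to_numpy(), *args, **kwargs))
        output = pd.DataFrame(result, index=clean.index)
        return output.reindex(features.index)

    return wrapper


@handle_non_finite
def first_component(x):
    # Toy "algorithm": return the first standardized feature as a column
    return x[:, :1]


# Usage: f2 contains an inf (column dropped), row 2 contains a NaN (row dropped)
df = pd.DataFrame({"f1": [1.0, 2.0, np.nan, 3.0], "f2": [1.0, np.inf, 1.0, 1.0]})
embedding = first_component(df)  # 4 rows, row 2 is NaN
```

Because scaling happens after the inf columns are removed, the scaler never sees inf values, so a separate if-clause for infs before scaling would no longer be needed.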

jo-mueller added the "bug" (Something isn't working) label Jan 7, 2025

jo-mueller added this to the v0.9.0 milestone Jan 7, 2025

jo-mueller mentioned this issue Jan 7, 2025

Handle nans correctly #362

Merged

zoccoler removed this from the v0.9.0 milestone Mar 6, 2025
