ML function crashes when dataset is too small (I think)

I get a crash at the following line:
https://github.com/Aswendt-Lab/AIDAqc/blob/87e69818c128f3e8d2ac2489d95b06083751af96/scripts/QC.py#L616h

I think it happens when the dataset is too small. I get the following error message:

```{python}
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/sklearn/base.py", line 1146, in fit_predict
    return self.fit(X, **kwargs).predict(X)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sklearn/base.py", line 1389, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sklearn/covariance/_elliptic_envelope.py", line 183, in fit
    super().fit(X)
  File "/usr/local/lib/python3.12/dist-packages/sklearn/base.py", line 1389, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sklearn/covariance/_robust_covariance.py", line 753, in fit
    raw_location, raw_covariance, raw_support, raw_dist = fast_mcd(
                                                          ^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sklearn/covariance/_robust_covariance.py", line 565, in fast_mcd
    locations_full, covariances_full, supports_full, d = select_candidates(
                                                         ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sklearn/covariance/_robust_covariance.py", line 336, in select_candidates
    _c_step(
  File "/usr/local/lib/python3.12/dist-packages/sklearn/covariance/_robust_covariance.py", line 136, in _c_step
    support_indices = np.argpartition(dist, n_support - 1)[:n_support]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/numpy/_core/fromnumeric.py", line 962, in argpartition
    return _wrapfunc(a, 'argpartition', kth, axis=axis, kind=kind, order=order)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/numpy/_core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^
ValueError: kth(=2) out of bounds (2)
```
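The numbers in the `ValueError` seem to fit the small-dataset theory. The scikit-learn docs for `EllipticEnvelope` / `MinCovDet` describe the default support size as roughly `(n_samples + n_features + 1) / 2`. Assuming that value is rounded up (my reading, it may differ between versions), 2 usable rows would ask for a support of 3, which is more than the 2 distances available. The helper name below is hypothetical, and the feature count of 2 is only for illustration.

```python
import math


def default_support_size(n_samples: int, n_features: int) -> int:
    """Assumed default MCD support size: ceil((n_samples + n_features + 1) / 2)."""
    return math.ceil((n_samples + n_features + 1) / 2)


# Illustration: 2 usable rows and 2 feature columns (the real file may differ).
n_samples, n_features = 2, 2
n_support = default_support_size(n_samples, n_features)

# argpartition is then asked for kth = n_support - 1 in an array of length n_samples.
print("n_support:", n_support)
print("kth:", n_support - 1, "array length:", n_samples)
print("support exceeds rows:", n_support > n_samples)
```

If this reading is right, the crash would show up whenever the number of rows left after `dropna` is not larger than roughly the number of feature columns.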

I got this error when running on this feature file:

[caculated_features_func.csv](https://github.com/user-attachments/files/20495405/caculated_features_func.csv)


Code to reproduce the error with my feature file (based on the ML function in QC.py):

```{python}
import pandas as pd
from sklearn.covariance import EllipticEnvelope
from sklearn.svm import OneClassSVM

# Load the feature file, then drop empty columns and incomplete rows.
csv_path = 'caculated_features_func.csv'
Abook = pd.read_csv(csv_path)
Abook = Abook.dropna(how='all', axis='columns')
Abook = Abook.dropna(how='any')

# Feature columns start at column index 7.
X = Abook.iloc[:, 7:]

############## OneClassSVM
nu = 0.05
clf = OneClassSVM(gamma="auto", kernel="poly", nu=nu, shrinking=False).fit(X)
svm_pre = clf.predict(X)

############## EllipticEnvelope (this is the call that raises the ValueError)
elpenv = EllipticEnvelope(contamination=0.025, random_state=1)
ell_pred = elpenv.fit_predict(X)
```
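A possible workaround, as a sketch only: skip the estimator when too few rows survive `dropna`. `fit_predict_if_enough_rows` is a hypothetical helper, not something that exists in AIDAqc, and the row threshold is my assumption based on the documented default support size. It only avoids this specific error; it does not guarantee the fit is meaningful on tiny datasets.

```python
import math

import numpy as np


def fit_predict_if_enough_rows(estimator, X):
    """Return estimator.fit_predict(X), or None when X has too few rows.

    Assumed threshold: the row count must be at least
    ceil((n_samples + n_features + 1) / 2), the default support size.
    """
    X = np.asarray(X)
    n_samples, n_features = X.shape
    needed = math.ceil((n_samples + n_features + 1) / 2)
    if needed > n_samples:
        return None  # too few rows for the assumed default support size
    return estimator.fit_predict(X)
```

In the reproducer this would be called as `fit_predict_if_enough_rows(EllipticEnvelope(contamination=0.025, random_state=1), X)`, with the caller treating `None` as "outlier detection skipped".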

