ML function crashes when dataset is too small (I think) #29

Open
@grandjeanlab

I get a crash at the following line:
https://github.com/Aswendt-Lab/AIDAqc/blob/87e69818c128f3e8d2ac2489d95b06083751af96/scripts/QC.py#L616h

I think it happens when the dataset is too small. I get the following error message:

  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/sklearn/base.py", line 1146, in fit_predict
    return self.fit(X, **kwargs).predict(X)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sklearn/base.py", line 1389, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sklearn/covariance/_elliptic_envelope.py", line 183, in fit
    super().fit(X)
  File "/usr/local/lib/python3.12/dist-packages/sklearn/base.py", line 1389, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sklearn/covariance/_robust_covariance.py", line 753, in fit
    raw_location, raw_covariance, raw_support, raw_dist = fast_mcd(
                                                          ^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sklearn/covariance/_robust_covariance.py", line 565, in fast_mcd
    locations_full, covariances_full, supports_full, d = select_candidates(
                                                         ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sklearn/covariance/_robust_covariance.py", line 336, in select_candidates
    _c_step(
  File "/usr/local/lib/python3.12/dist-packages/sklearn/covariance/_robust_covariance.py", line 136, in _c_step
    support_indices = np.argpartition(dist, n_support - 1)[:n_support]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/numpy/_core/fromnumeric.py", line 962, in argpartition
    return _wrapfunc(a, 'argpartition', kth, axis=axis, kind=kind, order=order)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/numpy/_core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^
ValueError: kth(=2) out of bounds (2)
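
The last frames of the traceback can be reproduced with NumPy alone, which may help confirm the "dataset too small" guess. This is only a sketch: the `dist` values and `n_support = 3` are made up for illustration, chosen so that the requested `kth` index equals the array length, as the error message reports.

```python
import numpy as np

# Hypothetical values: two distances (one per sample) and a support size of 3.
dist = np.array([0.3, 0.1])
n_support = 3

try:
    # Same expression as in the _c_step frame of the traceback.
    support_indices = np.argpartition(dist, n_support - 1)[:n_support]
    message = None
except ValueError as exc:
    # kth = n_support - 1 = 2 is not a valid index into an array of length 2.
    message = str(exc)

print(message)
```

If this matches, the estimator was probably asked to select more support points than there are rows left in `X` after the `dropna` calls.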

I got this error when running the attached feature file:

caculated_features_func.csv

Code to reproduce the error with my feature file (based on the ML function in QC.py):

from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM
import numpy as np
import os
import pandas as pd

csv_path = 'caculated_features_func.csv'
Abook = pd.read_csv(csv_path)
# Drop columns that are entirely empty, then every row with a missing value
Abook = Abook.dropna(how='all', axis='columns')
Abook = Abook.dropna(how='any')

# Feature columns start at column index 7
X = Abook.iloc[:, 7:]

nu = 0.05
gamma = 2.0  # unused here: OneClassSVM below is given gamma="auto"
############## OneClassSVM
clf = OneClassSVM(gamma="auto", kernel="poly", nu=nu, shrinking=False).fit(X)
svm_pre = clf.predict(X)
############## EllipticEnvelope
elpenv = EllipticEnvelope(contamination=0.025, random_state=1)
ell_pred = elpenv.fit_predict(X)  # raises the ValueError shown above
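
One possible workaround is to check the shape of `X` before calling `fit_predict`. The sketch below is not taken from AIDAqc: `default_mcd_support` and `fit_predict_or_skip` are hypothetical helper names, and the support-size formula assumes the default described in the scikit-learn documentation for `support_fraction=None`, roughly `(n_samples + n_features + 1) / 2`. Other scikit-learn versions may compute it differently.

```python
import numpy as np


def default_mcd_support(n_samples: int, n_features: int) -> int:
    """Support size assumed for support_fraction=None (see lead-in)."""
    return int(np.ceil(0.5 * (n_samples + n_features + 1)))


def fit_predict_or_skip(estimator, X):
    """Hypothetical guard: return None when X has too few rows to fit."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    if default_mcd_support(n_samples, n_features) > n_samples:
        # More support points requested than rows available: skip the fit
        # and let the caller decide how to report this subject or dataset.
        return None
    return estimator.fit_predict(X)
```

With the script above, `ell_pred = fit_predict_or_skip(elpenv, X)` would then yield `None` for a too-small feature table instead of raising. Under the assumed formula the guard triggers whenever the number of rows does not exceed the number of feature columns.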
