Distance-based Features #5

illuminoplanet · 2020-07-16T01:18:42Z

Sort the failure points by their Euclidean distance from the center, then extract the following features:

Frequency per distance interval: counts from 10 bins, cubic-interpolated to 20 values (20 features)
Mean of the interpolated values
Standard deviation
Max
Min
Arg max
Arg min

Altogether, 26 features are extracted (20 interpolated values plus 6 summary statistics).

illuminoplanet · 2020-07-16T01:21:22Z

Implementation

import numpy as np
import pandas as pd
from scipy.interpolate import interp1d


def extract_distance(x):

    # Build feature names; numeric suffixes are zero-padded to two digits
    feature_name = lambda s, k: f"{s}_{str(k).zfill(2)}"

    # Offsets of the failure points (value 2) from the center, and their Euclidean distances
    coor = np.argwhere(x == 2) - (np.array(x.shape) // 2)
    radius = np.linalg.norm(coor, ord=2, axis=1)

    dist = {}

    # Count failure points in 10 equal-width distance bins (np.histogram default)
    dist_y, _ = np.histogram(radius)
    dist_x = np.linspace(1, dist_y.size, dist_y.size)

    # Cubic-interpolate the 10 counts to 20 points, then divide each by its index (1..20)
    dist_interpolate = interp1d(dist_x, dist_y, kind='cubic')
    new_dist_x = np.linspace(1, dist_y.size, 20)
    new_dist_y = dist_interpolate(new_dist_x) / np.linspace(1, new_dist_x.size, new_dist_x.size)

    for i in range(20):
        dist[feature_name('dist_value', i + 1)] = new_dist_y[i]

    # Summary statistics of the 20 interpolated values
    dist[feature_name('dist', 'mean')] = np.mean(new_dist_y)
    dist[feature_name('dist', 'std')] = np.std(new_dist_y)
    dist[feature_name('dist', 'max')] = np.max(new_dist_y)
    dist[feature_name('dist', 'min')] = np.min(new_dist_y)
    dist[feature_name('dist', 'argmax')] = np.argmax(new_dist_y)
    dist[feature_name('dist', 'argmin')] = np.argmin(new_dist_y)

    return pd.Series(dist)
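
The first steps of extract_distance (offsets from the center, radii, and the 10-bin histogram) can be traced on a small input. This is only a sketch: the 5x5 grid and the name toy_map are made up for illustration, and it assumes, as in the function above, that the value 2 marks a failure point.

```python
import numpy as np

# Hypothetical 5x5 grid: 2 marks a failure point, 1 marks everything else.
toy_map = np.ones((5, 5), dtype=int)
toy_map[0, 0] = 2  # top-left corner
toy_map[2, 2] = 2  # exact center
toy_map[2, 4] = 2  # two columns to the right of the center

center = np.array(toy_map.shape) // 2           # (2, 2)
offsets = np.argwhere(toy_map == 2) - center    # (row, column) offsets from the center
radius = np.linalg.norm(offsets, ord=2, axis=1)

# np.histogram defaults to 10 equal-width bins between min(radius) and max(radius).
counts, edges = np.histogram(radius)

print(radius)  # distances of the three failure points from the center
print(counts)  # number of failure points in each of the 10 distance bins
```

Because the bin edges follow the minimum and maximum radius of each input, the same bin index can correspond to different physical distances on different inputs.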

illuminoplanet · 2020-07-16T01:30:56Z

Evaluation

Control: Density-based + Radon-based + Geometry-based
Estimator 1 (LR) :
          Accuracy : 62.51%
          AUC : 0.9142
Estimator 2 (RF) :
          Accuracy : 80.08%
          AUC : 0.9752
Estimator 3 (GBM) :
          Accuracy : 79.53%
          AUC : 0.9708
Estimator 4 (ANN) :
          Accuracy : 70.12%
          AUC : 0.9402

Experiment: Density-based + Radon-based + Geometry-based + Distance-based
Estimator 1 (LR) :
          Accuracy : 68.55%
          AUC : 0.9355
Estimator 2 (RF) :
          Accuracy : 81.73%
          AUC : 0.9771
Estimator 3 (GBM) :
          Accuracy : 81.57%
          AUC : 0.9781
Estimator 4 (ANN) :
          Accuracy : 71.92%
          AUC : 0.9475

illuminoplanet · 2020-07-16T01:38:48Z

Analysis

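Adding the distance-based features improved both accuracy and AUC for all four estimators. The gain was minor for RF, GBM, and ANN (between 1.65 and 2.04 percentage points of accuracy) and major for the logistic regression classifier (accuracy 62.51% to 68.55%, AUC 0.9142 to 0.9355).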
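
For reference, the per-estimator gains can be recomputed from the numbers in the Evaluation comment. This is a small sketch: the dictionary names control, experiment, and gains are made up here, and the values are copied from the results above.

```python
# (accuracy in %, AUC) per estimator, copied from the Evaluation comment.
control = {
    "LR": (62.51, 0.9142),
    "RF": (80.08, 0.9752),
    "GBM": (79.53, 0.9708),
    "ANN": (70.12, 0.9402),
}
experiment = {
    "LR": (68.55, 0.9355),
    "RF": (81.73, 0.9771),
    "GBM": (81.57, 0.9781),
    "ANN": (71.92, 0.9475),
}

# Gain = experiment minus control, rounded to the precision of the reported values.
gains = {
    name: (
        round(experiment[name][0] - control[name][0], 2),
        round(experiment[name][1] - control[name][1], 4),
    )
    for name in control
}

for name, (acc_gain, auc_gain) in gains.items():
    print(f"{name}: accuracy +{acc_gain:.2f} points, AUC +{auc_gain:.4f}")
```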

ssupecial · 2020-07-17T05:02:20Z

I will include the Evaluation above in my own results table (Evaluation).

dotoleeoak added the labels Features (Features to be used for classification) and 구현완료 (implementation complete) on Jul 20, 2020
