Distance-based Features #5

Open
illuminoplanet opened this issue Jul 16, 2020 · 4 comments
Labels
Features (features to be used for classification), 구현완료 (Implemented)

Comments

@illuminoplanet
Collaborator

Order the failure points by their Euclidean distance from the center, then extract the following features:

  1. Frequency per distance interval: a 10-bin histogram, cubic-interpolated to yield 20 features
  2. Mean
  3. Standard deviation
  4. Max
  5. Min
  6. Arg max
  7. Arg min

Altogether, 26 features are extracted (the 20 frequency values plus the 6 summary statistics). In the implementation, statistics 2 to 7 are computed over the 20 interpolated frequency values.
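The distance step described above can be sketched on a toy array. This is a minimal illustration, not the project's code: the helper name `radial_distances`, the parameter `failure_value`, and the 5x5 grid are assumptions made for the example. The value 2 for failure points follows the implementation posted in this thread.

```python
import numpy as np

def radial_distances(grid, failure_value=2):
    """Euclidean distance of each failure cell from the array center (hypothetical helper)."""
    center = np.array(grid.shape) // 2
    offsets = np.argwhere(grid == failure_value) - center
    return np.linalg.norm(offsets, axis=1)

# Toy 5x5 map with four failure points (illustrative data only).
grid = np.zeros((5, 5), dtype=int)
grid[2, 2] = 2  # at the center
grid[2, 4] = 2  # two columns to the right of the center
grid[0, 2] = 2  # two rows above the center
grid[0, 0] = 2  # top-left corner

radii = radial_distances(grid)

# Frequency per distance interval, using 10 equal-width bins.
counts, edges = np.histogram(radii, bins=10)
```

Each failure point lands in exactly one bin, so the counts always sum to the number of failure points.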

@illuminoplanet
Collaborator Author

illuminoplanet commented Jul 16, 2020

Implementation

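import numpy as np
import pandas as pd
from scipy.interpolate import interp1d


def extract_distance(x):

    # Build feature names such as "dist_value_01" or "dist_mean".
    feature_name = lambda s, k: f"{s}_{str(k).zfill(2)}"

    # Offsets of the failure points (cells equal to 2) from the array center,
    # and their Euclidean distances from it.
    coor = np.argwhere(x == 2) - (np.array(x.shape) // 2)
    radius = np.linalg.norm(coor, ord=2, axis=1)

    dist = {}

    # Histogram of the distances (np.histogram defaults to 10 bins).
    dist_y, _ = np.histogram(radius)
    dist_x = np.linspace(1, dist_y.size, dist_y.size)

    # Cubic-interpolate the 10 bin counts onto 20 points, then divide each
    # value by its index (1..20), which weights the outer intervals down.
    dist_interpolate = interp1d(dist_x, dist_y, kind='cubic')
    new_dist_x = np.linspace(1, dist_y.size, 20)
    new_dist_y = dist_interpolate(new_dist_x) / np.linspace(1, new_dist_x.size, new_dist_x.size)

    for i in range(20):
        dist[feature_name('dist_value', i + 1)] = new_dist_y[i]

    # Summary statistics of the 20 interpolated values.
    # Note: argmax and argmin are 0-based, while the value names are 1-based.
    dist[feature_name('dist', 'mean')] = np.mean(new_dist_y)
    dist[feature_name('dist', 'std')] = np.std(new_dist_y)
    dist[feature_name('dist', 'max')] = np.max(new_dist_y)
    dist[feature_name('dist', 'min')] = np.min(new_dist_y)
    dist[feature_name('dist', 'argmax')] = np.argmax(new_dist_y)
    dist[feature_name('dist', 'argmin')] = np.argmin(new_dist_y)

    return pd.Series(dist)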
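To make the resampling step easier to follow in isolation, here is a small sketch that takes a 10-bin histogram to 20 values and then to 26 features. The bin counts are hypothetical, chosen only for illustration, and the variable names are not from the project. It assumes the same `interp1d(..., kind='cubic')` call used above.

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical 10-bin histogram of failure-point distances.
counts = np.array([0, 1, 3, 6, 9, 7, 4, 2, 1, 0], dtype=float)

# Treat the bins as positions 1..10 and fit a cubic interpolant through them.
bin_index = np.arange(1, counts.size + 1)
resample = interp1d(bin_index, counts, kind="cubic")

# Evaluate the interpolant at 20 evenly spaced positions over the same range.
fine_index = np.linspace(1, counts.size, 20)
fine_counts = resample(fine_index)

# Divide by the position index 1..20, as in the implementation above.
weighted = fine_counts / np.arange(1, 21)

# 20 per-interval values plus 6 summary statistics.
features = {f"dist_value_{i + 1:02d}": v for i, v in enumerate(weighted)}
features.update(
    dist_mean=np.mean(weighted),
    dist_std=np.std(weighted),
    dist_max=np.max(weighted),
    dist_min=np.min(weighted),
    dist_argmax=int(np.argmax(weighted)),
    dist_argmin=int(np.argmin(weighted)),
)
```

The interpolant passes through the original bin counts at positions 1 and 10, but a cubic fit can overshoot between bins, so the resampled values are not guaranteed to stay non-negative.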

@illuminoplanet
Collaborator Author

illuminoplanet commented Jul 16, 2020

Evaluation

Control: Density-based + Radon-based + Geometry-based
Estimator 1 (LR) :
          Accuracy : 62.51%
          AUC : 0.9142
Estimator 2 (RF) :
          Accuracy : 80.08%
          AUC : 0.9752
Estimator 3 (GBM) :
          Accuracy : 79.53%
          AUC : 0.9708
Estimator 4 (ANN) :
          Accuracy : 70.12%
          AUC : 0.9402

Experiment: Density-based + Radon-based + Geometry-based + *Distance-based
Estimator 1 (LR) :
          Accuracy : 68.55%
          AUC : 0.9355
Estimator 2 (RF) :
          Accuracy : 81.73%
          AUC : 0.9771
Estimator 3 (GBM) :
          Accuracy : 81.57%
          AUC : 0.9781
Estimator 4 (ANN) :
          Accuracy : 71.92%
          AUC : 0.9475
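The two result sets above can be compared directly. The following sketch only restates the reported numbers and subtracts them; the dictionary layout and variable names are assumptions made for the example.

```python
# Reported (accuracy %, AUC) per estimator, copied from the results above.
control = {
    "LR": (62.51, 0.9142),
    "RF": (80.08, 0.9752),
    "GBM": (79.53, 0.9708),
    "ANN": (70.12, 0.9402),
}
experiment = {
    "LR": (68.55, 0.9355),
    "RF": (81.73, 0.9771),
    "GBM": (81.57, 0.9781),
    "ANN": (71.92, 0.9475),
}

# Gain from adding the distance-based features: (accuracy points, AUC).
gains = {
    name: (
        round(experiment[name][0] - control[name][0], 2),
        round(experiment[name][1] - control[name][1], 4),
    )
    for name in control
}

for name, (acc_gain, auc_gain) in gains.items():
    print(f"{name}: accuracy {acc_gain:+.2f} points, AUC {auc_gain:+.4f}")
```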

@illuminoplanet
Collaborator Author

Analysis

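Adding the distance-based features improved accuracy and AUC for all four estimators. The gain was minor for RF, GBM, and ANN (roughly 1.7 to 2.0 accuracy points) and major for the logistic regression classifier (62.51% to 68.55%, about 6 points).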

@ssupecial
Collaborator

I will include the Evaluation above in my results table (Evaluation).

dotoleeoak added the Features and 구현완료 (Implemented) labels on Jul 20, 2020