Test Set Metrics #2000

jramborger78 · 2024-10-21T17:46:30Z

Hey guys,

Talmo, it was a pleasure finally meeting you in person at the Jackson Lab course. Thanks for always making yourself so available, a true treat for all.

I came back and began putting together a test set of labeled frames. I searched through the discussion board but didn't nail down anything in line with what I was mentioning: a way to get output metrics that look like the ones produced from training. Essentially, I am after the same box plot graph and metrics label.

I did remember that I had ChatGPT help me put something together based on the sleap.nn.evals API, listed below. Basically, I see that if I change the inputs before the "metrics = " line, so that labels_gt and labels_pr are the test labels and the results of a test inference, I could get close to what I am after. Any edits that supply the training printout, or any other useful ways to report in a paper how well the model performs, are greatly appreciated.

Thanks as always! Cheers.

-Jarryd

import sleap
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

mpl.style.use("seaborn-deep")  # on matplotlib >= 3.6 this style is named "seaborn-v0_8-deep"

# Define the paths
model_path = r"C:\Users\jramb\Documents\Sleap\cocaine\models\centered"
labels_path = r"C:\Users\jramb\Documents\Sleap\cocaine\newcoc.slp"

# Load the model and labels
predictor = sleap.load_model(model_path)
labels_gt = sleap.load_file(labels_path)

# Perform predictions
labels_pr = predictor.predict(labels_gt)

# Evaluate the model
metrics = sleap.nn.evals.evaluate(labels_gt, labels_pr)

# Print evaluation metrics
print("Evaluation Metrics:")
print("Error distance (50%):", metrics["dist.p50"])
print("Error distance (90%):", metrics["dist.p90"])
print("Error distance (95%):", metrics["dist.p95"])
print("mAP:", metrics["oks_voc.mAP"])
print("mAR:", metrics["oks_voc.mAR"])

# Plot localization error distribution
plt.figure(figsize=(6, 3), dpi=150, facecolor="w")
sns.histplot(metrics["dist.dists"].flatten(), binrange=(0, 20), kde=True, kde_kws={"clip": (0, 20)}, stat="probability")
plt.xlabel("Localization error (px)")
plt.title("Localization Error Distribution")
plt.show()

# Plot Object Keypoint Similarity
plt.figure(figsize=(6, 3), dpi=150, facecolor="w")
sns.histplot(metrics["oks_voc.match_scores"].flatten(), binrange=(0, 1), kde=True, kde_kws={"clip": (0, 1)}, stat="probability")
plt.xlabel("Object Keypoint Similarity")
plt.title("Object Keypoint Similarity Distribution")
plt.show()

# Plot Precision-Recall Curve
plt.figure(figsize=(4, 4), dpi=150, facecolor="w")
for precision, thresh in zip(metrics["oks_voc.precisions"][::2], metrics["oks_voc.match_score_thresholds"][::2]):
    plt.plot(metrics["oks_voc.recall_thresholds"], precision, "-", label=f"OKS @ {thresh:.2f}")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend(loc="lower left")
plt.title("Precision-Recall Curve")
plt.show()
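
For the per-node box plot, one possible starting point is to summarize the error matrix column by column. This is only a minimal sketch: it assumes `metrics["dist.dists"]` is an array of shape (instances, nodes) with NaN where a node was not labeled or predicted, which is worth checking on your own output. `per_node_summary` is a hypothetical helper, not part of SLEAP, and the node names and numbers below are placeholders, not real results.

```python
import numpy as np

def per_node_summary(dists, node_names, percentiles=(50, 90, 95)):
    """Return {node: {"n": count, "p50": ..., ...}}, ignoring NaN entries."""
    dists = np.asarray(dists, dtype=float)
    summary = {}
    for j, name in enumerate(node_names):
        col = dists[:, j]
        valid = col[~np.isnan(col)]  # drop missing points for this node
        stats = {"n": int(valid.size)}
        for p in percentiles:
            # Report NaN if the node has no valid distances at all
            stats[f"p{p}"] = float(np.percentile(valid, p)) if valid.size else float("nan")
        summary[name] = stats
    return summary

# Placeholder values for illustration only (rows = instances, columns = nodes)
example_dists = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 10.0],
    [5.0, 6.0, 30.0],
])
example_nodes = ["nose", "tail_base", "lever"]  # hypothetical node names

summary = per_node_summary(example_dists, example_nodes)
for name, stats in summary.items():
    print(name, stats)
```

With the real matrix, the same NaN-filtered columns could be collected into a list and passed to `plt.boxplot` to draw one box per node.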

jramborger78 · 2024-10-22T00:11:26Z

Hey guys,

I think I figured it out for the most part, just not the box plots for nodes. Here is what I have so far. My model was trained on 3200 frames, so I used 320 (10%) to test, taken from 16 videos of a completely separate cohort, so different rats and images it has never seen. The results don't look too good. I'm not sure whether it was the images I chose; it feels like I could pick super easy poses and inflate the test score, so I'm not sure how to go about that. Compared with the training results, it makes me think the model may have overfitted. As far as I can tell from researching it, the low PCK relates to the levers being predicted: in some videos they never even come out, so there may be false negatives on frames where the model should expect them, especially since everything else looks good. Not sure.

[Image: test set metrics]

[Image: training metrics]

But it honestly doesn't look bad when I look at the videos, so I have no idea how to gauge it. Our videos are 800x600, so ~32 px at its worst is about 4% of the width and 5.3% of the height. Again, I'm not sure how to evaluate the model or how I'd report it in a paper, since it looks good but this test says it's not (to me, at least).
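
One way to check whether the levers are what drags PCK down might be to compute it twice: once counting missing points as misses and once skipping them. Below is a rough sketch in plain NumPy, assuming an (instances, nodes) error matrix with NaN for points that were not predicted. `pck` is a hypothetical helper, the 5 px threshold and the values are placeholders, and I have not verified how SLEAP's own PCK treats NaN entries.

```python
import numpy as np

def pck(dists, threshold_px, count_missing_as_miss=True):
    """Fraction of keypoints whose error is below threshold_px."""
    dists = np.asarray(dists, dtype=float)
    missing = np.isnan(dists)
    correct = np.zeros(dists.shape, dtype=bool)
    correct[~missing] = dists[~missing] < threshold_px
    if count_missing_as_miss:
        # Missing points stay False, so they lower the score
        return float(correct.mean())
    # Otherwise score only the points that have a distance
    n_valid = int((~missing).sum())
    return float(correct[~missing].mean()) if n_valid else float("nan")

# Placeholder values: the last column stands in for a node that is often missing
example_dists = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, np.nan],
    [2.0, 8.0, 4.0],
    [1.0, 3.0, np.nan],
])

pck_strict = pck(example_dists, threshold_px=5.0, count_missing_as_miss=True)
pck_valid_only = pck(example_dists, threshold_px=5.0, count_missing_as_miss=False)
print(f"PCK (missing = miss): {pck_strict:.3f}")
print(f"PCK (valid points only): {pck_valid_only:.3f}")
```

A large gap between the two numbers would suggest that missing predictions, rather than poor localization, account for most of the drop.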
