
Feat torchmetrics eval #1071


Open · wants to merge 6 commits into main

Conversation

martibosch

For #901.

@martibosch
Author

This would be a first draft implementing what we discussed in #901.

  • As mentioned there, all boxes are matched regardless of their label, which is in line with the current implementation and makes the most sense for a bounding-box detection + crop-classification approach. It might nonetheless make sense to let the user decide (e.g., via a boolean argument) whether matches should consider all boxes or only boxes of the same class, in which case we could discard the latest commit 33cdcc3 and work from there.
  • Also as discussed, torchmetrics offers many more options to control the evaluation. This PR mimics the current evaluation API, but it may be a good idea to let users define custom recall thresholds, maximum-detection thresholds and the like. This could be done via keyword arguments whose default values mimic the current evaluation, which would let advanced users customize the evaluation further without overwhelming most users.

@bw4sz
Collaborator

bw4sz commented Jun 10, 2025

Great, just a quick note that we are merging a very large pre-2.0 push very soon. It won't literally touch anything you've done here, but rebasing will take some work since it covers most of the codebase. Hopefully it will be merged within a couple of days.

@bw4sz
Collaborator

bw4sz commented Jun 11, 2025

Agreed, we can add config args (see the new hydra config) to give users more control. Make sure to test for edge cases: 1) all empty images, and 2) a mix of empty and non-empty images. See empty_frame_accuracy:

self.empty_frame_accuracy = BinaryAccuracy()

Excited! This is something we have wanted for a long time but have not been able to crack.

@martibosch
Author

Hello @bw4sz! I have been using my fork and have already fixed a few issues (FYI, using this example: https://deepforest-modal-app.readthedocs.io/en/latest/treeai-example.html). I will check the edge cases mentioned (I think empty images are indeed problematic with the current code) and push the fixes to this PR in the following days.

Looking forward to DeepForest 2.0.

@bw4sz
Collaborator

bw4sz commented Jun 18, 2025

Great. I have identified a multi-GPU evaluation error (during the training loop) in which the GPU ranks don't properly gather all the data. I'm going to wait for this PR to fix that; we might be able to get rid of our custom code entirely, since it's slow and I've never loved it.

@bw4sz
Collaborator

bw4sz commented Jun 27, 2025

@martibosch can I help here? It looks like a quick rebase. Let me know when you are ready for review.
