
Feat torchmetrics eval #1071


Open · wants to merge 6 commits into main

Conversation

martibosch

For #901.

@martibosch
Author

This would be a first draft implementing what we discussed in #901.

  • As mentioned there, all boxes are matched regardless of their label, which is in line with the current implementation and makes the most sense for a bounding-box detection + crop-classification approach. It might nonetheless make sense to let the user decide (e.g., via a boolean argument) whether matches should consider all boxes or only boxes of the same class, in which case we could discard the latest commit 33cdcc3 and work from there.
  • Also as discussed, torchmetrics offers many more options to control the evaluation. This PR mimics the current evaluation API, but it may be a good idea to let users define custom recall thresholds, maximum-detection thresholds and the like. This could be done via keyword arguments whose default values mimic the current evaluation, which would let advanced users customize the evaluation further without overwhelming most users.

@bw4sz
Collaborator

bw4sz commented Jun 10, 2025

Great, just a quick note that we are merging a very large pre-2.0 push very soon. It won't literally touch anything you've done here, but rebasing will take some work since it covers most of the codebase. Hopefully it will be merged within a couple of days.

@bw4sz
Collaborator

bw4sz commented Jun 11, 2025

Agreed, we can add config args (see the new hydra config) to give users more control. Make sure to test for edge cases: 1) all empty images, and 2) a mix of empty and non-empty images. See empty_frame_accuracy:

self.empty_frame_accuracy = BinaryAccuracy()

Excited! This is something we have wanted for a long time but have not been able to crack.

@martibosch
Author

Hello @bw4sz! I have been using my fork and have already fixed a few issues (FYI, using this example: https://deepforest-modal-app.readthedocs.io/en/latest/treeai-example.html). I will check the edge cases mentioned (I think empty images are indeed problematic with the current code) and push the fixes to this PR in the following days.

Looking forward to DeepForest 2.0.

@bw4sz
Collaborator

bw4sz commented Jun 18, 2025

Great. I have identified a multi-GPU evaluation error (during the training loop) in which the GPU ranks don't properly gather all the data. I'm going to wait for this PR to fix that; we might be able to get rid of our custom code entirely, since it's slow and I've never loved it.

@bw4sz
Collaborator

bw4sz commented Jun 27, 2025

@martibosch can I help here? It looks like a quick rebase. Let me know when you are ready for review.
