Fix Evidence threshold bug #185


Merged

Conversation

@vkakerbeck (Contributor)

There was a bug in the _get_evidence_update_threshold function in the EvidenceLM: instead of setting the threshold for updating a hypothesis to x_percent_threshold, it was set to 1 - x_percent_threshold.

Results are as expected:

  • After the change is applied, accuracy and run time go down (green vs. grey) because we test way fewer hypotheses at every step.
  • If we set the evidence update threshold manually back to 80% instead of 20%, we get exactly the same results as before the fix (turquoise vs. grey). This is just a sanity check.
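A minimal sketch of the inversion described above (function and variable names here are hypothetical stand-ins, not the actual tbp.monty code; x_percent_threshold is expressed as a fraction for brevity):

```python
def evidence_update_threshold(max_evidence: float, x_percent_threshold: float) -> float:
    """Evidence a hypothesis needs to stay within x% of the current maximum."""
    # Pre-fix (buggy): the fraction was inverted, so a 20% setting
    # effectively became an 80% band and far too many hypotheses were
    # updated at every step.
    # frac = 1 - x_percent_threshold
    # Post-fix: use the configured fraction directly.
    frac = x_percent_threshold
    return max_evidence - max_evidence * frac
```

With the fix, setting the fraction to 0.8 reproduces the old buggy behavior of a 0.2 setting, which is exactly the sanity check in the second bullet above.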

[Screenshot: benchmark results comparison, 2025-02-21 5:06 PM]

Given how much accuracy decreases after the fix, it seems worth creating a separate variable for the evidence update threshold (if it is a percentage) instead of reusing the x_percent_threshold variable defined for the terminal condition. This PR includes a proposed way to add an option to specify evidence_update_threshold as a percentage.
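One hedged sketch of what accepting a percentage-style setting could look like (the helper name and the "80%" string format are assumptions for illustration, not the PR's actual API):

```python
def parse_evidence_update_threshold(value):
    """Hypothetical helper: interpret a threshold setting that may be a percentage.

    A string like "80%" means "within 80% of the maximum evidence";
    any other value is treated as an absolute threshold.
    """
    if isinstance(value, str) and value.endswith("%"):
        percentage = float(value[:-1])
        assert 0 <= percentage <= 100, "Percentage must be between 0 and 100"
        return ("percent", percentage)
    return ("absolute", float(value))
```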

I ran some hyperparameter tests for different values:
[Screenshot: hyperparameter test results, 2025-02-21 5:08 PM]

Remaining tasks before this can become a full PR:

  • Test this parameter on 77 object experiment
  • Test whether there is a different optimal x_percent_threshold value, given that these two parameters are disentangled now
  • Rerun all benchmarks & update results with the new values (if we set new values; currently results should be exactly the same)

@vkakerbeck added the bug label on Feb 21, 2025
@nielsleadholm (Contributor) left a comment:

The change looks good! Will review properly when it's a full PR but don't see any issue with the updated code.

```python
# Excerpt from the diff (assert opening reconstructed from its message):
assert (
    0 <= percentage <= 100
), "Percentage must be between 0 and 100"
max_global_evidence = self.current_mlh["evidence"]
x_percent_of_max = max_global_evidence * (percentage / 100)
return max_global_evidence - x_percent_of_max
elif self.evidence_update_threshold == "x_percent_threshold":
```
@tristanls (Contributor) commented on Feb 21, 2025:

suggestion (non-blocking): Perhaps rename "x_percent_threshold" and the variable to something else? Now that evidence_update_threshold can take an "x percent" threshold value (e.g., "80%"), this name becomes quite confusing to parse.

note: I'm generally confused what x means.

@vkakerbeck (Contributor, Author) replied:

I was planning to do this in a separate PR (as it would touch a lot of the other code). Is that ok?

@tristanls (Contributor) replied:
Sorry, I should have mentioned that it was a non-blocking suggestion. I have no issues with that being a separate PR, just wanted to highlight my confusion.

@vkakerbeck marked this pull request as ready for review on February 21, 2025, 18:59
@vkakerbeck (Contributor, Author) commented:
Just turned this into a real PR, as it doesn't affect the benchmark results; it just fixes the bug. A follow-up PR can then update to optimal parameters and refresh the benchmark results with those.

@nielsleadholm (Contributor) left a comment:
Looks great, thanks for fixing this so quickly!

@hlee9212 (Contributor) left a comment:
LGTM.

@vkakerbeck merged commit a3db6f3 into thousandbrainsproject:main on Feb 24, 2025
13 checks passed
nielsleadholm pushed a commit to nielsleadholm/tbp.monty that referenced this pull request Mar 3, 2025