[Question] Is `ActionBonus` wrapper correct ? #402

riiswa · 2023-09-20T23:24:00Z

ActionBonus is a wrapper that adds an exploration bonus to less-visited (state, action) pairs. Looking at the source code, the wrapper performs the transition to the new state s_{t+1} before updating the counter, so it effectively updates the counter of (s_{t+1}, a_t). Shouldn't we update the counter of (s_t, a_t) instead (i.e. perform the step in the environment after the update)?
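To make the two orderings concrete, here is a minimal toy sketch. It is not the wrapper's actual code: `ToyLineEnv` and both helper functions are hypothetical, and the bonus form `1 / sqrt(count)` is an assumption used only for illustration.

```python
import math
from collections import defaultdict


class ToyLineEnv:
    """Hypothetical 1-D environment: the state is an integer position."""

    def __init__(self):
        self.pos = 0

    def step(self, action):
        # action is -1 (left) or +1 (right)
        self.pos += action
        return self.pos


def bonus_keyed_on_next_state(env, counts, action):
    """Step first, then count: the key uses the state reached AFTER the action."""
    env.step(action)
    key = (env.pos, action)  # (s_{t+1}, a_t)
    counts[key] += 1
    return key, 1 / math.sqrt(counts[key])  # assumed bonus form


def bonus_keyed_on_current_state(env, counts, action):
    """Count first, then step: the key uses the state the action was taken FROM."""
    key = (env.pos, action)  # (s_t, a_t)
    counts[key] += 1
    env.step(action)
    return key, 1 / math.sqrt(counts[key])  # assumed bonus form


key_next, _ = bonus_keyed_on_next_state(ToyLineEnv(), defaultdict(int), 1)
key_curr, _ = bonus_keyed_on_current_state(ToyLineEnv(), defaultdict(int), 1)
print(key_next, key_curr)
```

Starting from position 0 and moving right, the first variant records the pair under position 1 and the second under position 0, which is the difference being asked about.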

I may have misunderstood the purpose of the wrapper, but depending on your response I may submit a PR.

I have checked that there is no similar issue in the repo (required)

turbotimon · 2024-04-30T15:53:34Z

It seems correct to me, at least on the current main.

What may be confusing is that it applies the +1 before calculating the bonus, but that is needed because the formula requires the count to start from 1. We could instead initialize pre_count=1 and apply the +1 after the calculation, but the outcome would be the same.
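A small sketch of that equivalence, assuming the bonus has the form `1 / sqrt(count)` (an assumption here, not quoted from the wrapper); both function names are hypothetical:

```python
import math


def bonus_increment_first(counts, key):
    """Default count 0, add 1, then compute the bonus from the new count."""
    pre_count = counts.get(key, 0)
    new_count = pre_count + 1
    counts[key] = new_count
    return 1 / math.sqrt(new_count)


def bonus_increment_after(counts, key):
    """Default count 1, compute the bonus, then add 1 for the next visit."""
    pre_count = counts.get(key, 1)
    bonus = 1 / math.sqrt(pre_count)
    counts[key] = pre_count + 1
    return bonus


counts_a, counts_b = {}, {}
key = ((0, 0), 0, 2)  # hypothetical (position, direction, action) tuple
seq_a = [bonus_increment_first(counts_a, key) for _ in range(5)]
seq_b = [bonus_increment_after(counts_b, key) for _ in range(5)]
print(seq_a == seq_b)
```

The bonuses returned are identical on every visit; only the stored counter differs by one between the two variants.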

(Disclaimer: I'm not a maintainer of this repo)
