Hi, thanks for this detailed breakdown. You're right that the Stoix API is currently inconsistent in how it handles truncation and termination flags, and that this can cause confusion. I've been aiming to differentiate termination from truncation flags more explicitly, precisely because the distinction significantly impacts GAE calculations. To clarify, truncation vs. termination directly affects the bootstrapping logic, which in turn affects GAE:
Here's a simplified example illustrating why this distinction matters for GAE: Suppose you have a sequence of values:
Given your observations:
I'd love for us to think more about this and try to consolidate and simplify the termination/truncation API, but we have to be 100% sure we are not breaking other calculations.
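To make the bootstrapping point concrete, here is a minimal pure-Python sketch of GAE that treats the two flags differently. It is not Stoix code: `gae`, its arguments, and all numbers are made up for illustration. It assumes that `next_values[t]` holds the critic's value of the true next observation (for a truncated step, the final observation before the reset).

```python
def gae(rewards, values, next_values, terminated, truncated, gamma=0.9, lam=0.8):
    """Hypothetical GAE sketch with separate termination and truncation flags."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Termination: the episode really ended, so there is no future value.
        bootstrap = 0.0 if terminated[t] else 1.0
        # Either kind of episode end stops advantages leaking across episodes.
        carry = 0.0 if (terminated[t] or truncated[t]) else 1.0
        delta = rewards[t] + gamma * bootstrap * next_values[t] - values[t]
        running = delta + gamma * lam * carry * running
        advantages[t] = running
    return advantages


# Made-up rollout of three steps; the episode ends at t = 1.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5]
next_values = [0.5, 0.5, 0.5]
ends_at_1 = [False, True, False]
never = [False, False, False]

adv_term = gae(rewards, values, next_values, terminated=ends_at_1, truncated=never)
adv_trunc = gae(rewards, values, next_values, terminated=never, truncated=ends_at_1)
```

In this sketch both flags cut the advantage trace at the episode boundary, but only termination zeroes the bootstrap term, so the advantage at the boundary step differs between the two cases by `gamma * next_values[1]`, and that difference propagates to earlier steps.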
Above question. The reason is that, currently, the 'Stoix API' is not consistent across environments. To give an example, when using a `navix` environment:

- `timestep.last()` returns `True` when the transition is either a truncation-transition (i.e. the episode ended due to max_steps) or a termination-transition (i.e. the episode ended due to achieving the goal).
- `PPOTransition.done` is `True` when the transition is a termination-transition.
- `PPOTransition.truncated` is `True` when the transition is a truncation-transition.

I think this is intended behaviour from the Stoix side. However, when using a gymnax environment, the behaviour changes to the following:

- `PPOTransition.done` is `True` when the transition is a truncation- OR termination-transition (so the same as `timestep.last()`). This becomes evident when looking at `GymnaxWrapper`, where `discount` and `step_type` are assigned using the same underlying logic: Stoix/stoix/wrappers/gymnax.py, lines 89 to 90 in 11dca7d.
- `PPOTransition.truncated` is never `True`. Let's look at the code to understand why: Stoix/stoix/systems/ppo/anakin/ff_ppo.py, line 95 in 11dca7d.

To simplify the logic, I am going to replace the variables with descriptive names in pseudocode:
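
A minimal sketch of that logic follows. It does not quote either file: it assumes that the wrapper derives both `step_type` and `discount` from gymnax's single done flag, and that `done` and `truncated` are computed from `timestep.discount` and `timestep.last()` as shown. `TimeStep`, `LAST`, `MID`, `gymnax_style_timestep` and `ppo_flags` are hypothetical stand-ins.

```python
from dataclasses import dataclass

MID, LAST = 1, 2  # hypothetical stand-ins for step-type enum values


@dataclass
class TimeStep:
    step_type: int
    discount: float

    def last(self) -> bool:
        return self.step_type == LAST


def gymnax_style_timestep(env_done: bool) -> TimeStep:
    # Assumption: one done flag drives both fields, so they always agree.
    return TimeStep(
        step_type=LAST if env_done else MID,
        discount=0.0 if env_done else 1.0,
    )


def ppo_flags(ts: TimeStep):
    # Assumption: termination is read from the discount, truncation is
    # "episode ended but discount was not zeroed".
    done = ts.discount == 0.0
    truncated = ts.last() and ts.discount != 0.0
    return done, truncated


# With a single done flag, last() implies discount == 0, so truncated is always False.
flags_when_done = ppo_flags(gymnax_style_timestep(True))
flags_when_running = ppo_flags(gymnax_style_timestep(False))
# A wrapper that reports truncation separately can emit LAST with discount 1.0.
flags_when_truncated = ppo_flags(TimeStep(step_type=LAST, discount=1.0))
```

Under these assumptions, `truncated` can only be `True` if a wrapper emits a last step with a non-zero discount, which a wrapper driven by a single done flag never does.
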
I assume that this is not intended behaviour in Stoix. The issue is that the `GymnaxWrapper` assigns `step_type` and `discount` using the same underlying logic, and I assume you did this because gymnax does not separately expose truncation and termination in its API (https://github.com/RobertTLange/gymnax/blob/aef77d5c642ea48b95f34c51d05b8417d9450e15/gymnax/environments/classic_control/acrobot.py#L152).

As far as I can tell, in `ff_ppo.py`, truncations and dones are only used in the GAE calculation, but since the GAE calculation does not care whether episodes were truncated or terminated, we could remove `truncation_flags` entirely and keep the termination/truncation information entirely in `discount_t`. Then, we could discard `PPOTransition.truncated` and change the intended meaning of `PPOTransition.done` from termination-transitions to truncation-or-termination-transitions. For this, we would need to adapt wrappers like NavixWrapper to adhere to the same logic, but it shouldn't be too much work.

One caveat here is that I exclusively worked with `ff_ppo.py`, so I am not sure whether other algorithms treat truncation and termination differently. Please let me know whether they do. If not, then what do you think of removing the distinction between `truncated` and `terminated` in Stoix? I could work on this and open a draft PR for the `ff_ppo` case.