[BUG] Fix termination vs truncation mixup

### Describe the bug
It seems we are not using the correct termination vs truncation values: we always use the condition `termination or truncation` (`timestep.last()`) when we often want only `termination` (`1 - discount`). This is especially tricky in the recurrent systems.

### Expected behavior
When calculating advantages, we should use `termination` (`1 - discount`). In the recurrent systems, when passing inputs to the networks during training, we should use `termination or truncation` (`timestep.last()`) in order to correctly reset the hidden state.
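To illustrate why the advantage calculation should only see `termination`, here is a minimal GAE sketch in JAX. The function name `gae_advantages`, its arguments, and the single-segment rollout layout are assumptions for illustration, not Mava's actual code. It only shows the masking; handling an auto-reset in the middle of a rollout is out of scope here.

```python
import jax
import jax.numpy as jnp


def gae_advantages(rewards, values, bootstrap_value, termination,
                   gamma=0.99, gae_lambda=0.95):
    """GAE over one rollout segment of length T (hypothetical helper).

    `termination` is `1 - discount`: 1.0 only where the episode truly ended,
    0.0 on truncated steps, so truncated steps still bootstrap from the
    next value.
    """
    next_values = jnp.concatenate([values[1:], bootstrap_value[None]])
    not_terminal = 1.0 - termination

    def step(gae, inputs):
        reward, value, next_value, nonterminal = inputs
        # Only a true termination zeroes the bootstrap and the GAE carry.
        delta = reward + gamma * next_value * nonterminal - value
        gae = delta + gamma * gae_lambda * nonterminal * gae
        return gae, gae

    # Scan backwards in time; outputs stay aligned with the input order.
    _, advantages = jax.lax.scan(
        step, jnp.zeros(()), (rewards, values, next_values, not_terminal),
        reverse=True,
    )
    return advantages


rewards = jnp.ones(3)
values = jnp.zeros(3)
bootstrap_value = jnp.array(10.0)

# Last step truncated (time limit): we still bootstrap from the next value.
adv_truncated = gae_advantages(rewards, values, bootstrap_value,
                               jnp.array([0.0, 0.0, 0.0]), 1.0, 1.0)
# Last step terminated: the bootstrap value is masked out.
adv_terminated = gae_advantages(rewards, values, bootstrap_value,
                                jnp.array([0.0, 0.0, 1.0]), 1.0, 1.0)
```

If `timestep.last()` were used as the mask instead, the truncated rollout would be treated like the terminated one and lose its bootstrap value.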

### Possible Solution
Always put `1 - discount` in `PPOTimestep.done` and always put `timestep.last()` in `RNNLearnerState.done`.
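A rough sketch of how the two flags could be derived from one timestep. `ToyTimeStep`, the `LAST` code, and `split_done_flags` are hypothetical stand-ins that only mimic the `last()` / `discount` interface mentioned above; they are not the real environment types.

```python
from typing import NamedTuple

import jax.numpy as jnp

LAST = 2  # hypothetical step-type code marking the final step of an episode


class ToyTimeStep(NamedTuple):
    """Hypothetical stand-in exposing only `step_type`, `discount`, `last()`."""
    step_type: jnp.ndarray
    discount: jnp.ndarray

    def last(self):
        return self.step_type == LAST


def split_done_flags(timestep):
    # True termination: the environment set the discount to 0.
    termination = (1.0 - timestep.discount).astype(bool)
    # Episode end for any reason: termination or truncation.
    episode_end = timestep.last()
    # Truncation only: the episode ended but the discount is still 1.
    truncation = jnp.logical_and(episode_end, jnp.logical_not(termination))
    return termination, episode_end, truncation


# Three parallel envs: mid-episode, terminated, truncated.
ts = ToyTimeStep(step_type=jnp.array([1, 2, 2]),
                 discount=jnp.array([1.0, 0.0, 1.0]))
termination, episode_end, truncation = split_done_flags(ts)
```

Under the proposal, `termination` is what would go into `PPOTimestep.done`, and `episode_end` is what would go into `RNNLearnerState.done`.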

To avoid issues like this in future, I think we should rename `RNNLearnerState.done` to `RNNLearnerState.truncation`.

There are a couple of places where we use `PPOTimestep.done` when it should be `RNNLearnerState.done`, so we'd have to go through and make sure we're always using the correct one. Examples are [here](https://github.com/instadeepai/Mava/blob/develop/mava/systems/rec_ippo_rware.py#L335) and [here](https://github.com/instadeepai/Mava/blob/develop/mava/systems/rec_ippo_rware.py#L367), where we use `PPOTimestep.done` (which should be `1 - discount`) to reset the hidden state. Instead, we should pass `RNNLearnerState.truncation` into the loss functions and use that.
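As a sketch of the hidden-state reset during training, assuming the flag at step `t` marks that the input at `t` starts a new episode. `unroll_with_resets` and the toy `cell` are hypothetical and much simpler than the real recurrent networks.

```python
import jax
import jax.numpy as jnp


def unroll_with_resets(cell, init_hidden, inputs, episode_end):
    """Unroll `cell` over time, zeroing the hidden state at episode ends.

    inputs: [T, B, D], episode_end: [T, B] bool, init_hidden: [B, D].
    `episode_end` should be `termination or truncation`, not `1 - discount`.
    """
    def step(hidden, xs):
        x, reset = xs
        # Reset before applying the cell so no state leaks across episodes.
        hidden = jnp.where(reset[:, None], jnp.zeros_like(hidden), hidden)
        hidden = cell(hidden, x)
        return hidden, hidden

    return jax.lax.scan(step, init_hidden, (inputs, episode_end))


def toy_cell(hidden, x):
    # Hypothetical "RNN" that just accumulates its inputs.
    return hidden + x


inputs = jnp.ones((3, 1, 1))
episode_end = jnp.array([[False], [False], [True]])
final_hidden, hiddens = unroll_with_resets(
    toy_cell, jnp.zeros((1, 1)), inputs, episode_end
)
```

With a termination-only flag, a truncated episode at the last step would not trigger the reset, and the hidden state would carry over into the next episode.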


[BUG] Fix termination vs truncation mixup #951
