[Bug Report] TerminateIllegalWrapper breaks agent selection #1176
Comments
The problem here is that env attributes, including agent_selection, are set when TerminateIllegalWrapper calls env functions. However, those functions are called on the wrapper object, not the env, so the values are set on the wrapper object rather than on the base env object. When the code later runs, the values get updated in the env, but the wrapper reads its own copies, which shadow them. There are several ways to fix this. I think the most reliable and robust fix is to ensure that the base wrapper has all methods from AECEnv and redirects them so that they are called on the env. An alternative is to change TerminateIllegalWrapper to call the method from the unwrapped env, but that is less general because it relies on other wrappers doing the same. A third option is to define agent_selection and the other values as properties in BaseWrapper, but that requires adding more items to BaseWrapper.
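To make the shadowing concrete, here is a minimal sketch with toy classes. ToyBase, ToyEnv, ToyWrapper and select_first_agent are hypothetical names made up for illustration; they are not PettingZoo's classes, and the only assumption carried over from the description above is that a shared method runs with the wrapper as `self` while attribute reads fall through to the env via `__getattr__`.

```python
class ToyBase:
    """Hypothetical shared base: its method assigns to whatever `self` is."""

    def select_first_agent(self):
        # When `self` is the wrapper, this assignment lands on the wrapper.
        self.agent_selection = self.agents[0]


class ToyEnv(ToyBase):
    """Hypothetical stand-in for a two-player turn-based env."""

    def __init__(self):
        self.agents = ["player_1", "player_2"]
        self.agent_selection = self.agents[0]

    def step(self):
        # Advance the env's own agent_selection to the next player.
        idx = self.agents.index(self.agent_selection)
        self.agent_selection = self.agents[(idx + 1) % len(self.agents)]


class ToyWrapper(ToyBase):
    """Hypothetical wrapper that forwards attribute reads, but not writes."""

    def __init__(self, env):
        self.env = env

    def __getattr__(self, name):
        # Only called when normal lookup on the wrapper fails.
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)

    def step(self):
        self.env.step()


env = ToyEnv()
wrapped = ToyWrapper(env)

wrapped.step()                # env moves on to player_2
wrapped.select_first_agent()  # writes agent_selection onto the wrapper
wrapped.step()                # env moves on to player_1
wrapped.step()                # env moves on to player_2

# The wrapper now returns its own stale copy instead of the env's value.
print(wrapped.agent_selection, env.agent_selection)
```

Once the wrapper has its own agent_selection, `__getattr__` is never consulted for that name again, so every later read through the wrapper is out of sync with the env.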
(for later reference when testing the fix)
Output
Adding this ensures that variables get set in the appropriate location. By default, any public value set by the wrapper is sent to the env to be set there instead of on the wrapper. If a variable is meant to be set on the wrapper, it should be listed in the _local_vars class variable of the wrapper. This is not ideal, but it seems to be the most reasonable design. An example of needing to specify which vars to keep locally is here: https://python-patterns.guide/gang-of-four/decorator-pattern/#implementing-dynamic-wrapper The solution there is to list which vars should live in the wrapper and check that list when setting a value. That is the approach used in this commit, but more generalized. In line with __getattr__, private values cannot be set on underlying envs. There are two exceptions: _cumulative_rewards was previously exempted in __getattr__ because it is used by many envs, and _skip_agent_selection is added because it is used by the dead step handling. If a wrapper can't set this, that functionality will break.
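A rough sketch of that idea, for readers who want to see the shape of it. This is not the code from the commit: ToyEnv, ForwardingWrapper, CountingWrapper and _forwarded_private are hypothetical names, and keeping non-exempt private names on the wrapper (rather than raising) is an assumption made here. Only _local_vars and the two exempted names come from the description above.

```python
class ToyEnv:
    """Hypothetical stand-in for the base env."""

    def __init__(self):
        self.agents = ["player_1", "player_2"]
        self.agent_selection = self.agents[0]


class ForwardingWrapper:
    """Hypothetical wrapper that forwards public writes to the env."""

    # Names that must live on the wrapper itself.
    _local_vars = ["env"]
    # Private names that may still be forwarded (assumed helper list).
    _forwarded_private = ["_cumulative_rewards", "_skip_agent_selection"]

    def __init__(self, env):
        self.env = env

    def __getattr__(self, name):
        # Only reached when normal lookup on the wrapper fails.
        if name == "env":
            raise AttributeError(name)
        if name.startswith("_") and name not in self._forwarded_private:
            raise AttributeError(f"private attribute '{name}' is not forwarded")
        return getattr(self.env, name)

    def __setattr__(self, name, value):
        is_private = name.startswith("_")
        if name in self._local_vars:
            # Explicitly listed: keep it on the wrapper.
            object.__setattr__(self, name, value)
        elif is_private and name not in self._forwarded_private:
            # Assumption: other private names stay on the wrapper.
            object.__setattr__(self, name, value)
        else:
            # Public names and the exempted private names go to the env.
            setattr(self.env, name, value)


class CountingWrapper(ForwardingWrapper):
    """Hypothetical subclass that keeps one extra variable locally."""

    _local_vars = ForwardingWrapper._local_vars + ["calls"]

    def __init__(self, env):
        super().__init__(env)
        self.calls = 0


env = ToyEnv()
wrapper = CountingWrapper(env)
wrapper.agent_selection = "player_2"         # forwarded to the env
wrapper._skip_agent_selection = "player_1"   # exempted private, forwarded
wrapper._scratch = 1                         # stays on the wrapper
```

With this in place a write such as `wrapper.agent_selection = ...` can no longer create a shadow copy on the wrapper, at the cost of having to maintain _local_vars in every wrapper that keeps its own state.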
I think there should be a good way to fix this. I will discuss it in the other PR or make a new one.
Thanks for the detailed code. We can adapt this as a test case to ensure it doesn't break in the future.
Describe the bug
When running an env with TerminateIllegalWrapper and without using the action mask, the agent selection becomes corrupted when an illegal move is made.
Here is an example from tictactoe (code below). Notice that in the first game, player 1 starts (as expected) and play alternates between players 1 and 2 (as expected) until player 1 makes an illegal move, which is caught by the wrapper.
However, in the second game, player 1 makes two moves in a row. That should not happen. Also note that the move flagged as illegal is not actually illegal per the game rules.
This behaviour has been reported for other games.
Code example
System info
Additional context
No response
Checklist