Hi @dimichai,

First of all, I really liked this environment, and I believe it is a nice benchmark for MORL algorithms! :)

I was reading the README, and two questions came to mind:
1. I noticed the observation does not include the positions that already contain a station. Isn't this information necessary for the agent to infer the optimal policy / optimal Q-values? The value of a state depends, for example, on whether the agent can still place more stations.
2. Wouldn't it make sense to let the agent choose *not* to put a station in a cell? It could then create a line with more widely spaced stations. This would be easy to change by adding an extra action that moves to a cell without placing a station.
Thank you so much for your kind words and insightful comments. I really appreciate the effort!
Addressing your points:
You are absolutely right about the current state representation. It was originally designed with a simpler environment in mind and no longer aligns with the current design. I am revising it to include both the positions that already contain stations and the agent's current location. I'll comment here once I've implemented it.
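For concreteness, here is a minimal sketch of what such a revised observation could look like, assuming a gymnasium-style grid environment. The shapes and the names `station_grid`, `agent_pos`, `stations_left`, and `_get_obs` are illustrative placeholders, not this environment's actual API:

```python
import numpy as np
from gymnasium import spaces

# Illustrative only: dimensions and field names are assumptions.
H, W = 10, 10  # hypothetical grid size

observation_space = spaces.Dict({
    # 1 where a station has already been placed, 0 elsewhere
    "station_grid": spaces.MultiBinary((H, W)),
    # the agent's current cell as (row, col)
    "agent_pos": spaces.MultiDiscrete([H, W]),
    # how many stations the agent may still place
    "stations_left": spaces.Discrete(H * W + 1),
})

def _get_obs(station_grid, agent_pos, stations_left):
    """Build the observation from internal state (hypothetical helper)."""
    return {
        "station_grid": station_grid.astype(np.int8),
        "agent_pos": np.array(agent_pos, dtype=np.int64),
        "stations_left": stations_left,
    }
```

Exposing `stations_left` directly would also address your point about the value of a state depending on whether more stations can still be placed.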
This is a great suggestion. However, I think implementing it would require some additional experiments to incorporate the constraints. As it stands, the agent's movement is restricted by the action mask; introducing an extra action for station placement would essentially allow free movement, so we might need to adjust how the constraints are calculated. It's something to look into in the future.
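To make the trade-off concrete, here is one possible (entirely hypothetical) design sketch: each movement direction gets a "place" and a "no-place" variant, and both variants share the same movement mask, so movement stays constrained even though placement becomes optional. The eight-direction layout and all names below are assumptions, not the environment's actual action space:

```python
import numpy as np

# Hypothetical action layout: 8 movement directions, each in two variants.
# Actions 0-7: move AND place a station; actions 8-15: move WITHOUT placing.
N_DIRECTIONS = 8
N_ACTIONS = 2 * N_DIRECTIONS

def action_mask(valid_moves, stations_left):
    """Sketch of a mask that keeps movement constrained while making
    placement optional. `valid_moves` is a boolean array of length 8
    marking directions that keep the line geometrically valid."""
    mask = np.zeros(N_ACTIONS, dtype=bool)
    # Placing variants are only valid while stations remain.
    mask[:N_DIRECTIONS] = valid_moves & (stations_left > 0)
    # Non-placing variants reuse the same movement constraints, so
    # "free movement" is still limited to geometrically valid moves.
    mask[N_DIRECTIONS:] = valid_moves
    return mask
```

Because both variants inherit the same `valid_moves` constraints, the extra actions would not open up unrestricted movement; only the placement decision becomes free.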