Migrate to BOPTEST's Gym interface #47

@mattrobmattrob

Description

Unify around https://github.com/ibpsa/project1-boptest-gym for the Gym interface within ACTB. This will likely include adding some functionality to the BOPTEST Gym code while sussing out what needs to live in the Gym interface versus the ACTB client, and so on.

@SouravDaedo noted the current differences:

  • The actb-gym is built around the actb-client, which uses the Alfalfa simulation.
  • In project1-boptest-gym, the reward function cannot be customized: it is currently a fixed function of the sum of the cost and thermal-discomfort KPIs.
  • The actb-gym also provides an option to formulate the reward function from only the KPIs of the controlled zones.
  • In project1-boptest-gym, it is difficult to test and adjust the relative weights of the reward terms.
  • In actb-gym, the relative weights of the individual KPIs can be adjusted easily. This enables price-based optimization using time-of-use and real-time pricing, and allows the KPIs to be normalized to compensate for the differing magnitudes of their units (see the reward sketch after this list).
  • The actb-gym has provisions to easily add a demand-limiting term to the reward, as well as to form observation states that describe the demand response: a countdown to the start of the demand-response event, and binary indicators of whether the current time, or the forecasted time t hours ahead, lies within the demand-limiting timeframe (see the demand-response sketch after this list).
  • In actb-gym, normalization of the observation states is done a bit differently: the user specifies the min and max limits instead of taking them from the BOPTEST Spawn model. This helps convergence, because the ranges in BOPTEST are wide; if we are only performing a summer or winter simulation, we know the actual ranges are much narrower, which helps the agent learn faster for that specific case. The user can simply pass n_obs=True when initializing the environment to get the states scaled between 0 and 1 (see the normalization sketch after this list).
  • DiscretizedObservationWrapper and ActionWrapper are not implemented in actb-gym; it only ever provides continuous state variables.
  • The actb-gym can store data for individual variables with the store_data function, which saves, in addition to states, actions, rewards, and next states, the contribution of each KPI to the reward function and the temperature and power consumption of each zone. This helps with debugging by inspecting the CSV at the end of each training episode, and makes it easy to plot the variables instead of manually storing and saving them outside of the environment.
  • In actb-gym, there is an internal checking mechanism: if, while using RL, the suggested heating setpoint is higher than the cooling setpoint, the cooling setpoint is overridden to sit above the heating setpoint (and vice versa for the cooling setpoint), so that heating and cooling do not occur at the same time (see the setpoint sketch after this list).
  • In actb-gym, the step function returns next_states, action, rewards, done, info. The info dict reports the contribution of each individual KPI to the reward function.
  • In actb-gym, there is a built-in function to plot the temperature and power responses effectively.
  • The actb-client only allows a 300-second timestep, since ACTB uses Alfalfa, where the timestep cannot be changed. This is not a Gym issue, but wanted to highlight this.
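
As a concrete illustration of the weighted-KPI reward described above, here is a minimal sketch. The KPI names, the `weights`/`norms` dicts, and the function itself are assumptions for illustration, not the actual actb-gym API:

```python
# Hypothetical KPI-weighted reward, sketched from the description above.
# KPI names, weights, and normalization constants are illustrative only.

def compute_reward(kpis: dict, weights: dict, norms: dict):
    """Combine KPIs into a scalar reward and report each term's share."""
    contributions = {
        name: -weights[name] * kpis[name] / norms[name]  # normalize, weight, penalize
        for name in weights
    }
    reward = sum(contributions.values())
    # Returning the per-KPI contributions lets the environment expose them
    # through the `info` dict, as the actb-gym step function is said to do.
    return reward, contributions

# Example: cost-heavy weighting under time-of-use pricing (values made up).
reward, kpi_info = compute_reward(
    kpis={"cost_tot": 1.2, "tdis_tot": 3.5},
    weights={"cost_tot": 2.0, "tdis_tot": 1.0},
    norms={"cost_tot": 10.0, "tdis_tot": 50.0},
)
```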
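
A sketch of the demand-response observation states mentioned above (countdown plus binary indicators). The function name, signature, and the one-hour default horizon are assumptions:

```python
def dr_observation_states(t_now: float, dr_start: float, dr_end: float,
                          horizon_hours: float = 1.0):
    """Hypothetical demand-response features; all times in seconds.

    Returns the countdown to the event start and binary flags for whether
    the current time, and the time `horizon_hours` ahead, fall inside the
    demand-limiting window.
    """
    countdown = max(0.0, dr_start - t_now)
    in_dr_now = 1 if dr_start <= t_now < dr_end else 0
    t_ahead = t_now + horizon_hours * 3600.0
    in_dr_ahead = 1 if dr_start <= t_ahead < dr_end else 0
    return countdown, in_dr_now, in_dr_ahead
```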
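
The user-specified min/max normalization could look like the following sketch; the helper name and bounds are assumptions, not the actual n_obs implementation:

```python
import numpy as np

def normalize_obs(obs, lows, highs):
    """Scale raw observations to [0, 1] using user-supplied bounds.

    Season-specific bounds (e.g. winter-only zone temperatures) give a
    tighter scaling than the wide BOPTEST model limits, which is the
    convergence benefit described above.
    """
    obs, lows, highs = (np.asarray(a, dtype=float) for a in (obs, lows, highs))
    return np.clip((obs - lows) / (highs - lows), 0.0, 1.0)

# Example: a zone temperature of 294 K with winter bounds of 288-300 K.
print(normalize_obs([294.0], [288.0], [300.0]))  # -> [0.5]
```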
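
Finally, the setpoint-conflict check could be as simple as the clamp below. The deadband value and function name are assumptions:

```python
def enforce_setpoint_order(heat_sp: float, cool_sp: float,
                           deadband: float = 0.5):
    """Keep the cooling setpoint above the heating setpoint.

    If the agent proposes a heating setpoint at or above the cooling
    setpoint, the cooling setpoint is pushed above it (the list above
    notes the symmetric override for the cooling setpoint as well), so
    heating and cooling are never commanded simultaneously.
    """
    if heat_sp >= cool_sp:
        cool_sp = heat_sp + deadband  # override, per the described mechanism
    return heat_sp, cool_sp
```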
