v0.1
A 4-parameter observation space:
- current outdoor temperature
- current indoor temperature
- tc * occupation
- number of hours until the next occupation change (from occupied to empty and vice versa)
tc is the indoor setpoint temperature
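As a minimal sketch, the four entries above could be assembled as follows. The helper name `build_observation_v01` and its argument names are hypothetical, and occupation is assumed to be a 0/1 flag, which these notes do not state explicitly.

```python
def build_observation_v01(t_ext, t_int, tc, occupation, hours_to_change):
    """Assemble the 4-value v0.1 observation (hypothetical helper).

    occupation is assumed to be 1 when occupied and 0 when empty.
    """
    return [
        float(t_ext),            # current outdoor temperature
        float(t_int),            # current indoor temperature
        float(tc * occupation),  # setpoint multiplied by occupation
        float(hours_to_change),  # hours until the next occupation change
    ]


# Example values are made up for illustration only.
occupied = build_observation_v01(5.0, 19.5, 20.0, 1, 3)
vacant = build_observation_v01(5.0, 16.0, 20.0, 0, 8)
print(occupied)
print(vacant)
```

Under that assumption, the third entry carries the setpoint only while the building is occupied.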
Training in non-occupation mode (vacancy) is interesting. However, due to the design of the observation space, the network takes a long time to find tc, which slows down the learning process. Moreover, nothing was planned for training with various values of tc.
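Multiplying tc by occupation was a very bad choice: assuming occupation is a 0/1 flag, the product is 0 whenever the building is empty, so tc is not visible to the network during vacancy.
The observation space has to be redesigned.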
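The sketch below illustrates the problem and one possible direction for a redesign. It is only an illustration under stated assumptions: occupation is taken to be a 0/1 flag, both function names are hypothetical, and the 5-entry layout is not a design decided in these notes.

```python
def obs_v01(t_ext, t_int, tc, occupation, hours_to_change):
    # v0.1 layout: the setpoint is multiplied by the occupation flag
    return (t_ext, t_int, tc * occupation, hours_to_change)


def obs_separate(t_ext, t_int, tc, occupation, hours_to_change):
    # hypothetical alternative: setpoint and occupation kept as separate entries
    return (t_ext, t_int, tc, occupation, hours_to_change)


# Two vacancy situations that differ only by their setpoint (made-up values)
a = obs_v01(5.0, 16.0, 19.0, 0, 8)
b = obs_v01(5.0, 16.0, 21.0, 0, 8)
print(a == b)  # True: the v0.1 observation cannot tell the two setpoints apart

c = obs_separate(5.0, 16.0, 19.0, 0, 8)
d = obs_separate(5.0, 16.0, 21.0, 0, 8)
print(c == d)  # False: the setpoint stays visible during vacancy
```

Keeping tc as its own entry would also make it straightforward to sample a different setpoint per episode, which addresses the lack of training with various tc.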