
Commit 9a43547 (1 parent: eeaf94e)

nits: modify language about distance-reward to match the implementation (distance at start of game vs. start of time-step). change global-reward to just reward, since there's no longer a local-reward. Also, add note about moving right incurring a negative reward.

1 file changed: pettingzoo/butterfly/pistonball/pistonball.py (+6, -7 lines)
@@ -31,10 +31,9 @@
 **Actions**: Every piston can be acted on at each time step. In discrete mode, the action space is 0 to move down by 4 pixels, 1 to stay still, and 2 to move up by 4 pixels. In continuous mode, the value in the range [-1, 1] is proportional to the amount that the pistons
 are lowered or raised by. Continuous actions are scaled by a factor of 4, so that in both the discrete and continuous action space, the action 1 will move pistons 4 pixels up, and -1 will move pistons 4 pixels down.

-**Rewards**: The same reward is provided to each agent based on how much the ball moved left in the last time-step plus a constant time-penalty. Specifically, there are three components to the distance reward. First, the x-distance in pixels travelled by the ball towards
-the left-wall in the last time-step (moving right would provide a negative reward). Second, a scaling factor of 100. Third, a division by the distance in pixels between the ball at the start of the time-step and the left-wall. That final division component means moving
-one unit left when close to the wall is far more valuable than moving one unit left when far from the wall. There is also a configurable time-penalty (default: -0.1) added to the distance-based reward at each time-step. For example, if the ball does not move in a
-time-step, the reward will be -0.1 not 0. This is to incentivize solving the game faster.
+**Rewards**: The same reward is provided to each agent based on how much the ball moved left in the last time-step (moving right results in a negative reward) plus a constant time-penalty. The distance component is the percentage of the initial total distance (i.e. at game-start)
+to the left-wall travelled in the past timestep. For example, if the ball began the game 300 pixels away from the wall, began the time-step 180 pixels away and finished the time-step 175 pixels away, the distance reward would be 100 * 5/300 = 1.7. There is also a configurable
+time-penalty (default: -0.1) added to the distance-based reward at each time-step. For example, if the ball does not move in a time-step, the reward will be -0.1 not 0. This is to incentivize solving the game faster.

 Pistonball uses the chipmunk physics engine, and are thus the physics are about as realistic as in the game Angry Birds.

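To make the reworded reward description concrete, here is a minimal, self-contained sketch of the arithmetic it describes. The function and argument names are illustrative, not the environment's actual attributes, and the left wall is treated as x = 0 so that the ball's x-position equals its distance to the wall; only the formula mirrors the code change below.

def distance_reward(ball_prev_x, ball_curr_x, distance_to_wall_at_game_start,
                    time_penalty=-0.1, terminated=False):
    # Fraction of the game-start distance to the left wall covered this time-step,
    # scaled by 100. The leading -1 flips the sign: x decreases as the ball moves
    # left, and moving left should yield a positive reward.
    reward = -1 * (ball_curr_x - ball_prev_x) * (100 / distance_to_wall_at_game_start)
    if not terminated:
        reward += time_penalty  # constant penalty to incentivize finishing quickly
    return reward

# Worked example from the new docstring: the ball starts the game 300 px from the
# wall, starts this time-step at 180 px and ends at 175 px. Distance component:
# 100 * 5 / 300 = 1.67 (rounded to 1.7 in the docstring); with the default
# time-penalty of -0.1 the step reward is about 1.57.
print(distance_reward(ball_prev_x=180, ball_curr_x=175, distance_to_wall_at_game_start=300))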
@@ -632,15 +631,15 @@ def step(self, action):
             # The negative one is included since the x-axis increases from left-to-right. And, if the x
             # position decreases we want the reward to be positive, since the ball would have gotten closer
             # to the left-wall.
-            global_reward = (
+            reward = (
                 -1
                 * (ball_curr_pos - self.ball_prev_pos)
                 * (100 / self.distance_to_wall_at_game_start)
             )
             if not self.terminate:
-                global_reward += self.time_penalty
+                reward += self.time_penalty

-            self.rewards = {agent: global_reward for agent in self.agents}
+            self.rewards = {agent: reward for agent in self.agents}
             self.ball_prev_pos = ball_curr_pos
             self.frames += 1
         else:
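The rename also makes it clearer that the rewards dict is a pure broadcast: with the local reward gone, every piston receives the identical shared value. A tiny illustrative snippet (the agent names and reward value are made up; only the dict comprehension mirrors the line changed above):

agents = [f"piston_{i}" for i in range(20)]  # hypothetical agent names

shared_reward = 1.57  # e.g. the value computed for this time-step
rewards = {agent: shared_reward for agent in agents}

# No per-piston (local) component remains, so all entries are identical.
assert len(set(rewards.values())) == 1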
