This project trains an agent to escape "slippery" labyrinths: grid-based mazes with stochastic movement dynamics.
The Slippery Random Maze environment simulates a grid-based maze in which an agent must navigate from a starting position to a goal cell. The agent can move up, down, left, or right, or stay in place. If the intended cell is free, the move succeeds with probability 0.85 and the remaining 0.15 is spread equally over the free adjacent cells; if the intended cell is blocked, the agent instead moves uniformly at random to one of the free neighboring cells (or stays in place if there are none).
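As a concrete illustration of these dynamics, the sketch below (not taken from the repository) shows how a single slippery step could be sampled from a transition distribution given as [probability, next_state, reward, done] entries, the format produced by the transition code further down; the helper name sample_step is an assumption made for the example:

    import random

    def sample_step(transitions):
        # 'transitions' is a list of [probability, next_state, reward, done] entries
        # whose probabilities sum to 1; draw one outcome according to those weights.
        probs = [prob for prob, _, _, _ in transitions]
        index = random.choices(range(len(transitions)), weights=probs, k=1)[0]
        _, next_state, reward, done = transitions[index]
        return next_state, reward, done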
In the case where "Up" and the other movements are all valid moves:
In the case where only "Down" and "Up" are valid moves:
Python code for the state transition:

    # 'transitions' is a list of [probability, next_state, reward, done] entries
    intended_move = self.transitions[action]
    next_state = (state[0] + intended_move[0], state[1] + intended_move[1])
    if not self._is_valid(next_state):
        # The intended cell is blocked: slip uniformly to one of the free neighbors,
        # or stay in place if there are none.
        free_neighbors = self._get_free_neighbors(state)
        if free_neighbors:
            for free_state in free_neighbors:
                reward = 0 if free_state == self.goal else -1
                transitions.append([1 / len(free_neighbors), free_state, reward, False])
            return transitions
        else:
            next_state = state
            reward = 0 if next_state == self.goal else -1
            transitions.append([1, next_state, reward, False])
            return transitions
    # The intended cell is free: it receives 0.85 plus its share of the 0.15 slip mass,
    # and every other free neighbor receives an equal share of the remaining 0.15.
    free_neighbors = self._get_free_neighbors(state)
    for free_state in free_neighbors:
        reward = 0 if free_state == self.goal else -1
        prob = 0.15 / len(free_neighbors) if free_state != next_state else 0.15 / len(free_neighbors) + 0.85
        transitions.append([prob, free_state, reward, False])
    return transitions

The agent receives a reward of -1 for any action that does not land it on the goal cell, and by design it can never move into a wall. When the goal is reached, the reward is 0:
    reward = 0 if self.state == self.goal else -1

Three reinforcement learning algorithms are implemented for this environment: value iteration, policy iteration, and Q-learning. The first two use the state transition function to compute the probability of each possible next state, while Q-learning samples the next state from the distribution described above.
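As an illustration of the planning side, here is a minimal value-iteration sketch (not taken from the repository) that consumes transition lists of the [probability, next_state, reward, done] form produced by the code above; the names value_iteration, get_transitions, gamma, and theta are assumptions made for the example:

    def value_iteration(states, actions, get_transitions, gamma=0.99, theta=1e-6):
        """Illustrative sketch: get_transitions(s, a) returns [prob, next_state, reward, done] entries."""
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                # Bellman optimality backup: best expected one-step return over all actions.
                best = max(
                    sum(prob * (reward + gamma * (0.0 if done else V[next_s]))
                        for prob, next_s, reward, done in get_transitions(s, a))
                    for a in actions
                )
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < theta:  # stop once the value function has converged
                return V

A greedy policy can then be extracted by picking, in each state, the action with the largest one-step lookahead value.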
Q-function update:
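Presumably this is the standard tabular Q-learning rule, with learning rate α and discount factor γ:

    Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]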
State value function update:
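Presumably the Bellman optimality backup used by value iteration (policy evaluation in policy iteration uses the analogous update for a fixed policy):

    V(s) \leftarrow \max_{a} \sum_{s'} P(s' \mid s, a) \left[ r(s, a, s') + \gamma V(s') \right]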
Clone the repository:

    git clone https://github.com/dancher00/Slippery-Random-Maze.git
    cd Slippery-Random-Maze
Build the Docker image and run the desired algorithm:

    docker build --no-cache -t randommaze .
    ./run.sh q_learning
    ./run.sh value_iteration
    ./run.sh policy_iteration
Use src/config.py to modify parameters of the training process.
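The exact parameters exposed by src/config.py are not listed here; the sketch below only illustrates the kind of settings such a config typically contains, and every name and value in it is hypothetical rather than taken from the repository:

    # Hypothetical config sketch; parameter names and values are illustrative only.
    MAZE_SIZE = (15, 15)          # grid height and width
    SLIP_PROBABILITY = 0.15       # probability mass assigned to slipping
    DISCOUNT_FACTOR = 0.99        # gamma shared by all three algorithms
    LEARNING_RATE = 0.1           # alpha for Q-learning
    EPSILON = 0.1                 # exploration rate for an epsilon-greedy policy
    NUM_EPISODES = 5000           # number of Q-learning training episodes
    CONVERGENCE_THRESHOLD = 1e-6  # stopping criterion for value/policy iteration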
Q-Learning Training
Q-Learning Results

Value Iteration Training
Value Iteration Results

Policy Iteration Training
Policy Iteration Results

Below is a demonstration of the project in action:
This project is licensed under the MIT License.
