A reinforcement learning project built around a custom 2D top-down racing simulator written in Python.
The environment uses LIDAR-style raycasts for perception and trains a Deep Q-Network (DQN) agent using PyTorch.
Training runs headless, while trained models can be replayed visually using Pygame.
Note: This project was originally written in C++ (using Raylib + libtorch) and has been fully ported to Python for ease of setup and experimentation.
The project consists of the following scripts:
- Training (`python/train.py`): headless reinforcement learning using a Double DQN agent.
- Fine-Tuning (`python/fine_tune.py`): loads a pre-trained model and continues training with conservative hyperparameters (lower learning rate, lower epsilon).
- Replay (`python/replay.py`): loads a trained model and visualizes behavior in real time with Pygame.
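The defining trait of a Double DQN agent is that the online network selects the next action while the target network evaluates it, which reduces Q-value overestimation. A minimal sketch of that target computation (tensor and network names are illustrative, not necessarily those used in `dqn.py`):

```python
import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute Double DQN bootstrap targets: the online network picks the
    greedy next action, the target network supplies its value estimate."""
    with torch.no_grad():
        # Online network chooses the greedy next action per sample.
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # Target network evaluates that chosen action.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # Terminal transitions contribute no bootstrap value.
        return rewards + gamma * next_q * (1.0 - dones)
```

The same targets are then regressed against the online network's Q-values for the actions actually taken.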
The environment is fully custom, including physics, collision handling, checkpoint logic, and reward shaping.
- 2D pixel-based racing track
- Custom physics (speed, friction/drag, steering, wall/grass collisions)
- 8 checkpoints + 3-lap race system
- Deterministic step-based simulation
Tracks and assets live in assets/.
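The deterministic, step-based physics can be illustrated with a minimal update; the constants and update order below are illustrative assumptions, not the values used in `environment.py`:

```python
import math

def physics_step(x, y, heading, speed, throttle, steer,
                 accel=0.2, drag=0.02, turn_rate=0.05, dt=1.0):
    """One deterministic simulation step: throttle accelerates the car,
    drag decays speed, steering rotates the heading, then position advances."""
    speed += throttle * accel * dt
    speed *= (1.0 - drag)                 # friction/drag
    heading += steer * turn_rate * dt     # steering
    x += math.cos(heading) * speed * dt
    y += math.sin(heading) * speed * dt
    return x, y, heading, speed
```

Because every step is a pure function of the previous state and the chosen action, replays of a trained model are reproducible.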
The agent observes a 23-dimensional state vector:
- Normalized speed
- sin(heading), cos(heading)
- Normalized position (x, y)
- 13 short-range LIDAR raycasts (danger sensing, −90° to +90°)
- 5 long-range LIDAR raycasts (anticipation, forward-facing)
Raycasts return a normalized danger value:
    danger = 1 / ((distance / reference_distance) + 0.1)
Values are clamped to [0, 1].
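The formula above can be written as a small helper; the `reference_distance` value here is a placeholder, not the constant used in `environment.py`:

```python
def ray_danger(distance, reference_distance=100.0):
    """Convert a raycast hit distance into a normalized danger value.

    Nearby obstacles saturate at 1.0; distant obstacles fall toward 0.0.
    """
    danger = 1.0 / ((distance / reference_distance) + 0.1)
    return max(0.0, min(1.0, danger))
```

The `+ 0.1` term caps the raw value at 10 for a zero-distance hit, so clamping to [0, 1] makes any obstacle closer than roughly nine times the reference distance register as maximal danger.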
Discrete action space with 7 actions:
- Accelerate forward
- Reverse
- Steer left
- Steer right
- Forward + left
- Forward + right
- No input
The reward function is shaped using:
- Progress toward next checkpoint
- Small speed incentive (scaled conservatively)
- Checkpoint reward
- Lap completion reward
- Race finish reward
- Time penalty
- Wall collision penalty
- Grass penalty
- Anti-idle penalty
Episodes may terminate early if the vehicle becomes stuck or stops making progress.
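A sketch of how the listed terms might combine into a single per-step reward; all coefficient values below are placeholders, not the weights used in `environment.py`:

```python
def shaped_reward(progress_delta, speed, hit_checkpoint, finished_lap,
                  finished_race, hit_wall, on_grass, idle_steps):
    """Combine the shaping terms listed above. All weights are illustrative."""
    reward = 0.0
    reward += 10.0 * progress_delta               # progress toward next checkpoint
    reward += 0.01 * speed                        # small, conservative speed incentive
    reward += 50.0 if hit_checkpoint else 0.0     # checkpoint reward
    reward += 200.0 if finished_lap else 0.0      # lap completion reward
    reward += 500.0 if finished_race else 0.0     # race finish reward
    reward -= 0.1                                 # per-step time penalty
    reward -= 25.0 if hit_wall else 0.0           # wall collision penalty
    reward -= 1.0 if on_grass else 0.0            # grass penalty
    reward -= 0.5 if idle_steps > 30 else 0.0     # anti-idle penalty
    return reward
```

Dense progress and time terms give the agent gradient on every step, while the sparse checkpoint, lap, and finish bonuses anchor the long-horizon objective.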
Observed training characteristics:
- Early models (~100 episodes): learn basic movement and wall avoidance
- Mid training (~300-500 episodes): starts completing laps, still hits walls on corners
- Fine-tuned models (~560 + 50 fine-tune episodes): stable multi-lap behavior with minimal wall hits
    .
    ├── assets/ # Track images and car texture
    │ └── raceTrackFullyWalled.png
    ├── python/
    │ ├── environment.py # Racing environment (physics, LIDAR, checkpoints, rewards)
    │ ├── dqn.py # DQN network, agent, and replay buffer
    │ ├── train.py # Headless training script
    │ ├── fine_tune.py # Fine-tuning script for trained models
    │ ├── replay.py # Visual Pygame replay
    │ ├── requirements.txt # Python dependencies
    │ └── models/ # Saved model checkpoints
    ├── LICENSE
    └── README.md
- Python 3.12+
- PyTorch
- Pygame
- Pillow
- NumPy
    cd python
    pip install torch pygame Pillow numpy

To train a new agent from scratch:
    cd python
    python train.py

Training runs headless and saves model checkpoints every 50 episodes to python/models/. Press Ctrl+C to save the final model and exit gracefully.
To improve an existing model with conservative hyperparameters:
    cd python
    python fine_tune.py models/model_episode_560.pt

Fine-tuning uses a lower learning rate (1e-4) and lower epsilon (0.1) to refine learned behavior without losing stability.
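Loading a checkpoint and lowering the optimizer's learning rate for fine-tuning might look like the sketch below; the checkpoint layout and agent attribute names are assumptions, not necessarily those of `fine_tune.py`:

```python
import torch

def prepare_fine_tune(agent, checkpoint_path, lr=1e-4, epsilon=0.1):
    """Load saved weights, then lower the learning rate and exploration
    rate so fine-tuning refines rather than overwrites learned behavior."""
    state = torch.load(checkpoint_path, map_location="cpu")
    agent.policy_net.load_state_dict(state)
    agent.target_net.load_state_dict(state)   # keep target net in sync at start
    for group in agent.optimizer.param_groups:
        group["lr"] = lr                      # conservative learning rate
    agent.epsilon = epsilon                   # mostly exploit the learned policy
    return agent
```

Mutating `param_groups` in place preserves the optimizer's existing momentum state, which is usually preferable to constructing a fresh optimizer when fine-tuning.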
To visualize a trained model:
    cd python
    python replay.py models/ft_episode_50.pt

| Key | Action |
|---|---|
| SPACE | Restart race |
| L | Toggle LIDAR visualization |
| ESC | Exit |
The replay shows the track, car, LIDAR rays (orange = short-range, blue = long-range), checkpoints, and a HUD with speed/lap/time info.