Author: Peiyan Zou
Supervisor: Dr Peer-Olaf Siebers
Institution: University of Nottingham
Date: April 2025
StockMARL is a system that enables a Deep Q-Network (DQN)-controlled RL agent to learn trading strategies by observing the behaviours and performance of various rule-based reactive agents, rather than relying solely on historical price data.
The implementation leverages:
- AgentPy for agent-based simulation,
- Gymnasium for RL environment wrapping,
- Stable-Baselines3 for training the DQN model.
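The snippet below is a minimal, illustrative sketch (not the project's actual code) of how an AgentPy-driven market simulation can be exposed to Stable-Baselines3 through a Gymnasium environment. The class name, feature/action dimensions, and the trivial reset/step bodies are placeholders, and it assumes a Stable-Baselines3 version with Gymnasium support.

```python
# Minimal sketch: wrapping a market simulation as a Gymnasium env and
# training a DQN with Stable-Baselines3. Names and dimensions are assumptions.
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import DQN


class StockMarketEnv(gym.Env):
    """Placeholder wrapper; the RL agent observes peer (reactive agent) behaviour."""

    def __init__(self, n_features=32, n_actions=81):
        super().__init__()
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(n_features,), dtype=np.float32)
        self.action_space = spaces.Discrete(n_actions)  # one integer action per step
        self.day = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.day = 0
        # In the real project this is where the AgentPy model would be (re)started.
        return np.zeros(self.observation_space.shape, dtype=np.float32), {}

    def step(self, action):
        # In the real project: decode `action`, advance the AgentPy model one
        # trading day, then build the observation and reward from agent states.
        self.day += 1
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        reward = 0.0
        terminated = self.day >= 252  # one episode = one simulated trading year
        return obs, reward, terminated, False, {}


env = StockMarketEnv()
model = DQN("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)
```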
- Requirements: Python 3.8+ and pip
- Diverse Reactive Agents: Includes Random Buyers, Day Traders, Momentum Traders, Risk Traders, Risk-Averse Traders, and Herding Traders.
- RL Agent: Learns from peer behaviours (buy/sell/hold decisions and performance metrics).
- Behaviour-Driven Observation Space: Captures agent actions, profitability, win rates, and market sentiment (see the sketch after this list).
- Multi-Asset Support: Supports simulation and trading over multiple real-world stocks (e.g., AAPL, META, VISA, XOM).
- Reward Engineering: Custom reward function aligned with trading performance, risk control, and behavioural realism.
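As a rough illustration of the behaviour-driven observation space, the sketch below flattens hypothetical per-agent behaviour features and a market sentiment signal into a single vector. The field names (last_action, profit, win_rate) and the sentiment definition are assumptions, not the project's actual data model.

```python
# Illustrative only: one way a behaviour-driven observation vector could be
# assembled from the reactive agents' recent decisions and performance.
import numpy as np

def build_observation(reactive_agents, market_sentiment):
    """Concatenate per-agent behaviour features with a market-level sentiment signal."""
    features = []
    for agent in reactive_agents:
        features.extend([
            agent["last_action"],   # -1 = sell, 0 = hold, 1 = buy (assumed encoding)
            agent["profit"],        # realised + unrealised P&L
            agent["win_rate"],      # fraction of profitable trades so far
        ])
    features.append(market_sentiment)  # e.g. net buy pressure across all agents
    return np.asarray(features, dtype=np.float32)

# Example with two hypothetical peers:
obs = build_observation(
    [{"last_action": 1, "profit": 120.5, "win_rate": 0.60},
     {"last_action": -1, "profit": -40.0, "win_rate": 0.45}],
    market_sentiment=0.25,
)
print(obs.shape)  # (7,)
```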
- Agent Initialisation: All agents are instantiated with their own random seeds to ensure behavioural diversity while keeping runs reproducible.
- Training Loop:
- Each episode represents a simulated trading lifecycle (252 trading days).
- The RL agent receives an observation vector and outputs a single integer action that is decoded into per-stock decisions (see the sketch after this list).
- The environment executes all trades, updates financial states, and returns rewards.
- Reward Calculation:
- 70% weight on Net Worth and Portfolio Growth.
- 30% weight on Money-Weighted Rate of Return (MWRR).
- Penalties for invalid trading and excessive holding.
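The following sketch illustrates two of the mechanisms described above under stated assumptions: decoding one integer action into per-stock decisions (assuming four stocks, three decisions per stock, and a base-3 encoding) and combining the 70%/30% reward weights with simple penalty terms. The even split of the wealth weight and the penalty coefficients are placeholders; StockMARL's actual implementation may differ.

```python
# Hedged sketch of the action decoding and reward weighting described above.
# Assumes 4 stocks, decisions 0 = hold, 1 = buy, 2 = sell, and base-3 encoding.
N_STOCKS = 4

def decode_action(action: int, n_stocks: int = N_STOCKS):
    """Decode one integer in [0, 3**n_stocks) into a per-stock decision list."""
    decisions = []
    for _ in range(n_stocks):
        decisions.append(action % 3)
        action //= 3
    return decisions  # e.g. 5 -> [2, 1, 0, 0]

def compute_reward(net_worth_growth, portfolio_growth, mwrr,
                   invalid_trades=0, holding_days_over_limit=0):
    """Weighted reward: 70% wealth terms, 30% MWRR, minus behavioural penalties."""
    # Split the 0.7 weight evenly between the two wealth terms (an assumption).
    wealth_term = 0.7 * 0.5 * (net_worth_growth + portfolio_growth)
    mwrr_term = 0.3 * mwrr
    # Placeholder penalty coefficients for invalid trades and excessive holding.
    penalty = 0.1 * invalid_trades + 0.01 * holding_days_over_limit
    return wealth_term + mwrr_term - penalty

print(decode_action(5))                        # [2, 1, 0, 0]
print(compute_reward(0.08, 0.05, 0.12, 1, 3))  # weighted reward minus penalties
```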
- The RL agent outperforms most reactive agents in the generalisation test.
- Achieved a 12.23% yearly MWRR and a profitability score of 4.85 ± 1.63 with low volatility.
- The final configuration was tested across 700 × 252 simulated trading days (i.e., 700 episodes of 252 days each).
- Train the RL agent
python main.py
- Configure the agent population in main.py:
agent_counts = {
'RandomBuyerAgent': 5,
'DayTraderAgent': 7,
'MomentumTraderAgent': 6,
'RiskTraderAgent': 6,
'RiskAverseTraderAgent': 7,
'HerdingTraderAgent': 3,
'ReinforcementAgent': 1,
}
- View log files
All trade histories will be saved under:
simulation/Trade_History/trade_history_epX.csv
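To quickly inspect a saved trade history, a short pandas snippet like the one below can be used. It assumes that X in the file name indexes the episode (episode 1 is used here as an example); the column layout comes from the simulation output itself and is not assumed.

```python
# Minimal inspection of one episode's trade history (assumes the file for
# episode 1 exists); column names are taken from the CSV, not assumed.
import pandas as pd

history = pd.read_csv("simulation/Trade_History/trade_history_ep1.csv")
print(history.columns.tolist())  # see which fields were logged
print(history.head())            # first few recorded trades
```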
If this project helps your research or education, please cite:
- N/A
Peiyan Zou - [email protected]
Dr Peer-Olaf Siebers - [email protected]