Haoye Chai¹, Shiyuan Zhang¹, Xiaoqian Qi¹, Baohua Qiu², Yong Li¹*
¹Tsinghua University, ²China Mobile
This is the official implementation of UoMo, our foundation model for mobile traffic data, accepted to the KDD 2025 ADS track.
Our model adopts a three-stage paradigm consisting of tokenization, pre-training, and fine-tuning. The tokenization stage transforms the data into a (T, H, W) representation, with T time steps over an H × W spatial grid. The pre-training stage learns the fundamental features of the data, while the fine-tuning stage incorporates the number of users and the distribution of POIs as conditional inputs.
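To make the (T, H, W) idea concrete, here is a generic spatio-temporal patching sketch in NumPy. It is not UoMo's tokenizer: `patchify`, `unpatchify`, and the patch sizes `pt`, `ph`, `pw` are hypothetical, and the only assumption is that one sample is a single (T, H, W) array, such as a 64 × 4 × 4 sample from the datasets described below.

```python
import numpy as np


def patchify(x: np.ndarray, pt: int, ph: int, pw: int) -> np.ndarray:
    """Split a (T, H, W) array into non-overlapping pt x ph x pw patches.

    Returns shape (num_tokens, pt * ph * pw): one flattened patch (token) per row,
    ordered by time block, then row block, then column block.
    """
    T, H, W = x.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("patch sizes must divide (T, H, W) exactly")
    # Separate block indices from in-patch offsets: (T/pt, pt, H/ph, ph, W/pw, pw)
    blocks = x.reshape(T // pt, pt, H // ph, ph, W // pw, pw)
    # Move the three block indices to the front and the in-patch offsets to the back
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(-1, pt * ph * pw)


def unpatchify(tokens: np.ndarray, shape: tuple, pt: int, ph: int, pw: int) -> np.ndarray:
    """Inverse of patchify: rebuild the (T, H, W) array from its tokens."""
    T, H, W = shape
    blocks = tokens.reshape(T // pt, H // ph, W // pw, pt, ph, pw)
    return blocks.transpose(0, 3, 1, 4, 2, 5).reshape(T, H, W)


# One sample with the datasets' layout: 64 time steps over a 4 x 4 grid
sample = np.random.rand(64, 4, 4)
tokens = patchify(sample, pt=8, ph=2, pw=2)
print(tokens.shape)  # (32, 32): 8 x 2 x 2 blocks, each holding 8 * 2 * 2 values
```

The reshape/transpose pair keeps the operation a pure reindexing, so `unpatchify` recovers the input exactly.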
We provide the three datasets used in the original model training, TrafficNJ, TrafficSD, and TrafficNC, along with the corresponding user-count and POI data.
- The datasets are located in the `dataset64time` folder.
- Extract each `.rar` file to obtain the data in `.json` format.
Each dataset is stored as a dictionary containing the following four keys:
- `train`: shape N₁ × 1 × 64 × 4 × 4
  - 64: temporal length
  - 4 × 4: spatial patches of the geographical area
- `test`: shape N₂ × 1 × 64 × 4 × 4
  - 64: temporal length
  - 4 × 4: spatial patches of the geographical area
- `val`: shape N₃ × 1 × 64 × 4 × 4
  - 64: temporal length
  - 4 × 4: spatial patches of the geographical area
- `timestamp`: shape N × 1 × 64 × 2
  - 64: temporal length
  - 2: timestamp of each time step, formatted as `[time, day]`
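A minimal loading sketch, assuming each extracted `.json` file holds one such dictionary whose values are nested lists (JSON has no array type). The helpers `arrays_from_json` and `load_dataset` and the example file name are hypothetical; the shape check simply encodes the layout listed above.

```python
import json

import numpy as np

# Per-sample shapes (everything after the leading N dimension), as listed above
EXPECTED_SAMPLE_SHAPES = {
    "train": (1, 64, 4, 4),
    "test": (1, 64, 4, 4),
    "val": (1, 64, 4, 4),
    "timestamp": (1, 64, 2),
}


def arrays_from_json(raw: dict) -> dict:
    """Convert the four nested-list entries to NumPy arrays and check their shapes."""
    data = {}
    for key, sample_shape in EXPECTED_SAMPLE_SHAPES.items():
        arr = np.asarray(raw[key])
        if arr.shape[1:] != sample_shape:
            raise ValueError(f"{key}: expected trailing shape {sample_shape}, got {arr.shape}")
        data[key] = arr
    return data


def load_dataset(path: str) -> dict:
    """Read one extracted .json dataset file and return its arrays."""
    with open(path, "r", encoding="utf-8") as f:
        return arrays_from_json(json.load(f))


# Hypothetical usage; the actual file name after extraction may differ:
# data = load_dataset("dataset64time/TrafficNJ.json")
# print({k: v.shape for k, v in data.items()})
```

Splitting parsing from file I/O keeps the shape check testable without touching the real data files.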
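The three datasets differ in temporal granularity:
- TrafficNJ
  - 15-minute granularity
  - timestamp dimension space: [96, 7] (96 time slots per day × 7 days)
- TrafficNC
  - 30-minute granularity
  - timestamp dimension space: [48, 7] (48 time slots per day × 7 days)
- TrafficSD
  - 1-hour granularity
  - timestamp dimension space: [24, 7] (24 time slots per day × 7 days)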
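The timestamp dimension space follows from the granularity: 24 × 60 / 15 = 96 slots per day for TrafficNJ, 48 for TrafficNC, and 24 for TrafficSD. The sketch below makes that arithmetic explicit and shows one plausible way to map a wall-clock time to a `[time, day]` pair. `encode_timestamp` is hypothetical, and treating `day` as the day of the week with Monday = 0 is an assumption, not the repository's documented preprocessing.

```python
from datetime import datetime

# Minutes per time step for each dataset, from the list above
GRANULARITY_MINUTES = {"TrafficNJ": 15, "TrafficNC": 30, "TrafficSD": 60}


def timestamp_space(dataset: str) -> list:
    """Number of distinct [time, day] values: slots per day and days per week."""
    return [24 * 60 // GRANULARITY_MINUTES[dataset], 7]


def encode_timestamp(dt: datetime, dataset: str) -> list:
    """Map a datetime to a [time, day] pair (hypothetical encoding).

    time: index of the slot within the day, 0 .. slots_per_day - 1
    day:  datetime.weekday(), so Monday = 0 (assumed convention)
    """
    minutes_since_midnight = dt.hour * 60 + dt.minute
    return [minutes_since_midnight // GRANULARITY_MINUTES[dataset], dt.weekday()]


print(timestamp_space("TrafficNJ"))  # [96, 7]
print(encode_timestamp(datetime(2024, 1, 1, 13, 45), "TrafficNJ"))  # [55, 0]
```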
- Python >= 3.7
- PyTorch >= 2.0.0
- CUDA >= 11.7
Use the following command to install all required Python packages: `pip install -r requirements.txt`
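After installation, the minimum versions listed above can be checked with a small script like the following. It is a sketch: `version_tuple`, `meets`, and `check_environment` are hypothetical helpers, the parser ignores pre-release tags, and `torch.version.cuda` reports the CUDA version PyTorch was built with (it is `None` on CPU-only builds), not the system driver version.

```python
import sys


def version_tuple(version: str) -> tuple:
    """Parse '2.1.0+cu118' -> (2, 1, 0), keeping the leading digits of each component."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets(version: str, minimum: str) -> bool:
    """True if version >= minimum, padding the shorter tuple with zeros."""
    a, b = version_tuple(version), version_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment() -> list:
    """Return a list of unmet requirements (empty if everything looks fine)."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append(f"Python >= 3.7 required, found {sys.version.split()[0]}")
    try:
        import torch
    except ImportError:
        return problems + ["PyTorch is not installed"]
    if not meets(torch.__version__, "2.0.0"):
        problems.append(f"PyTorch >= 2.0.0 required, found {torch.__version__}")
    if torch.version.cuda is None:
        problems.append("this PyTorch build has no CUDA support")
    elif not meets(torch.version.cuda, "11.7"):
        problems.append(f"CUDA >= 11.7 required, found {torch.version.cuda}")
    return problems


if __name__ == "__main__":
    print(check_environment() or "environment OK")
```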
- Extract the `.rar` files in the `dataset64time` folder.
- Run the model using one of the following options:
  - Run pre-training and fine-tuning together: `python run.py`
  - Run pre-training and fine-tuning separately:
    - Pre-training: `python main.py`
    - Fine-tuning: `python main_alignment.py`
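Running the stages separately amounts to calling the two scripts in order, pre-training first, as in the three-stage paradigm described at the top. The sketch below is a hypothetical stand-in for that sequence, not the contents of `run.py`; it assumes both scripts work with their default arguments and are launched from the repository root.

```python
import subprocess
import sys

# Scripts named in this README, in the order the stages run
STAGES = [("pre-training", "main.py"), ("fine-tuning", "main_alignment.py")]


def build_commands(python: str = sys.executable) -> list:
    """Command line for each stage, using the scripts' default arguments."""
    return [[python, script] for _, script in STAGES]


def run_pipeline(dry_run: bool = False) -> None:
    """Run the stages one after another, stopping if a stage fails."""
    for (name, _), cmd in zip(STAGES, build_commands()):
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError, so fine-tuning never runs after a failed pre-training
            subprocess.run(cmd, check=True)


run_pipeline(dry_run=True)  # only prints the two commands
```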
- The pre-trained models and generated data are saved in `experiments/Len64_{Dataset_name}_Pretrain`.
- The fine-tuned models and generated data are saved in `experiments/Len64_data_{Dataset_name}_Finetuning`.
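A small helper for locating these outputs, assuming `{Dataset_name}` is replaced by a dataset name such as `TrafficNJ` (an assumption; check the folder names your run actually creates). `output_dirs` and `list_outputs` are hypothetical and only reproduce the two naming patterns above.

```python
from pathlib import Path


def output_dirs(dataset_name: str, root: str = "experiments") -> dict:
    """Expected output folders for one dataset, following the naming patterns above."""
    base = Path(root)
    return {
        "pretrain": base / f"Len64_{dataset_name}_Pretrain",
        "finetune": base / f"Len64_data_{dataset_name}_Finetuning",
    }


def list_outputs(dataset_name: str, root: str = "experiments") -> dict:
    """List files saved under each stage's folder (empty list if it does not exist yet)."""
    return {
        stage: sorted(p.name for p in folder.iterdir()) if folder.is_dir() else []
        for stage, folder in output_dirs(dataset_name, root).items()
    }
```

Note that the two patterns differ (`Len64_` vs `Len64_data_`), so hard-coding one template for both stages would miss a folder.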
In the original paper, we conducted experiments using a mixture of multiple city-level datasets as input to UoMo. The resulting performance is shown below.