This repo contains baseline code for:
- Multi-Object Tracking (MOT): Detecting (Yolov5) and tracking (DeepSORT, ByteTrack) objects in video streams.
- Determining object attributes: e.g. color and type for vehicles, or speed if camera calibration is performed.
- Multi-target multi-camera tracking (MTMC): Match tracks across cameras after running MOT in a multi-camera system.
- Evaluation: Calculate MOT/MTMC metrics (MOTA, IDF1) automatically if ground truth annotations are provided.
- Express run: Run everything from above in one go.
- Nvidia drivers have to be installed (check with nvidia-smi), preferably supporting CUDA >11.0.
- Tested on python3.7 to 3.10.
- The requirements.txt contains a working configuration with torch 1.13.0, but installing the packages manually with different versions can work too (older python versions may require older torch versions).
- A working C++ compiler toolchain, python-devel headers and wheel are required for installing torch, numpy, and scipy with pip. Otherwise the installation will only work with conda.
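The first two prerequisites can be checked before installing anything. The sketch below uses only the Python standard library; `check_prerequisites` is a hypothetical helper written for this README, not something shipped in the repo, and finding nvidia-smi on the PATH is only a hint that the drivers are present.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper, not part of this repo.
    """
    problems = []
    # The repo is tested on python3.7 to 3.10; other versions may still work.
    if not ((3, 7) <= tuple(version_info[:2]) <= (3, 10)):
        problems.append(
            "python %d.%d is outside the tested range 3.7-3.10"
            % (version_info[0], version_info[1])
        )
    # nvidia-smi on the PATH is a hint (not a proof) that the drivers are installed.
    if which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("warning:", problem)
```

The two parameters exist only so the check can be exercised with fake values; calling `check_prerequisites()` without arguments inspects the running interpreter.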
Creating a virtual environment is highly recommended, except if working in a disposable environment (Kaggle, Colab, etc).
Clone the repo including the submodules:
git clone --recurse-submodules git@github.com:regob/vehicle_mtmc.git
Before installing requirements.txt, cython needs to be installed:
pip install cython "numpy>=1.18.5,<1.23.0"
then install the rest:
pip install -r requirements.txt
Some pretrained models can be downloaded from Google Drive. Create a models subdirectory, and unzip the models there. The archive contains:
- A resnet50-ibn re-id model trained on VeRi-Wild, CityFlow, VRIC, and some private data.
- SVM classifiers for vehicle color/type running on the re-id embeddings.
Running single-cam tracking requires at a minimum a video, a re-id model and a configuration file. A fairly minimal configuration file for the highway.mp4 example video and pretrained re-id model is below:
OUTPUT_DIR: "output/mot_highway"
MOT:
  VIDEO: "datasets/highway.mp4"
  REID_MODEL_OPTS: "models/resnet50_mixstyle/opts.yaml"
  REID_MODEL_CKPT: "models/resnet50_mixstyle/net_19.pth"
  DETECTOR: "yolov5x6"
  TRACKER: "bytetrack_iou"
  SHOW: false
  VIDEO_OUTPUT: true
Any car traffic video should be fine for testing. The video from the screenshots can be downloaded with:
$ yt-dlp -f mp4 -o datasets/highway.mp4 https://www.youtube.com/watch?v=nt3D26lrkho
Install yt-dlp or youtube-dl for downloading youtube videos (the former bypasses rate limits).
The example configuration is at config/examples/mot_highway.yaml. Tracking can be run from the repo root with (PYTHONPATH needs to be set to the root folder):
$ export PYTHONPATH=$(pwd)
$ python3 mot/run_tracker.py --config examples/mot_highway.yaml
The required parameters for MOT are (paths can be relative to the repo root, or absolute):
- OUTPUT_DIR: Directory, where the outputs will be saved.
- MOT.VIDEO: Path to the video input.
- MOT.REID_MODEL_OPTS: Path to the opts.yaml of the reid model.
- MOT.REID_MODEL_CKPT: Path to the checkpoint of the reid model.
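As an illustration of the dotted key names used above, the sketch below checks a config that has already been loaded into a nested dict (for example by a YAML parser) for the four required parameters. `missing_required_keys` is a hypothetical helper for this README; how the repo itself validates its configs is not shown here.

```python
REQUIRED_KEYS = (
    "OUTPUT_DIR",
    "MOT.VIDEO",
    "MOT.REID_MODEL_OPTS",
    "MOT.REID_MODEL_CKPT",
)


def missing_required_keys(config, required=REQUIRED_KEYS):
    """Return the dotted keys from `required` that are absent or empty in `config`."""
    missing = []
    for dotted in required:
        node = config
        # Walk down the nested dict one path component at a time.
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                node = None
                break
            node = node[part]
        if node is None or node == "":
            missing.append(dotted)
    return missing


example = {
    "OUTPUT_DIR": "output/mot_highway",
    "MOT": {
        "VIDEO": "datasets/highway.mp4",
        "REID_MODEL_OPTS": "models/resnet50_mixstyle/opts.yaml",
    },
}
print(missing_required_keys(example))  # ['MOT.REID_MODEL_CKPT']
```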
Other important parameters:
- MOT.DETECTOR: yolov5 versions are supported.
- MOT.TRACKER: Choose between ByteTrack ("bytetrack_iou") or DeepSORT ("deepsort").
- MOT.SHOW: Show tracking online in a window (cv2 needs to connect to display for this, or it crashes).
- MOT.VIDEO_OUTPUT: Save tracked video in the output folder.
- MOT.STATIC_ATTRIBUTES: Configure attribute extraction models.
- MOT.CALIBRATION: Camera calibration file (to be described below).
Determining static attributes (e.g. type, color) can be configured as:
MOT:
  STATIC_ATTRIBUTES:
    - color: "models/color_svm.pkl"
    - type: "models/type_svm.pkl"
Models can be the following:
- A pytorch CNN that gets the image in the bounding box as input.
- A pytorch fully-connected NN that predicts the attribute from the re-id embedding.
- sklearn/xgboost/etc. models that are pickled, and have a predict(x) method that predicts from the re-id embedding as a numpy array.
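To make the third option concrete, here is a minimal sketch of a picklable object with a predict(x) method. It is a toy nearest-centroid classifier in plain Python, not one of the provided SVMs. Two things are assumptions of this example rather than documented behavior: that x arrives as a 2D batch of shape (n_samples, embedding_dim), and that the return value is one integer class index per row. `NearestCentroid` and `save_model` are hypothetical names.

```python
import pickle


class NearestCentroid:
    """Toy classifier exposing predict(x), the only method the text above requires."""

    def __init__(self, centroids):
        # centroids: one reference embedding per class index (hypothetical values)
        self.centroids = [list(c) for c in centroids]

    def predict(self, x):
        # x: 2D array-like of shape (n_samples, embedding_dim); assumed layout
        labels = []
        for row in x:
            # Squared Euclidean distance from this embedding to every centroid.
            dists = [
                sum((a - b) ** 2 for a, b in zip(row, c)) for c in self.centroids
            ]
            labels.append(dists.index(min(dists)))
        return labels


def save_model(model, path):
    """Pickle the model so that its path can be referenced from the config."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


model = NearestCentroid([[0.0, 0.0], [1.0, 1.0]])
print(model.predict([[0.1, 0.2], [0.9, 0.8]]))  # [0, 1]
```

Note that pickle stores only a reference to the class, so a custom class like this one has to be importable under the same module path when the file is loaded. sklearn and xgboost models avoid that problem because those packages are already installed.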
When adding a new attribute besides color and type, its possible values have to be configured in mot/attributes.py.
Camera calibration has to be performed with the Cal_PNP package to get a homography matrix, then the path to the homography matrix has to be configured in MOT.CALIBRATION. An example homography matrix file is provided for highway.mp4 at config/examples/highway_calibration.txt.
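The idea behind speed estimation from a homography can be sketched as follows: project two image positions of the same object onto the ground plane and divide the travelled distance by the elapsed time. This is an illustration only, not the repo's implementation. It assumes the 3x3 matrix H maps image pixels to planar ground coordinates in metres; if a calibration file stores the opposite direction or uses GPS coordinates, the matrix would have to be inverted or the coordinates converted first. The matrix values and function names below are made up.

```python
import math


def project(H, u, v):
    """Apply a 3x3 homography H (nested lists) to the pixel (u, v)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    # Divide by the homogeneous coordinate to get planar coordinates.
    return x / w, y / w


def speed_kmh(H, pixel_a, pixel_b, dt_seconds):
    """Average speed between two image positions observed dt_seconds apart."""
    xa, ya = project(H, *pixel_a)
    xb, yb = project(H, *pixel_b)
    metres = math.hypot(xb - xa, yb - ya)
    return metres / dt_seconds * 3.6  # m/s to km/h


# Hypothetical matrix: 0.0625 m per pixel, no perspective distortion.
H = [[0.0625, 0.0, 0.0], [0.0, 0.0625, 0.0], [0.0, 0.0, 1.0]]
# 400 pixels in one second is 25 m/s, i.e. roughly 90 km/h.
print(speed_kmh(H, (100, 200), (100, 600), 1.0))
```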
Express multi-camera tracking runs MOT on all cameras and then hierarchical clustering on single-camera tracks. Temporal constraints are also considered, and have to be pre-configured in the MTMC.CAMERA_LAYOUT parameter. An example config for CityFlow S02 (4 cameras at a crossroad) is at config/cityflow/express_s02.yaml. The part describing the MTMC config is:
MTMC:
  CAMERA_LAYOUT: 'config/cityflow/s02_camera_layout.txt'
  LINKAGE: 'average'
  MIN_SIM: 0.5
EXPRESS:
  FINAL_VIDEO_OUTPUT: true
  CAMERAS:
    - "video": "datasets/cityflow_track3/validation/S02/c006/vdo.avi"
      "detection_mask": "assets/cityflow/c006_mask.jpg"
      "calibration": "datasets/cityflow_track3/validation/S02/c006/calibration.txt"
    - "video": "datasets/cityflow_track3/validation/S02/c007/vdo.avi"
      "detection_mask": "assets/cityflow/c007_mask.jpg"
      "calibration": "datasets/cityflow_track3/validation/S02/c007/calibration.txt"
    - "video": "datasets/cityflow_track3/validation/S02/c008/vdo.avi"
      "detection_mask": "assets/cityflow/c008_mask.jpg"
      "calibration": "datasets/cityflow_track3/validation/S02/c008/calibration.txt"
    - "video": "datasets/cityflow_track3/validation/S02/c009/vdo.avi"
      "detection_mask": "assets/cityflow/c009_mask.jpg"
      "calibration": "datasets/cityflow_track3/validation/S02/c009/calibration.txt"
The MOT config is the same for all cameras, but for each camera at least the "video" key has to be given in EXPRESS.CAMERAS. The meaning of the keys is the same as in the MOT config.
In the MTMC config there are only a few parameters:
- MTMC.LINKAGE chooses the linkage for hierarchical clustering from ['single', 'complete', 'average'].
- MTMC.MIN_SIM is the minimal similarity between multi-cam tracks above which they can be merged.
- MTMC.CAMERA_LAYOUT stores the mandatory camera constraints file. The camera layout file for CityFlow S02 is at config/cityflow/s02_camera_layout.txt.

On CityFlow S02, express MTMC can be run as:
$ export PYTHONPATH=$(pwd)
$ python3 mtmc/run_express_mtmc.py --config cityflow/express_s02.yaml
For running the example config, the S02 scenario of the CityFlow dataset needs to be unzipped to datasets/cityflow_track3/validation.
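The roles of MTMC.LINKAGE and MTMC.MIN_SIM can be illustrated with a small, self-contained sketch of agglomerative clustering over a track similarity matrix. It is not the repo's implementation: the temporal constraints from the camera layout file are not modelled, and the only constraint used here (a cluster holds at most one track per camera) is a stand-in chosen for the example. Treating MIN_SIM as an inclusive threshold is likewise an assumption. All function names are hypothetical.

```python
from itertools import combinations


def cluster_similarity(a, b, sim, linkage):
    """Similarity of two clusters (lists of track indexes) under a linkage rule."""
    values = [sim[i][j] for i in a for j in b]
    if linkage == "single":      # the most similar pair decides
        return max(values)
    if linkage == "complete":    # the least similar pair decides
        return min(values)
    if linkage == "average":
        return sum(values) / len(values)
    raise ValueError("unknown linkage: %r" % (linkage,))


def cluster_tracks(sim, cameras, linkage="average", min_sim=0.5):
    """Greedy agglomerative clustering of single-camera tracks.

    sim[i][j] is the similarity of tracks i and j, cameras[i] the camera of track i.
    """
    clusters = [[i] for i in range(len(sim))]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Stand-in constraint: one cluster holds at most one track per camera.
            if {cameras[i] for i in clusters[a]} & {cameras[j] for j in clusters[b]}:
                continue
            s = cluster_similarity(clusters[a], clusters[b], sim, linkage)
            if s >= min_sim and (best is None or s > best[0]):
                best = (s, a, b)
        if best is None:
            return clusters  # no admissible pair reaches min_sim any more
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b


sim = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.3], [0.2, 0.3, 1.0]]
cameras = ["c006", "c007", "c008"]
print(cluster_tracks(sim, cameras))  # [[0, 1], [2]]
```

Lowering min_sim or switching to 'single' linkage makes merging more permissive: with `linkage="single", min_sim=0.2` the same input ends up in one cluster.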
Models trained by my reid/vehicle_reid repo are supported out-of-the-box in the configuration. Other torch models could be integrated by modifying the model loading in mot/run_tracker.py, which currently looks like this:
# initialize reid model
reid_model = load_model_from_opts(cfg.MOT.REID_MODEL_OPTS,
                                  ckpt=cfg.MOT.REID_MODEL_CKPT,
                                  remove_classifier=True)
If you reuse this work, please consider citing our paper:
Szűcs, G., Borsodi, R., Papp, D. (2023). Multi-Camera Trajectory Matching based on Hierarchical Clustering and Constraints. Multimedia Tools and Applications, https://doi.org/10.1007/s11042-023-17397-0
Some parts are adapted from other repositories:
- nwojke/deep_sort: Original DeepSORT code.
- theAIGuysCode/yolov4-deepsort: Enhanced version of DeepSORT.
- ifzhang/ByteTrack: Original ByteTrack tracker code.
The yolov5 and vehicle_reid repos are used as submodules.