CWRUbotix's solution to the 2025 MATE ROV / Ocean Exploration Video Challenge.
In response to MATE II's 2025 Ocean Exploration Video Challenge, CWRUbotix conducted a review of object detection models. Many fish detection and tracking models already exist in the literature[^1]. Most current models target more controlled operating environments to obtain higher tracking accuracy[^2][^3][^4] or use density-based[^5] count estimation, and therefore do not satisfy MATE II's requirements. We identified YOLO-Fish[^6] (which fine-tunes DarkNet's YOLOv3 implementation for fish detection using the DeepFish and OzFish datasets) as a strong candidate model for detection in MATE II's unconstrained operating environment.
We qualitatively evaluated the trained YOLO-Fish network on the provided video sample using DarkNet and DarkHelp, and determined that although it produced too many false negatives for a competition submission, its output would be a good stepping stone toward creating a labeled dataset from the video. We used the output of YOLO-Fish and the Roboflow labeling platform to create the dataset (see the conversion sketch after Figure I). Our code for running YOLO-Fish is available here (note: this is not our final model!).
Figure I: Example frame processed by YOLO-Fish, missing several large fish
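For illustration, a minimal sketch of how per-frame detector output can be converted into YOLO-format label files for upload to a labeling platform like Roboflow. The JSON field names (`file`, `objects`, and the pixel-space bounding-box fields) are assumptions for illustration, not DarkHelp's actual schema:

```python
import json
from pathlib import Path

FRAME_W, FRAME_H = 1920, 1080  # source video resolution

def write_yolo_labels(json_path: str, out_dir: str) -> None:
    """Convert detections (one record per frame) into YOLO txt labels:
    "class x_center y_center width height", all normalized to [0, 1]."""
    records = json.loads(Path(json_path).read_text())
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for frame in records:
        lines = []
        for det in frame["objects"]:  # assumed field names
            x, y, w, h = det["x"], det["y"], det["w"], det["h"]  # pixel bbox
            cx, cy = (x + w / 2) / FRAME_W, (y + h / 2) / FRAME_H
            lines.append(f"0 {cx:.6f} {cy:.6f} {w / FRAME_W:.6f} {h / FRAME_H:.6f}")
        Path(out_dir, Path(frame["file"]).stem + ".txt").write_text("\n".join(lines))
```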
Considering the decent performance of YOLO-Fish and our team's past experience with YOLO networks, we chose to fine-tune YOLOv8 and YOLOv11 nano models on our labeled dataset. We also trained an EfficientDet model (available here), a network often compared with YOLO networks. We partitioned our dataset with a 60%/20%/20% training/validation/testing split and resized the video frames to 640x640. We trained the YOLO networks for 50 epochs with a batch size of 16 on an NVIDIA RTX 4070 graphics card. As per Table I, both YOLO models achieved a mean average precision of roughly 0.8 at an IoU threshold of 0.5, dropping to roughly 0.5 when averaged over IoU thresholds from 0.5 to 0.95 (mAP 50-95). Our YOLO models therefore performed far better than YOLO-Fish (0.43 mAP 50 and 0.30 mAP 50-95 measured on our test split). EfficientDet drastically underperformed in testing.
| Model | mAP 50 | mAP 50-95 |
|---|---|---|
| YOLO v8n | 0.8187 | 0.4936 |
| YOLO v11n | 0.8279 | 0.4979 |
| EffDet | 0.0905 | 0.0358 |
| YOLO-Fish | 0.4347 | 0.2950 |
Table I: Mean average precision values for our models.
Figure II: Metrics during YOLOv8 training
Figure III: Metrics during YOLOv11 training
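For reference, a minimal training sketch using the Ultralytics Python API with the settings above; the dataset.yaml filename stands in for our actual data config and is an assumption:

```python
from ultralytics import YOLO

# Fine-tune a YOLOv11 nano model on the labeled fish dataset.
# "dataset.yaml" (assumed filename) points at the 60/20/20 split and class names.
model = YOLO("yolo11n.pt")  # pretrained nano weights
model.train(
    data="dataset.yaml",
    epochs=50,
    batch=16,
    imgsz=640,  # frames resized to 640x640
    device=0,   # first CUDA GPU
)
metrics = model.val()  # evaluate on the validation split
print(metrics.box.map50, metrics.box.map)  # mAP 50, mAP 50-95
```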
As per Table II, inference time for the YOLO models is close to 10ms/frame, with YOLO v11 slightly faster (a timing sketch follows the table).
| Model | Inference time (one frame) | Full-frame inference & write time |
|---|---|---|
| YOLO v8n | 11.2ms | 92.6ms |
| YOLO v11n | 8.5ms | 90.3ms |
Table II: Inference times for YOLO models
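Per-frame timings like those above can be read back from Ultralytics directly; a minimal sketch, assuming a fine-tuned weight file at the path shown:

```python
from ultralytics import YOLO

model = YOLO("weights/yolov11-finetuned.pt")  # assumed weight path
results = model("frame.jpg", imgsz=640)       # inference on a single frame
# Ultralytics reports per-stage timings in milliseconds.
print(results[0].speed)  # {'preprocess': ..., 'inference': ..., 'postprocess': ...}
```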
Figure IV: Example frame processed by our fine-tuned YOLO v11 model, performing much better on large fish
We chose to submit our YOLO v11 model to the competition. Figure V graphs this model's predicted fish counts at 5-second intervals on the provided video, and Table III lists the same data in full (a counting sketch follows the table). The contents of Table III are also available as a CSV, PDF, and XLSX in readme/data. Our annotated video is available on Google Drive.
Figure V: Graph of fish counts over time
| Time (seconds @ 30 FPS) | Fish Count |
|---|---|
| 0 | 4 |
| 5 | 0 |
| 10 | 17 |
| 15 | 26 |
| 20 | 40 |
| 25 | 37 |
| 30 | 42 |
| 35 | 62 |
| 40 | 10 |
| 45 | 23 |
| 50 | 50 |
| 55 | 58 |
| 60 | 53 |
| 65 | 78 |
| 70 | 41 |
| 75 | 71 |
| 80 | 35 |
| 85 | 49 |
| 90 | 42 |
| 95 | 27 |
| 100 | 48 |
| 105 | 38 |
| 110 | 27 |
| 115 | 14 |
| 120 | 28 |
| 125 | 58 |
| 130 | 44 |
| 135 | 31 |
| 140 | 18 |
| 145 | 21 |
| 150 | 5 |
Table III: Fish counts for the example video at 5-second intervals
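For illustration, a minimal sketch of how counts like these can be produced: sample one frame every 5 seconds of a 30 FPS video, count detections, and write a CSV. The weight path and output filename are assumptions; our actual predict.py may differ:

```python
import csv

import cv2
from ultralytics import YOLO

model = YOLO("weights/yolov11-finetuned.pt")  # assumed weight path
cap = cv2.VideoCapture("video.mp4")
FPS = 30  # counts are sampled every 5 seconds at 30 FPS
rows, frame_idx = [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % (5 * FPS) == 0:
        result = model(frame, imgsz=640, verbose=False)[0]
        rows.append((frame_idx // FPS, len(result.boxes)))  # (seconds, fish count)
    frame_idx += 1
cap.release()

with open("fish_counts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time_s", "fish_count"])
    writer.writerows(rows)
```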
We attempted fish tracking (not just detection) using YOLO-Fish (the DarkNet model) and SORT[^7] baseline tracking. Even an algorithm as simple as SORT was sufficient to have a visible "smoothing" effect, removing many 1-2 frame false positives. With more time it would be interesting to apply SORT to our YOLOv11 output, or to experiment with Ultralytics' tracking method, as sketched below.
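A minimal sketch of the Ultralytics route, using the ByteTrack tracker config that Ultralytics ships (the weight path is an assumption):

```python
from ultralytics import YOLO

model = YOLO("weights/yolov11-finetuned.pt")  # assumed weight path
# Tracking assigns persistent IDs across frames, so 1-2 frame false
# positives can be filtered out downstream by track length.
results = model.track("video.mp4", tracker="bytetrack.yaml", imgsz=640, stream=True)
for r in results:
    ids = r.boxes.id  # per-fish track IDs, or None if nothing is tracked
    print(int(ids.numel()) if ids is not None else 0)  # tracked fish this frame
```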
Applying an underwater dehazing algorithm[^8] to the video input before training might also make it easier for the network to learn features. Something as simple as the Dark Channel Prior or CLAHE might be enough to improve the clarity of distant fish (a CLAHE sketch follows).
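As a concrete example, a minimal CLAHE preprocessing sketch with OpenCV; the clip limit and tile size are illustrative defaults, not tuned values:

```python
import cv2

def clahe_enhance(frame):
    """Apply CLAHE to the lightness channel only, preserving color balance."""
    lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

frame = cv2.imread("frame.jpg")
cv2.imwrite("frame_clahe.jpg", clahe_enhance(frame))
```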
Finally, training on a wider dataset of black sea bass videos would undoubtedly improve the usefulness of our model. We were only able to find fish classification datasets and one other (unlabeled) NOAA video to supplement the provided video.
- Install CUDA 12.1. Note that this version of CUDA is outdated. Our code may work on up-to-date CUDA, but it was only tested on 12.1.
- Install the Python dependencies (Torch and Ultralytics) by running:

```bash
pip install --user torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install --user ultralytics
```

To predict on a video, run the following command:

```bash
python predict.py -v path/to/your/video.mp4
```

For example, if you move your video into our repo directory:

```bash
python predict.py -v video.mp4
```

Videos should be 30 FPS for accurate graphing and 1920x1080 to avoid distortion in the annotated video. The model's outputs (annotated video, CSV of fish counts, and graph of fish counts) will appear in the output/ directory.

You may also select different model weights:

```bash
python predict.py -v video.mp4 -m weights/yolov8-finetuned.pt
```

To train a YOLOv11 model on the data in dataset/, run:

```bash
python train.py
```

Replace the dataset/images/ and dataset/labels/ directories with your own data to train for a different application. Edit train.py to use a different YOLO version.
Footnotes
[^1]: M. Cui, X. Liu, H. Liu, J. Zhao, D. Li, and W. Wang, "Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Survey," Reviews in Aquaculture, vol. 17, e13001, 2025, doi: 10.1111/raq.13001.

[^2]: P. L. F. Albuquerque, V. Garcia, A. d. S. O. Junior, et al., "Automatic Live Fingerlings Counting Using Computer Vision," Computers and Electronics in Agriculture, vol. 167, p. 105015, 2019.

[^3]: G. Xu, Q. Chen, T. Yoshida, et al., "Detection of Bluefin Tuna by Cascade Classifier and Deep Learning for Monitoring Fish Resources," in Global Oceans 2020: Singapore–U.S. Gulf Coast, IEEE, 2020, pp. 1–4.

[^4]: S. M. D. Lainez and D. B. Gonzales, "Automated Fingerlings Counting Using Convolutional Neural Network," in 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS), Singapore, IEEE, 2019, pp. 67–72.

[^5]: N. Liu, Y. Long, C. Zou, Q. Niu, L. Pan, and H. Wu, "ADCrowdNet: An Attention-Injective Deformable Convolutional Network for Crowd Understanding," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, 2019, pp. 3225–3234.

[^6]: A. A. Muksit, F. Hasan, Md. F. Hasan Bhuiyan Emon, M. R. Haque, A. R. Anwary, and S. Shatabda, "YOLO-Fish: A robust fish detection model to detect fish in realistic underwater environment," Ecological Informatics, vol. 72, p. 101847, 2022, doi: 10.1016/j.ecoinf.2022.101847.

[^7]: A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, "Simple online and realtime tracking," in 2016 IEEE International Conference on Image Processing (ICIP), 2016, doi: 10.1109/ICIP.2016.7533003.

[^8]: X. Shuang, J. Zhang, and Y. Tian, "Algorithms for improving the quality of underwater optical images: A comprehensive review," Signal Processing, vol. 219, p. 109408, 2024, doi: 10.1016/j.sigpro.2024.109408.