
MATE ROV 2025 Computer Vision Challenge

CWRUbotix's solution to the 2025 MATE ROV / Ocean Exploration Video Challenge.

Introduction & Methods

In response to MATE II's 2025 Ocean Exploration Video Challenge, CWRUbotix conducted a review of object detection models. Many fish detection and tracking models already exist in the literature [1]. Most current models target more controlled operating environments to obtain higher tracking accuracy [2][3][4] or use density-based count estimation [5], and therefore do not satisfy MATE II's requirements. We identified YOLO-Fish [6] (which fine-tunes DarkNet's YOLOv3 implementation for fish detection using the DeepFish and OzFish datasets) as a good candidate model for detection in MATE II's unconstrained operating environment.

We qualitatively evaluated the performance of the trained YOLO-Fish network on the provided video sample using DarkNet and DarkHelp, and determined that although it had too many false negatives for submission, its output would be a good stepping stone in creating a labeled dataset from the video. We used the output of YOLO-Fish and the Roboflow labeling platform to create the dataset. Our code for running YOLO-Fish is available here (note: this is not our final model!).
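To illustrate the bootstrapping step, here is a minimal sketch of converting pixel-space detections (such as those parsed from YOLO-Fish's output) into YOLO-format label files for upload to Roboflow. The detections structure below is a hypothetical stand-in, not DarkHelp's actual output schema:

def to_yolo_labels(detections, img_w, img_h):
    """Convert (x, y, w, h) pixel boxes to normalized YOLO label lines."""
    lines = []
    for x, y, w, h in detections:  # top-left corner + size, in pixels
        cx = (x + w / 2) / img_w   # normalized box center
        cy = (y + h / 2) / img_h
        lines.append(f"0 {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}")
    return "\n".join(lines)

# One label file per frame, e.g. frame_000123.txt next to frame_000123.jpg:
print(to_yolo_labels([(100, 50, 80, 40)], img_w=1920, img_h=1080))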

Figure I: Example frame processed by YOLO-Fish, missing several large fish

Considering the decent performance of YOLO-Fish and our team's past experience with YOLO networks, we chose to fine-tune YOLOv8 and YOLOv11 nano models on our labeled dataset. We also trained an EfficientDet model (available here), a network often compared with YOLO networks. We partitioned our dataset with a 60%/20%/20% training/validation/testing split and resized the video frames to 640x640. We trained the YOLO networks for 50 epochs with a batch size of 16 on an NVIDIA 4070 graphics card. As per Table I, both YOLO models achieved a mean average precision of roughly 0.8 at an IoU threshold of 0.5, dropping to roughly 0.5 when averaged over higher IoU thresholds (mAP 50-95). Our YOLO models therefore performed far better than YOLO-Fish (0.4347 mAP 50 and 0.2950 mAP 50-95 measured on our test split), while EfficientDet drastically underperformed in testing.

Model     | mAP 50 | mAP 50-95
YOLO v8n  | 0.8187 | 0.4936
YOLO v11n | 0.8279 | 0.4979
EffDet    | 0.0905 | 0.0358
YOLO-Fish | 0.4347 | 0.2950

Table I: Mean average precision values for our models.
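For reference, a minimal sketch of the fine-tuning run described above using the Ultralytics API; the dataset YAML path is an assumption about how the split is configured:

from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # pretrained nano weights; "yolov8n.pt" for v8
model.train(
    data="dataset/data.yaml",  # assumed YAML describing the 60/20/20 split
    epochs=50,
    batch=16,
    imgsz=640,   # frames resized to 640x640
    device=0,    # single NVIDIA GPU
)
metrics = model.val(split="test")  # mAP values on the held-out test split
print(metrics.box.map50, metrics.box.map)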

Figure II: Metrics during YOLOv8 training

Figure III: Metrics during YOLOv11 training

As per Table II, inference time for the YOLO models is close to 10 ms/frame, with YOLO v11 performing slightly better.

Model     | Inference time (1 frame) | Full frame inference & write time
YOLO v8n  | 11.2 ms                  | 92.6 ms
YOLO v11n | 8.5 ms                   | 90.3 ms

Table II: Inference times for YOLO models
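The per-frame inference numbers in Table II can be measured along these lines with Ultralytics' built-in speed profiling; the weights filename here is an assumption:

from ultralytics import YOLO

model = YOLO("weights/yolov11-finetuned.pt")  # assumed weights filename
times = []
for result in model.predict("video.mp4", stream=True, verbose=False):
    times.append(result.speed["inference"])  # ms for the forward pass only

print(f"mean inference: {sum(times) / len(times):.1f} ms/frame")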

Results

Figure IV: Example frame processed by our fine-tuned YOLO v11 model, performing much better on large fish

We chose to submit our YOLO v11 model to the competition. Figure V graphs this model's fish counts on the provided video at 5-second intervals, and Table III lists the same data in detail. The contents of Table III are also available as a CSV, PDF, and XLSX in readme/data. Our annotated video is available on Google Drive.

Figure V: Graph of fish counts over time

Time (seconds @ 30 FPS) | Fish count
  0 |  4
  5 |  0
 10 | 17
 15 | 26
 20 | 40
 25 | 37
 30 | 42
 35 | 62
 40 | 10
 45 | 23
 50 | 50
 55 | 58
 60 | 53
 65 | 78
 70 | 41
 75 | 71
 80 | 35
 85 | 49
 90 | 42
 95 | 27
100 | 48
105 | 38
110 | 27
115 | 14
120 | 28
125 | 58
130 | 44
135 | 31
140 | 18
145 | 21
150 |  5

Table III: Fish counts for the example video at 5-second intervals
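As a rough sketch of how counts like those in Table III can be produced (not necessarily what predict.py does internally), run the detector on every 150th frame (5 seconds at 30 FPS) and count the predicted boxes; the paths and confidence threshold are illustrative assumptions:

import csv
import cv2
from ultralytics import YOLO

model = YOLO("weights/yolov11-finetuned.pt")  # assumed weights filename
cap = cv2.VideoCapture("video.mp4")
counts, frame_idx = [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 150 == 0:  # every 5 seconds at 30 FPS
        result = model.predict(frame, conf=0.25, verbose=False)[0]
        counts.append((frame_idx // 30, len(result.boxes)))
    frame_idx += 1
cap.release()

with open("output/counts.csv", "w", newline="") as f:  # assumes output/ exists
    csv.writer(f).writerows([("time_s", "fish_count"), *counts])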

Future Work

We attempted fish tracking (not just detection) using YOLO-Fish (the DarkNet model) and SORT [7] baseline tracking. Even an algorithm as simple as SORT was sufficient to have a visible "smoothing" effect, removing many 1-2 frame false positives. With more time, it would be interesting to apply SORT to our YOLOv11 output, or to experiment with Ultralytics' tracking method.
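As a sketch of that follow-up idea, Ultralytics' tracking mode (ByteTrack shown here) could be run on our fine-tuned detector, with unique track IDs giving a smoothed cumulative count; the tracker choice and paths are assumptions:

from ultralytics import YOLO

model = YOLO("weights/yolov11-finetuned.pt")  # assumed weights filename
seen_ids = set()
for result in model.track("video.mp4", tracker="bytetrack.yaml",
                          stream=True, verbose=False):
    if result.boxes.id is not None:  # id is None when nothing is tracked
        seen_ids.update(int(i) for i in result.boxes.id)

print(f"unique tracked fish: {len(seen_ids)}")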

Applying an underwater dehazing algorithm [8] to the video input before training might also make it easier for the network to learn features. Something as simple as the Dark Channel Prior or CLAHE might be enough to improve the clarity of distant fish.
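For illustration, a minimal CLAHE pass with OpenCV that equalizes the lightness channel of each frame before it reaches the detector; the clip limit and tile size are illustrative, not tuned values:

import cv2

def clahe_enhance(frame_bgr):
    """Apply CLAHE to the L channel of a BGR frame in LAB space."""
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)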

Finally, training on a wider dataset of black sea bass videos would undoubtedly improve the usefulness of our model. We were only able to find fish classification datasets and one other (unlabeled) NOAA video to supplement the provided video.

Usage

Setup

  1. Install CUDA 12.1. Note that this version of CUDA is outdated. Our code may work on up-to-date CUDA, but it was only tested on 12.1.
  2. Install the Python dependencies (Torch and Ultralytics) by running:

     pip install --user torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
     pip install --user ultralytics

Predicting

To predict on a video, run the following command:

python predict.py -v path/to/your/video.mp4

e.g., if you move your video into our repo directory:

python predict.py -v video.mp4

Videos should be 30 FPS for accurate graphing and 1920x1080 to avoid distortion in the annotated video. The model's outputs (annotated video, CSV of fish counts, and graph of fish counts) will appear in the output/ directory.

You may also select different model weights:

python predict.py -v video.mp4 -m weights/yolov8-finetuned.pt

Training

To train a YOLOv11 model on the data in dataset/, run:

python train.py

Replace the dataset/images/ and dataset/labels/ directories with your own data to train for a different application. Edit the train.py file to use a different YOLO version.

References

  1. M. Cui, X. Liu, H. Liu, J. Zhao, D. Li, and W. Wang, "Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Survey," Reviews in Aquaculture, vol. 17, e13001, 2025, doi: 10.1111/raq.13001.

  2. P. L. F. Albuquerque, V. Garcia, A. d. S. O. Junior, et al., "Automatic Live Fingerlings Counting Using Computer Vision," Computers and Electronics in Agriculture, vol. 167, 105015, 2019.

  3. G. Xu, Q. Chen, T. Yoshida, et al., "Detection of Bluefin Tuna by Cascade Classifier and Deep Learning for Monitoring Fish Resources," in Global Oceans 2020 (Singapore–US Gulf Coast), IEEE, 2020, pp. 1–4.

  4. S. M. D. Lainez and D. B. Gonzales, "Automated Fingerlings Counting Using Convolutional Neural Network," in 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS), IEEE, 2019, pp. 67–72.

  5. N. Liu, Y. Long, C. Zou, Q. Niu, L. Pan, and H. Wu, "ADCrowdNet: An Attention-Injective Deformable Convolutional Network for Crowd Understanding," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 3225–3234.

  6. A. A. Muksit, F. Hasan, Md. F. Hasan Bhuiyan Emon, M. R. Haque, A. R. Anwary, and S. Shatabda, "YOLO-Fish: A robust fish detection model to detect fish in realistic underwater environment," Ecological Informatics, vol. 72, 101847, 2022, doi: 10.1016/j.ecoinf.2022.101847.

  7. A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, "Simple online and realtime tracking," in 2016 IEEE International Conference on Image Processing (ICIP), 2016, doi: 10.1109/ICIP.2016.7533003.

  8. X. Shuang, J. Zhang, and Y. Tian, "Algorithms for improving the quality of underwater optical images: A comprehensive review," Signal Processing, vol. 219, 109408, 2024, doi: 10.1016/j.sigpro.2024.109408.
