[Detector Support]: Onnx_0 is very slow (176049784148.85 ms) #20511
-
Based on your setup with an Nvidia Tesla P40 GPU and the extremely high inference times you're experiencing with ONNX, there are several factors that could be contributing to this issue. For Nvidia GPU setups, Frigate supports ONNX detection, and TensorRT will automatically be detected and used as an execution provider. The Tesla P40 requires specific CUDA compatibility, and TensorRT has a minimum driver version requirement.
Looking at typical inference speeds for Nvidia GPUs, even older cards should perform much better than what you're experiencing. For reference, a GTX 1070 achieves around 16 ms for YOLOv9 s-320 models (2), so your Tesla P40 should be performing significantly better than the 176+ second inference times you're seeing.
A few things to check, starting with your detector configuration:
detectors:
  onnx_0:
    type: onnx
  onnx_1:
    type: onnx
Given your unique virtualized GPU setup with older drivers, the issue may indeed be related to CUDA library compatibility or GPU passthrough limitations rather than Frigate itself. The fact that this started after switching from TensorRT to ONNX suggests the ONNX runtime isn't properly utilizing your GPU acceleration.
Have another question? Just tag @inkeep.
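One concrete thing to verify is that the GPU is actually exposed to the Frigate container; if it isn't, ONNX Runtime silently falls back to CPU, which would explain inference times in that range. A minimal Compose sketch for the GPU reservation (the image tag and device count are assumptions, adjust them to your install):

services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable-tensorrt   # assumed Nvidia-enabled image variant
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia        # requires the NVIDIA Container Toolkit on the host/VM
              count: 1
              capabilities: [gpu]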
-
Yes, this is a known issue due to the way ONNX execution works in 0.16: CPU activity can greatly increase the inference time. This has been significantly improved in 0.17, where overall performance is much better and CPU load does not affect it in this way.
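Until an upgrade to 0.17 is possible, one way to reduce that CPU interaction may be to keep other workloads from starving the Frigate container. A rough Compose sketch, assuming the VM has cores to spare (the core range and share value are placeholders, not recommendations):

services:
  frigate:
    cpuset: "0-3"       # pin Frigate to dedicated cores (placeholder range)
    cpu_shares: 2048    # higher scheduling weight when the host is under contention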
-
Describe the problem you are having
Since switching from TensorRT to the ONNX runtime as part of the 0.15 to 0.16 upgrade, my Frigate instances have been randomly experiencing periods where inferencing is reported as much slower than the usual 60-100 ms, with values in very high ranges such as the 176049784148 ms in the title. The environment is quite unique, and the issue may or may not actually be related to Frigate, but I have to start somewhere.
The 3 Frigate instances, named test, a and s, are all running in Docker on an Ubuntu 24.04.3 (kernel 6.11.0-1012-azure) VM, which runs on a Hyper-V host. The GPU is passed through using paravirtualization (the same method WSL uses). Due to a combination of old hardware (Nvidia Tesla P40) and driver limitations (WDDM mode not working on newer drivers), I'm stuck with an older Nvidia driver (539.19 GRID) with CUDA 12.2 only on this Server 2025 host.
Considering there's no going back to TensorRT, I'm afraid 'automating the restart' might be the only solution.
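One rough sketch for that automation, assuming Frigate's unauthenticated internal API is reachable on port 5000 inside the container and using a hypothetical 1000 ms threshold: a Compose healthcheck that marks the container unhealthy whenever any detector's reported inference_speed from /api/stats blows up, paired with something like willfarrell/autoheal, since Docker doesn't restart unhealthy containers by itself.

services:
  frigate:
    healthcheck:
      test:
        - CMD-SHELL
        - >-
          python3 -c "import json,sys,urllib.request;
          s=json.load(urllib.request.urlopen('http://127.0.0.1:5000/api/stats'));
          bad=[d for d in s.get('detectors', {}).values() if d.get('inference_speed', 0) > 1000];
          sys.exit(1 if bad else 0)"
      interval: 60s
      timeout: 15s
      retries: 3
      start_period: 300s
    labels:
      - autoheal=true   # picked up by an autoheal container that restarts unhealthy services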
Below is a graph showing the high inferencing time occurrences across my three instances in the past two months.
Any advice on where to even start investigating/debugging would be highly appreciated.
Version
0.16.1-e664cb2
Frigate config file
docker-compose file or Docker CLI command
Relevant Frigate log output
Install method
Docker Compose
Object Detector
Other
Screenshots of the Frigate UI's System metrics pages
Unfortunately I restarted the faulty instance, but the issue will reappear soon, so I will update this.
Metrics did not show anything concerning except the high inferencing time.
Any other information that may be helpful
There are no specific errors in the log, just detection drop messages. Cameras continue to work, but many of them randomly drop out saying "No frames have been received, check error logs" momentarily and then reappear.