[Detector Support]: Onnx_0 is very slow (176049784148.85 ms) #20511
-
Based on your setup with an Nvidia Tesla P40 GPU and the extremely high inference times you're experiencing with ONNX, there are several factors that could be contributing to this issue. For Nvidia GPU setups, Frigate supports ONNX detection, and TensorRT will automatically be detected and used as an execution provider. The Tesla P40 requires specific CUDA compatibility, and TensorRT has a minimum driver version requirement.
Looking at typical inference speeds for Nvidia GPUs, even older cards should perform much better than what you're experiencing. For reference, a GTX 1070 achieves around 16 ms for YOLOv9 s-320 models (2), so your Tesla P40 should be performing significantly better than the 176+ second inference times you're seeing.
A few things to check, starting with your detector configuration:
detectors:
  onnx_0:
    type: onnx
  onnx_1:
    type: onnx
Given your unique virtualized GPU setup with older drivers, the issue may indeed be related to CUDA library compatibility or GPU passthrough limitations rather than Frigate itself. The fact that this started after switching from TensorRT to ONNX suggests the ONNX runtime isn't properly utilizing your GPU acceleration.
Have another question? Just tag @inkeep.
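One concrete thing to verify is that the GPU is actually exposed to the Frigate container; if it isn't, ONNX Runtime silently falls back to CPU, which would explain inference times in that range. A minimal Compose sketch for the GPU reservation (the image tag and device count are assumptions, adjust them to your install):

services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable-tensorrt   # assumed Nvidia-enabled image variant
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia        # requires the NVIDIA Container Toolkit on the host/VM
              count: 1
              capabilities: [gpu]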
-
Yes, this is a known issue due to the way ONNX execution works in 0.16: CPU activity can greatly increase the inference time. This has been significantly improved in 0.17, where overall performance is much better and CPU load does not affect it in this way.
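Until an upgrade to 0.17 is possible, one way to reduce that CPU interaction may be to keep other workloads from starving the Frigate container. A rough Compose sketch, assuming the VM has cores to spare (the core range and share value are placeholders, not recommendations):

services:
  frigate:
    cpuset: "0-3"       # pin Frigate to dedicated cores (placeholder range)
    cpu_shares: 2048    # higher scheduling weight when the host is under contention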
-
Describe the problem you are having
Since switching from TensorRT to the ONNX runtime as part of the 0.15 to 0.16 upgrade, my Frigate instances have been randomly experiencing periods where inferencing is reported as much slower than the usual 60-100 ms, with values in very high ranges such as the 176049784148 ms in the title. The environment is quite unique, and the issue may or may not actually be related to Frigate, but I have to start somewhere.
The 3 Frigate instances, named test, a and s, are all running in Docker on an Ubuntu 24.04.3 (kernel 6.11.0-1012-azure) VM, which runs on a Hyper-V host. The GPU is passed through using paravirtualization (the same method WSL uses). Due to a combination of old hardware (Nvidia Tesla P40) and driver limitations (WDDM mode not working on newer drivers), I'm stuck with an older Nvidia driver (539.19 GRID) with CUDA 12.2 only on this Server 2025 host.
Considering there's no going back to TensorRT, I'm afraid 'automating the restart' might be the only solution.
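One rough sketch for that automation, assuming Frigate's unauthenticated internal API is reachable on port 5000 inside the container and using a hypothetical 1000 ms threshold: a Compose healthcheck that marks the container unhealthy whenever any detector's reported inference_speed from /api/stats blows up, paired with something like willfarrell/autoheal, since Docker doesn't restart unhealthy containers by itself.

services:
  frigate:
    healthcheck:
      test:
        - CMD-SHELL
        - >-
          python3 -c "import json,sys,urllib.request;
          s=json.load(urllib.request.urlopen('http://127.0.0.1:5000/api/stats'));
          bad=[d for d in s.get('detectors', {}).values() if d.get('inference_speed', 0) > 1000];
          sys.exit(1 if bad else 0)"
      interval: 60s
      timeout: 15s
      retries: 3
      start_period: 300s
    labels:
      - autoheal=true   # picked up by an autoheal container that restarts unhealthy services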
Below is a graph showing the high inferencing time occurrences across my three instances in the past two months.
Any advice on where to even start investigating/debugging would be highly appreciated.
Version
0.16.1-e664cb2
Frigate config file
docker-compose file or Docker CLI command
Relevant Frigate log output
Install method
Docker Compose
Object Detector
Other
Screenshots of the Frigate UI's System metrics pages
Unfortunately I restarted the faulty instance, but the issue will reappear soon, so I will update this.
Metrics did not show anything concerning except the high inferencing time.
Any other information that may be helpful
There are no specific errors in the log, just detection drop messages. Cameras continue to work, but many of them randomly drop out saying "No frames have been received, check error logs" momentarily and then reappear.