Best way to showcase GPU_HW_MATMUL? #13806
Replies: 9 comments 4 replies
-
Also, somehow the iGPU is more performant? Is that normal? The work is based on this notebook, FYI.
-
Also, I checked: I got resizable BAR working with the Intel Arc A380 (or maybe it's not actually enabled?). I'm using an eGPU, not sure if that also matters.
-
HW_MATMUL supports int8 and fp16.
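As a rough, OpenVINO-agnostic illustration of what those two precisions cost in accuracy (plain numpy on hypothetical weights, not anything the plugin actually does internally): fp16 is close to a drop-in cast, while int8 needs a calibration scale and loses noticeably more resolution, which is why int8 models need quantization/calibration first.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # stand-in weight tensor

# fp16: direct cast, ~10 mantissa bits of precision
err_fp16 = np.max(np.abs(w - w.astype(np.float16).astype(np.float32)))

# int8: symmetric per-tensor quantization; the scale is what calibration picks
scale = np.max(np.abs(w)) / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
err_int8 = np.max(np.abs(w - q.astype(np.float32) * scale))

print(err_fp16, err_int8)  # int8 error is an order of magnitude larger here
```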
-
Tried the yolov5... :\ Is this normal?
-
Oohh, I get 49 FPS now if I use this use_device_mem flag... OK! I think it makes sense now.
-
And it's lots better if I use batch size 16 and so on... :D Oh dear, I think it's working!
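The batching gain has a simple back-of-envelope explanation: each inference carries a roughly fixed per-batch cost (kernel launches, host/device transfers) plus a per-image compute cost, so FPS = batch_size / latency rises as the fixed cost is amortized. A toy model with made-up overhead numbers, just for intuition:

```python
# Toy latency model with hypothetical costs (not measured on any device)
OVERHEAD_MS = 15.0    # fixed cost paid once per batch (launch, transfers)
PER_IMAGE_MS = 5.0    # compute cost per image

def fps(batch_size):
    latency_ms = OVERHEAD_MS + PER_IMAGE_MS * batch_size
    return batch_size / latency_ms * 1000.0

for b in (1, 4, 16):
    print(b, round(fps(b), 1))
# throughput climbs with batch size as the fixed overhead is amortized
```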
-
vs my gen 12 CPU :)
```
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: C:\Users\raymo\Documents\openvino\bin\intel64\Release\benchmark_app.exe -m .\yolo\yolov5m.xml -d CPU -t 30
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO: OpenVINO Runtime version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8523-87f61cf8227
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] openvino_intel_cpu_plugin version ......... 2022.3.0
[ INFO ] Build ........... 2022.3.0-8523-87f61cf8227
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(CPU) performance hint will be set to THROUGHPUT.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 45.42 ms
[ INFO ] Original network I/O parameters:
Network inputs:
images (node: images) : f32 / [...] / {1,3,640,640}
Network outputs:
output (node: output) : f32 / [...] / {1,25200,85}
462 (node: 462) : f32 / [...] / {1,3,80,80,85}
520 (node: 520) : f32 / [...] / {1,3,40,40,85}
578 (node: 578) : f32 / [...] / {1,3,20,20,85}
[Step 5/11] Resizing network to match image sizes and given batch
[ WARNING ] images: layout is not set explicitly, so it is defaulted to NCHW. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ INFO ] Network batch size: 1
Network inputs:
images (node: images) : u8 / [N,C,H,W] / {1,3,640,640}
Network outputs:
output (node: output) : f32 / [...] / {1,25200,85}
462 (node: 462) : f32 / [...] / {1,3,80,80,85}
520 (node: 520) : f32 / [...] / {1,3,40,40,85}
578 (node: 578) : f32 / [...] / {1,3,20,20,85}
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 467.94 ms
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: CPU
[ INFO ] { NETWORK_NAME , torch-jit-export }
[ INFO ] { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 5 }
[ INFO ] { NUM_STREAMS , 5 }
[ INFO ] { AFFINITY , HYBRID_AWARE }
[ INFO ] { INFERENCE_NUM_THREADS , 0 }
[ INFO ] { PERF_COUNT , NO }
[ INFO ] { INFERENCE_PRECISION_HINT , f32 }
[ INFO ] { PERFORMANCE_HINT , THROUGHPUT }
[ INFO ] { PERFORMANCE_HINT_NUM_REQUESTS , 0 }
[Step 9/11] Creating infer requests and preparing input blobs with data
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] images ([N,C,H,W], u8, {1, 3, 640, 640}, static): random (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 5 inference requests, limits: 30000 ms duration)
[ INFO ] BENCHMARK IS IN INFERENCE ONLY MODE.
[ INFO ] Input blobs will be filled once before performance measurements.
[ INFO ] First inference took 208.49 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 390 iterations
[ INFO ] Duration: 30528.04 ms
[ INFO ] Latency:
[ INFO ] Median: 404.54 ms
[ INFO ] Average: 390.30 ms
[ INFO ] Min: 260.55 ms
[ INFO ] Max: 518.52 ms
[ INFO ] Throughput: 12.78 FPS
```
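The reported numbers are internally consistent, which is a quick sanity check on any benchmark_app run: throughput is iterations over wall time, and with 5 asynchronous requests in flight it should also roughly equal the number of in-flight requests divided by average latency.

```python
# Numbers taken from the benchmark_app report above
count, duration_ms = 390, 30528.04
throughput = count / (duration_ms / 1000.0)
print(round(throughput, 2))  # 12.78, matching the reported FPS

# Cross-check: 5 in-flight requests / 390.30 ms average latency
approx = 5 / (390.30 / 1000.0)
print(round(approx, 2))  # 12.81, consistent with the measured throughput
```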
-
openvino/src/inference/include/openvino/runtime/intel_gpu/properties.hpp (lines 136 to 140 in 1ad4a99) Does
-
What's the best way to see if GPU_HW_MATMUL is being utilized? Somehow I'm getting lower performance with INT8 on the GPU. Is that normal? Is HW_MATMUL using FP16 or INT8?
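One direct way to see whether the device advertises hardware matmul (XMX) at all, as opposed to whether a given model actually uses it, is to query the device's optimization capabilities: on supporting GPUs the returned list includes GPU_HW_MATMUL. A sketch: the real get_property call needs OpenVINO and a visible GPU, so it is shown in a comment, and the example capability list below is assumed for illustration, not measured.

```python
# With OpenVINO installed and a GPU present, the real list would come from:
#   from openvino.runtime import Core
#   caps = Core().get_property("GPU", "OPTIMIZATION_CAPABILITIES")
# Hypothetical value for a device with XMX units:
caps = ["FP32", "BIN", "FP16", "INT8", "GPU_HW_MATMUL", "EXPORT_IMPORT"]

def has_hw_matmul(capabilities):
    """True if the device advertises hardware matmul (XMX) support."""
    return "GPU_HW_MATMUL" in capabilities

print(has_hw_matmul(caps))  # True for the assumed list above
```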