OpenVINO GenAI model loading and inference using GPU: automatic model "release" after idle, and garbled responses during subsequent inference #33896

@nnbw-liu

Description

ov-genai version: 25.1-25.4.2
Model: phi-4-mini
PC: new PTL (Panther Lake) platform device

Reproduction Steps (a minimal sketch of this flow follows the list):
1. Launch the application and perform initial inference.
2. Open Task Manager → Performance → GPU, and monitor GPU memory usage.
3. Remain idle for 2 to 20 minutes while watching GPU memory, then perform inference again after GPU memory usage drops significantly.
4. The issue occurs probabilistically under these conditions: it appears occasionally in the chat sample, but reproduces consistently when using WinUI3.

The issue reproduces only on PTL devices, not on LNL devices.
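For reference, here is a minimal Python sketch of the reproduction flow, assuming the stock `openvino_genai` LLMPipeline API as used in the chat sample; the model path, prompts, and idle duration are placeholders:

```python
import time
import openvino_genai as ov_genai

# Load phi-4-mini on the integrated GPU (model path is a placeholder).
pipe = ov_genai.LLMPipeline("phi-4-mini-int4-ov", "GPU")

# Step 1: initial inference completes normally.
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))

# Step 3: stay idle while watching GPU memory in Task Manager.
# On PTL, GPU memory usage drops significantly during this window.
time.sleep(10 * 60)  # anywhere from 2 to 20 minutes

# Step 4: after the automatic release, this call intermittently
# returns a garbled response on PTL devices.
print(pipe.generate("Summarize the previous answer.", max_new_tokens=100))
```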

Observation:
LNL devices do not appear to release GPU memory automatically after idle, whereas PTL devices do; the garbled responses coincide with this automatic release behavior.
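To confirm which platform a given machine exposes, the GPU's full device name can be queried through the standard OpenVINO Core API; this snippet is only a diagnostic sketch, not part of the reproduction:

```python
import openvino as ov

core = ov.Core()
# Prints the integrated GPU's full device name, which distinguishes
# PTL (Panther Lake) from LNL (Lunar Lake) hardware.
print(core.get_property("GPU", "FULL_DEVICE_NAME"))
```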
