OpenVINO GenAI model loading and inference using GPU: automatic model "release" after idle, and garbled responses during subsequent inference #33896

@nnbw-liu

Description

ov-genai version: 25.1-25.4.2
Model: phi-4-mini
PC: new PTL (Panther Lake) platform device

Reproduction Steps (a minimal sketch of this flow follows the list):
1. Launch the application and perform initial inference.
2. Open Task Manager → Performance → GPU, and monitor GPU memory usage.
3. Remain idle for 2 to 20 minutes while watching GPU memory, then perform inference again after GPU memory usage drops significantly.
4. The issue occurs probabilistically under these conditions: it appears occasionally in the chat sample, but reproduces consistently when using WinUI3.

The issue reproduces only on PTL devices, not on LNL devices.
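For reference, here is a minimal Python sketch of the reproduction flow, assuming the stock `openvino_genai` LLMPipeline API as used in the chat sample; the model path, prompts, and idle duration are placeholders:

```python
import time
import openvino_genai as ov_genai

# Load phi-4-mini on the integrated GPU (model path is a placeholder).
pipe = ov_genai.LLMPipeline("phi-4-mini-int4-ov", "GPU")

# Step 1: initial inference completes normally.
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))

# Step 3: stay idle while watching GPU memory in Task Manager.
# On PTL, GPU memory usage drops significantly during this window.
time.sleep(10 * 60)  # anywhere from 2 to 20 minutes

# Step 4: after the automatic release, this call intermittently
# returns a garbled response on PTL devices.
print(pipe.generate("Summarize the previous answer.", max_new_tokens=100))
```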

Observation:
LNL devices do not appear to release GPU memory automatically after idle, whereas PTL devices do; the garbled responses coincide with this automatic release behavior.
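To confirm which platform a given machine exposes, the GPU's full device name can be queried through the standard OpenVINO Core API; this snippet is only a diagnostic sketch, not part of the reproduction:

```python
import openvino as ov

core = ov.Core()
# Prints the integrated GPU's full device name, which distinguishes
# PTL (Panther Lake) from LNL (Lunar Lake) hardware.
print(core.get_property("GPU", "FULL_DEVICE_NAME"))
```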
