
[WebGPU] Unexpected Output with Phi-3 Mini 4K Instruct Model from ORT GenAI #25180

Closed
Description

Honry (Contributor)

Describe the issue

WebNN developer preview provides a text-generation demo with several LLM models (Phi-3 Mini 4K Instruct, DeepSeek R1 Distill Qwen, TinyLlama, Qwen2), which were generated with ONNX Runtime GenAI.

These models share a similar architecture (GQA, MatMulNBits, RotaryEmbedding, ...). When tested with the WebGPU EP, only Phi-3 Mini 4K Instruct produced unexpected results; the other models worked fine.
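For context, a minimal sketch of how such a demo might create its inference session with onnxruntime-web on the WebGPU EP (the import path and options follow the public onnxruntime-web API; the model URL is a placeholder, not the demo's actual path):

```typescript
// Hypothetical session setup for a WebGPU text-generation demo.
import * as ort from 'onnxruntime-web/webgpu';

async function createSession(): Promise<ort.InferenceSession> {
  // 'webgpu' selects the WebGPU execution provider in the browser.
  return ort.InferenceSession.create('./phi3-mini-4k-instruct/model.onnx', {
    executionProviders: ['webgpu'],
  });
}
```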

To reproduce

Test Phi-3 Mini 4K Instruct:

Test others:

Urgency

No response

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.23.0-dev.20250612-70f14d7670

Execution Provider

'webgpu' (WebGPU)

Activity

Labels added on Jun 26, 2025:
platform:web (issues related to ONNX Runtime web; typically submitted using template)
ep:DML (issues related to the DirectML execution provider)
ep:WebGPU (ort-web webgpu provider)
ep:WebNN (WebNN execution provider)
fdwr (Contributor) commented on Jun 26, 2025

@Honry This one is tagged with ep:DML too 🤔. Should I remove that?

Honry (Contributor, Author) commented on Jun 26, 2025

> @Honry This one is tagged with ep:DML too 🤔. Should I remove that?

Yes, please!

Label removed on Jun 26, 2025:
ep:DML (issues related to the DirectML execution provider)
fs-eire (Contributor) commented on Jul 2, 2025

If only Phi-3 Mini 4K Instruct fails and the other models are good, it's probably that an incorrect tokenizer or chat template is being used.
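For reference, the chat format Phi-3 Mini 4K Instruct expects (per its model card) looks roughly like the sketch below; applying another model's template, or dropping the <|end|> markers, could produce garbled output even when the graph itself is correct:

```typescript
// Hedged sketch of the Phi-3 instruct chat template; the exact markers
// should be checked against the model card rather than taken from here.
function buildPhi3Prompt(userMessage: string): string {
  return `<|user|>\n${userMessage}<|end|>\n<|assistant|>\n`;
}

console.log(buildPhi3Prompt('Tell me a joke.'));
```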

Honry (Contributor, Author) commented on Jul 2, 2025

> If only Phi-3 Mini 4K Instruct fails and the other models are good, it's probably that an incorrect tokenizer or chat template is being used.

🤔 But with the same tokenizer, WebNN passes when backed by DML.

fs-eire (Contributor) commented on Jul 2, 2025

If I remember correctly, the Phi3-Mini-4k-Instruct models exported for the DML and WebGPU EPs are slightly different. @guschmue, do you know the details?

guschmue (Contributor) commented on Jul 3, 2025

I can reproduce this with JSEP, but it is fine with the new WebGPU EP. I think this is because of the bs=128 (the int4 block size).

I don't think we want to do anything here, because we are switching to the new WebGPU EP in days anyway.

In general, this model is optimized for DML and suboptimal for WebGPU. For WebGPU we prefer models created by model builder with '-e webgpu', i.e. bs=32, accuracy_level=4, and no RoE (rotary embedding) in GQA, which enables the fast FA2 path in WebGPU. If using a WebGPU-specific model is not feasible, the CUDA flavor from model builder is the next best choice.

Near term we are adding RoE support to GQA in the fast FA2 path, so the CUDA model will perform optimally on WebGPU.
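To make the two flavors concrete, here is a sketch of the differences described above, written as a TypeScript config object (the field names are illustrative, not a real builder schema; the WebGPU flavor corresponds to ONNX Runtime GenAI's model builder invoked with '-e webgpu'):

```typescript
// Hypothetical summary of the export flavors discussed in this thread.
const modelFlavors = {
  dml: {
    blockSize: 128,              // bs=128, tuned for DirectML
    accuracyLevel: 0,
    rotaryEmbeddingInGQA: true,  // RoE fused into GQA blocks the fast FA2 path
  },
  webgpu: {
    blockSize: 32,               // bs=32
    accuracyLevel: 4,            // enables the dp4 prefill path (see below)
    rotaryEmbeddingInGQA: false, // standalone RoE enables the fast FA2 path
  },
} as const;
```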

Honry (Contributor, Author) commented on Jul 4, 2025

@guschmue, thank you for your thorough explanation! It works with the new WebGPU EP. Let me close this issue.

Honry (Contributor, Author) commented on Jul 8, 2025

@guschmue, I did see a performance improvement when adjusting the Phi-3 configuration from bs=128, accuracy_level=0 to bs=32, accuracy_level=4. However, for the other models, changing the configuration from bs=32, accuracy_level=0 to bs=32, accuracy_level=4 did not result in a significant improvement. Is that expected? Does it imply that a smaller block size (bs) yields better performance on WebGPU?

qjia7 (Contributor) commented on Jul 9, 2025

accuracy_level=4 goes down the dp4 path for prefill; otherwise it takes the normal path. If you have a long prompt, for example > 1k tokens, the advantage will be very obvious, but if the inputs are short, the perf difference between accuracy_level=4 and accuracy_level=0 will be small. Generation uses the same path either way. Block size is related to the algorithm we use to access scales_b: our current algorithm is friendly to bs=32. If you choose another bs, the algorithm may need adjusting to get optimal performance on WebGPU.
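A minimal sketch of why block size shows up in the scales_b access pattern, assuming a MatMulNBits-style layout (two 4-bit weights packed per byte, one scale per block of blockSize weights, default zero point 8); this is illustrative TypeScript, not the actual WebGPU shader:

```typescript
// Dequantize one row of block-quantized int4 weights.
function dequantizeRow(
  packed: Uint8Array,    // ceil(cols / 2) bytes, low nibble first (assumed)
  scales: Float32Array,  // ceil(cols / blockSize) scales for this row
  cols: number,
  blockSize: number,     // 32 or 128 in this thread
): Float32Array {
  const out = new Float32Array(cols);
  for (let col = 0; col < cols; col++) {
    const byte = packed[col >> 1];
    const q = (col & 1) === 0 ? byte & 0x0f : byte >> 4;
    // The scale lookup: with blockSize=32, a tile covering 32 columns
    // touches exactly one scale; with blockSize=128 the access pattern no
    // longer lines up with a 32-wide tiling, so the kernel loses efficiency.
    const scale = scales[Math.floor(col / blockSize)];
    out[col] = (q - 8) * scale; // 8 is the assumed default zero point
  }
  return out;
}
```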

Honry (Contributor, Author) commented on Jul 9, 2025

> accuracy_level=4 goes down the dp4 path for prefill; otherwise it takes the normal path. [...]

👍 @qjia7 That makes sense, thanks very much for your answer!

guschmue (Contributor) commented on Jul 9, 2025

And we changed the default in model builder to use accuracy level 4; recently converted models on Hugging Face should come with accuracy level 4.

