Add MNIST-FC inference layer examples on NPU2 by erwei-xilinx · Pull Request #1451 · Xilinx/mlir-air

erwei-xilinx · 2026-03-21T05:03:35Z

Summary

Add programming_examples/mnist_fc/ with standalone examples for GGML MNIST-FC inference pipeline layers: broadcast bias add, ReLU (f32 via bf16 cmp/sel), and argmax (scalar reduction)
Add integration test that chains matmul(500x500x784) + fused bias+relu + bias_add + argmax in a single 4-launch multi-launch module on NPU2 (Strix, AIE2P)
Update operator dashboard with new "ML Pipeline" category entries

Test plan

All 4 sub-examples pass on NPU2 hardware (make run in each directory)
Integration test verifies end-to-end argmax output against numpy golden reference (exact i32 match)
Operator dashboard regenerated with NPU2 🟢 status for all MNIST-FC entries
Lit tests configured for ryzen_ai_npu2, peano

🤖 Generated with Claude Code

Add programming_examples/mnist_fc/ with standalone examples for GGML MNIST-FC inference pipeline layers (broadcast bias add, ReLU, argmax) and a multi-launch integration test that chains matmul + bias_add + relu + argmax on NPU2 (Strix, AIE2P). Co-Authored-By: Claude Opus 4.6 <[email protected]>

Copilot

Pull request overview

Adds a new programming_examples/mnist_fc/ set of NPU2-focused examples that demonstrate key MNIST-FC inference pipeline layers (broadcast bias add, ReLU, argmax) plus an integration module that composes multiple launches, and updates the operator dashboard to surface them under a new “ML Pipeline” category.

Changes:

Add standalone MNIST-FC layer examples (broadcast bias add, 2D ReLU, argmax) with Makefile + LIT runners for NPU2/Peano.
Add an MNIST-FC integration example that extends the existing test54 matmul module with additional element-wise launches.
Update the programming examples operator dashboard entries and regenerate programming_examples/README.md.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
programming_examples/mnist_fc/relu/run_npu2_peano.lit	Adds NPU2/Peano LIT runner for the ReLU example.
programming_examples/mnist_fc/relu/run.py	Implements 2D ReLU layer example using bf16 compare/select path on NPU2.
programming_examples/mnist_fc/relu/Makefile	Adds build/run targets for the ReLU example.
programming_examples/mnist_fc/integration/run_npu2_peano.lit	Adds NPU2/Peano LIT runner for the integration example.
programming_examples/mnist_fc/integration/run.py	Builds a multi-launch integration module (matmul + element-wise stages) and validates argmax vs golden reference.
programming_examples/mnist_fc/integration/Makefile	Adds build/run targets for the integration example.
programming_examples/mnist_fc/broadcast_bias_add/run_npu2_peano.lit	Adds NPU2/Peano LIT runner for broadcast bias add.
programming_examples/mnist_fc/broadcast_bias_add/run.py	Implements broadcast bias add layer example aligned to GGML-style layout semantics.
programming_examples/mnist_fc/broadcast_bias_add/Makefile	Adds build/run targets for broadcast bias add.
programming_examples/mnist_fc/argmax/run_npu2_peano.lit	Adds NPU2/Peano LIT runner for argmax.
programming_examples/mnist_fc/argmax/run.py	Implements row-wise argmax example (scalar reduction) with sampling-based verification.
programming_examples/mnist_fc/argmax/Makefile	Adds build/run targets for argmax.
programming_examples/generate_readme.py	Registers new MNIST-FC examples in the operator dashboard generation list.
programming_examples/README.md	Regenerates the operator dashboard table to include new “ML Pipeline” entries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-21T05:09:24Z

programming_examples/mnist_fc/relu/run.py

+                    l1_tile_in = AllocOp(l1TileTy_f32, [], [])
+                    l1_tile_out = AllocOp(l1TileTy_f32, [], [])
+                    l1_tile_bf16 = AllocOp(l1TileTy_bf16, [], [])
+                    l1_tile_relu_bf16 = AllocOp(l1TileTy_bf16, [], [])


l1_tile_relu_bf16/l1_tile_relu_bf16 is allocated but never used (and sub_relu_bf16 is created but unused). This is dead code and wastes L1 memory; either remove these allocations/subviews or actually write the ReLU bf16 result into them if you intended to materialize the intermediate.

Suggested change

l1_tile_relu_bf16 = AllocOp(l1TileTy_bf16, [], [])

Copilot · 2026-03-21T05:09:24Z

programming_examples/mnist_fc/integration/run.py

+# MNIST-FC inference integration test.
+# Chains: matmul1(500x500x784) -> bias_add+relu -> bias_add2 -> argmax
+# in a single multi-launch module on NPU2.
+#
+# Data layout: test 54's matmul outputs (M, N) with N contiguous.
+# Bias is per-row (axis=0). Argmax reduces along axis=0.
+#
+# Pipeline (4 launches):
+#   Launch 1: matmul1      W1[K,M1] x X[K,N1] -> C1[M1,N1]     (784,500 x 784,500 -> 500,500)
+#   Launch 2: bias_add2    C3[i,j] = matmul2_out[i,j] + bias2[i] (10x500)
+#   Launch 3: argmax       out[j]  = argmax_i(C3[i,j])            (10x500 -> 500)
+#   Launch 4: bias+relu    C2[i,j] = max(C1[i,j]+bias1[i], 0)    (500x500)
+#


The file-level pipeline description says the integration test chains matmul1 -> bias_add+relu -> bias_add2 -> argmax, but the module currently does not compute matmul2 and bias_add2/argmax operate on a host-provided matmul2_out buffer. This makes the “end-to-end MNIST-FC” claim misleading; either add matmul2 into the module (and feed bias_add2 from the relu output) or update the description/PR summary to clearly state what is (and isn’t) validated.

Copilot · 2026-03-21T05:09:24Z

programming_examples/mnist_fc/integration/run.py

+    truncf_op,
+    block_matmul,


truncf_op and block_matmul are imported from test54 but never referenced in this file. If they aren’t required for side effects, remove the unused imports to avoid confusion about dependencies.

Suggested change

truncf_op,

block_matmul,

Copilot · 2026-03-21T05:09:24Z

programming_examples/mnist_fc/argmax/run.py

+from ml_dtypes import bfloat16
+


bfloat16 is imported but never used in this example. Please remove the unused import to keep the script minimal.

Suggested change

from ml_dtypes import bfloat16

Copilot · 2026-03-21T05:09:24Z

programming_examples/mnist_fc/integration/run.py

+    # ── Phase 2: serialize vectorized matmul, extend with element-wise ──
+    # Get the matmul function's IR, then rebuild a module that includes
+    # both the matmul launch and the element-wise launches.
+    matmul_ir = str(matmul_module)
+
+    # Parse the vectorized matmul module, then add element-wise launches
+    # by modifying the function to accept additional arguments.
+    # Strategy: re-parse the matmul IR, add extra func arguments and
+    # append element-wise launches into the function body.
+
+    # We need to extend the function signature. The matmul func has:
+    #   func @matmul_f32(%A, %B, %C) -> ()
+    # We need to change it to:
+    #   func @mnist_fc(%A, %B, %C, %bias1, %relu_out, %mat2_out, %bias2, %bias2_out, %argmax_out) -> ()
+    # And add the element-wise launches after the matmul launch.
+
+    # Simplest approach: rebuild from scratch with the matmul IR as a string
+    # embedded in the new module. But this is fragile.
+
+    # Better approach: use the matmul module directly, modify its function
+    # to add more arguments and more launches.
+


matmul_ir = str(matmul_module) is assigned but never used (the subsequent comments also describe approaches that aren’t implemented). Please remove the unused variable/commented-out approach or complete the implementation so the code reflects the chosen strategy.

Suggested change

# ── Phase 2: serialize vectorized matmul, extend with element-wise ──

# Get the matmul function's IR, then rebuild a module that includes

# both the matmul launch and the element-wise launches.

matmul_ir = str(matmul_module)

# Parse the vectorized matmul module, then add element-wise launches

# by modifying the function to accept additional arguments.

# Strategy: re-parse the matmul IR, add extra func arguments and

# append element-wise launches into the function body.

# We need to extend the function signature. The matmul func has:

# func @matmul_f32(%A, %B, %C) -> ()

# We need to change it to:

# func @mnist_fc(%A, %B, %C, %bias1, %relu_out, %mat2_out, %bias2, %bias2_out, %argmax_out) -> ()

# And add the element-wise launches after the matmul launch.

# Simplest approach: rebuild from scratch with the matmul IR as a string

# embedded in the new module. But this is fragile.

# Better approach: use the matmul module directly, modify its function

# to add more arguments and more launches.

# ── Phase 2: extend vectorized matmul with element-wise launches ──

# We need to extend the function signature. The matmul func has:

# func @matmul_f32(%A, %B, %C) -> ()

# We need to change it to:

# func @mnist_fc(%A, %B, %C, %bias1, %relu_out, %mat2_out, %bias2, %bias2_out, %argmax_out) -> ()

# And add the element-wise launches after the matmul launch.

# Use the matmul module directly, modifying its function to add more

# arguments and additional launches instead of reparsing IR as a string.

- Remove unused l1_tile_relu_bf16 allocation and sub_relu_bf16 subview in relu - Remove unused truncf_op, block_matmul imports in integration - Remove unused bfloat16 import in argmax - Remove unused matmul_ir variable and stale comments in integration - Clarify integration description: matmul2 output is host-provided Co-Authored-By: Claude Opus 4.6 <[email protected]>

erwei-xilinx requested a review from jgmelber as a code owner March 21, 2026 05:03

Copilot AI review requested due to automatic review settings March 21, 2026 05:03

Copilot started reviewing on behalf of erwei-xilinx March 21, 2026 05:05 View session

Copilot AI reviewed Mar 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add MNIST-FC inference layer examples on NPU2#1451

Add MNIST-FC inference layer examples on NPU2#1451
erwei-xilinx wants to merge 2 commits intoXilinx:mainfrom
erwei-xilinx:erwei/mnist-fc-examples

erwei-xilinx commented Mar 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 21, 2026

Uh oh!

Copilot AI Mar 21, 2026

Uh oh!

Copilot AI Mar 21, 2026

Uh oh!

Copilot AI Mar 21, 2026

Uh oh!

Copilot AI Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

erwei-xilinx commented Mar 21, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants