Skip to content

Add MNIST-FC inference layer examples on NPU2#1451

Open
erwei-xilinx wants to merge 2 commits intoXilinx:mainfrom
erwei-xilinx:erwei/mnist-fc-examples
Open

Add MNIST-FC inference layer examples on NPU2#1451
erwei-xilinx wants to merge 2 commits intoXilinx:mainfrom
erwei-xilinx:erwei/mnist-fc-examples

Conversation

@erwei-xilinx
Copy link
Collaborator

Summary

  • Add programming_examples/mnist_fc/ with standalone examples for GGML MNIST-FC inference pipeline layers: broadcast bias add, ReLU (f32 via bf16 cmp/sel), and argmax (scalar reduction)
  • Add integration test that chains matmul(500x500x784) + fused bias+relu + bias_add + argmax in a single 4-launch multi-launch module on NPU2 (Strix, AIE2P)
  • Update operator dashboard with new "ML Pipeline" category entries

Test plan

  • All 4 sub-examples pass on NPU2 hardware (make run in each directory)
  • Integration test verifies end-to-end argmax output against numpy golden reference (exact i32 match)
  • Operator dashboard regenerated with NPU2 🟢 status for all MNIST-FC entries
  • Lit tests configured for ryzen_ai_npu2, peano

🤖 Generated with Claude Code

Add programming_examples/mnist_fc/ with standalone examples for
GGML MNIST-FC inference pipeline layers (broadcast bias add, ReLU,
argmax) and a multi-launch integration test that chains matmul +
bias_add + relu + argmax on NPU2 (Strix, AIE2P).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@erwei-xilinx erwei-xilinx requested a review from jgmelber as a code owner March 21, 2026 05:03
Copilot AI review requested due to automatic review settings March 21, 2026 05:03
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new programming_examples/mnist_fc/ set of NPU2-focused examples that demonstrate key MNIST-FC inference pipeline layers (broadcast bias add, ReLU, argmax) plus an integration module that composes multiple launches, and updates the operator dashboard to surface them under a new “ML Pipeline” category.

Changes:

  • Add standalone MNIST-FC layer examples (broadcast bias add, 2D ReLU, argmax) with Makefile + LIT runners for NPU2/Peano.
  • Add an MNIST-FC integration example that extends the existing test54 matmul module with additional element-wise launches.
  • Update the programming examples operator dashboard entries and regenerate programming_examples/README.md.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
programming_examples/mnist_fc/relu/run_npu2_peano.lit Adds NPU2/Peano LIT runner for the ReLU example.
programming_examples/mnist_fc/relu/run.py Implements 2D ReLU layer example using bf16 compare/select path on NPU2.
programming_examples/mnist_fc/relu/Makefile Adds build/run targets for the ReLU example.
programming_examples/mnist_fc/integration/run_npu2_peano.lit Adds NPU2/Peano LIT runner for the integration example.
programming_examples/mnist_fc/integration/run.py Builds a multi-launch integration module (matmul + element-wise stages) and validates argmax vs golden reference.
programming_examples/mnist_fc/integration/Makefile Adds build/run targets for the integration example.
programming_examples/mnist_fc/broadcast_bias_add/run_npu2_peano.lit Adds NPU2/Peano LIT runner for broadcast bias add.
programming_examples/mnist_fc/broadcast_bias_add/run.py Implements broadcast bias add layer example aligned to GGML-style layout semantics.
programming_examples/mnist_fc/broadcast_bias_add/Makefile Adds build/run targets for broadcast bias add.
programming_examples/mnist_fc/argmax/run_npu2_peano.lit Adds NPU2/Peano LIT runner for argmax.
programming_examples/mnist_fc/argmax/run.py Implements row-wise argmax example (scalar reduction) with sampling-based verification.
programming_examples/mnist_fc/argmax/Makefile Adds build/run targets for argmax.
programming_examples/generate_readme.py Registers new MNIST-FC examples in the operator dashboard generation list.
programming_examples/README.md Regenerates the operator dashboard table to include new “ML Pipeline” entries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

l1_tile_in = AllocOp(l1TileTy_f32, [], [])
l1_tile_out = AllocOp(l1TileTy_f32, [], [])
l1_tile_bf16 = AllocOp(l1TileTy_bf16, [], [])
l1_tile_relu_bf16 = AllocOp(l1TileTy_bf16, [], [])
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l1_tile_relu_bf16/l1_tile_relu_bf16 is allocated but never used (and sub_relu_bf16 is created but unused). This is dead code and wastes L1 memory; either remove these allocations/subviews or actually write the ReLU bf16 result into them if you intended to materialize the intermediate.

Suggested change
l1_tile_relu_bf16 = AllocOp(l1TileTy_bf16, [], [])

Copilot uses AI. Check for mistakes.
Comment on lines +6 to +18
# MNIST-FC inference integration test.
# Chains: matmul1(500x500x784) -> bias_add+relu -> bias_add2 -> argmax
# in a single multi-launch module on NPU2.
#
# Data layout: test 54's matmul outputs (M, N) with N contiguous.
# Bias is per-row (axis=0). Argmax reduces along axis=0.
#
# Pipeline (4 launches):
# Launch 1: matmul1 W1[K,M1] x X[K,N1] -> C1[M1,N1] (784,500 x 784,500 -> 500,500)
# Launch 2: bias_add2 C3[i,j] = matmul2_out[i,j] + bias2[i] (10x500)
# Launch 3: argmax out[j] = argmax_i(C3[i,j]) (10x500 -> 500)
# Launch 4: bias+relu C2[i,j] = max(C1[i,j]+bias1[i], 0) (500x500)
#
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file-level pipeline description says the integration test chains matmul1 -> bias_add+relu -> bias_add2 -> argmax, but the module currently does not compute matmul2 and bias_add2/argmax operate on a host-provided matmul2_out buffer. This makes the “end-to-end MNIST-FC” claim misleading; either add matmul2 into the module (and feed bias_add2 from the relu output) or update the description/PR summary to clearly state what is (and isn’t) validated.

Copilot uses AI. Check for mistakes.
Comment on lines +57 to +58
truncf_op,
block_matmul,
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

truncf_op and block_matmul are imported from test54 but never referenced in this file. If they aren’t required for side effects, remove the unused imports to avoid confusion about dependencies.

Suggested change
truncf_op,
block_matmul,

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +21
from ml_dtypes import bfloat16

Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bfloat16 is imported but never used in this example. Please remove the unused import to keep the script minimal.

Suggested change
from ml_dtypes import bfloat16

Copilot uses AI. Check for mistakes.
Comment on lines +533 to +554
# ── Phase 2: serialize vectorized matmul, extend with element-wise ──
# Get the matmul function's IR, then rebuild a module that includes
# both the matmul launch and the element-wise launches.
matmul_ir = str(matmul_module)

# Parse the vectorized matmul module, then add element-wise launches
# by modifying the function to accept additional arguments.
# Strategy: re-parse the matmul IR, add extra func arguments and
# append element-wise launches into the function body.

# We need to extend the function signature. The matmul func has:
# func @matmul_f32(%A, %B, %C) -> ()
# We need to change it to:
# func @mnist_fc(%A, %B, %C, %bias1, %relu_out, %mat2_out, %bias2, %bias2_out, %argmax_out) -> ()
# And add the element-wise launches after the matmul launch.

# Simplest approach: rebuild from scratch with the matmul IR as a string
# embedded in the new module. But this is fragile.

# Better approach: use the matmul module directly, modify its function
# to add more arguments and more launches.

Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matmul_ir = str(matmul_module) is assigned but never used (the subsequent comments also describe approaches that aren’t implemented). Please remove the unused variable/commented-out approach or complete the implementation so the code reflects the chosen strategy.

Suggested change
# ── Phase 2: serialize vectorized matmul, extend with element-wise ──
# Get the matmul function's IR, then rebuild a module that includes
# both the matmul launch and the element-wise launches.
matmul_ir = str(matmul_module)
# Parse the vectorized matmul module, then add element-wise launches
# by modifying the function to accept additional arguments.
# Strategy: re-parse the matmul IR, add extra func arguments and
# append element-wise launches into the function body.
# We need to extend the function signature. The matmul func has:
# func @matmul_f32(%A, %B, %C) -> ()
# We need to change it to:
# func @mnist_fc(%A, %B, %C, %bias1, %relu_out, %mat2_out, %bias2, %bias2_out, %argmax_out) -> ()
# And add the element-wise launches after the matmul launch.
# Simplest approach: rebuild from scratch with the matmul IR as a string
# embedded in the new module. But this is fragile.
# Better approach: use the matmul module directly, modify its function
# to add more arguments and more launches.
# ── Phase 2: extend vectorized matmul with element-wise launches ──
# We need to extend the function signature. The matmul func has:
# func @matmul_f32(%A, %B, %C) -> ()
# We need to change it to:
# func @mnist_fc(%A, %B, %C, %bias1, %relu_out, %mat2_out, %bias2, %bias2_out, %argmax_out) -> ()
# And add the element-wise launches after the matmul launch.
# Use the matmul module directly, modifying its function to add more
# arguments and additional launches instead of reparsing IR as a string.

Copilot uses AI. Check for mistakes.
- Remove unused l1_tile_relu_bf16 allocation and sub_relu_bf16 subview in relu
- Remove unused truncf_op, block_matmul imports in integration
- Remove unused bfloat16 import in argmax
- Remove unused matmul_ir variable and stale comments in integration
- Clarify integration description: matmul2 output is host-provided

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants