Add MNIST-FC inference layer examples on NPU2 #1451
erwei-xilinx wants to merge 2 commits into Xilinx:main from
Conversation
Add programming_examples/mnist_fc/ with standalone examples for the GGML MNIST-FC inference pipeline layers (broadcast bias add, ReLU, argmax) and a multi-launch integration test that chains matmul + bias_add + relu + argmax on NPU2 (Strix, AIE2P).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Pull request overview
Adds a new programming_examples/mnist_fc/ set of NPU2-focused examples that demonstrate key MNIST-FC inference pipeline layers (broadcast bias add, ReLU, argmax) plus an integration module that composes multiple launches, and updates the operator dashboard to surface them under a new “ML Pipeline” category.
Changes:
- Add standalone MNIST-FC layer examples (broadcast bias add, 2D ReLU, argmax) with Makefile + LIT runners for NPU2/Peano.
- Add an MNIST-FC integration example that extends the existing test54 matmul module with additional element-wise launches.
- Update the programming examples operator dashboard entries and regenerate
programming_examples/README.md.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| programming_examples/mnist_fc/relu/run_npu2_peano.lit | Adds NPU2/Peano LIT runner for the ReLU example. |
| programming_examples/mnist_fc/relu/run.py | Implements 2D ReLU layer example using bf16 compare/select path on NPU2. |
| programming_examples/mnist_fc/relu/Makefile | Adds build/run targets for the ReLU example. |
| programming_examples/mnist_fc/integration/run_npu2_peano.lit | Adds NPU2/Peano LIT runner for the integration example. |
| programming_examples/mnist_fc/integration/run.py | Builds a multi-launch integration module (matmul + element-wise stages) and validates argmax vs golden reference. |
| programming_examples/mnist_fc/integration/Makefile | Adds build/run targets for the integration example. |
| programming_examples/mnist_fc/broadcast_bias_add/run_npu2_peano.lit | Adds NPU2/Peano LIT runner for broadcast bias add. |
| programming_examples/mnist_fc/broadcast_bias_add/run.py | Implements broadcast bias add layer example aligned to GGML-style layout semantics. |
| programming_examples/mnist_fc/broadcast_bias_add/Makefile | Adds build/run targets for broadcast bias add. |
| programming_examples/mnist_fc/argmax/run_npu2_peano.lit | Adds NPU2/Peano LIT runner for argmax. |
| programming_examples/mnist_fc/argmax/run.py | Implements row-wise argmax example (scalar reduction) with sampling-based verification. |
| programming_examples/mnist_fc/argmax/Makefile | Adds build/run targets for argmax. |
| programming_examples/generate_readme.py | Registers new MNIST-FC examples in the operator dashboard generation list. |
| programming_examples/README.md | Regenerates the operator dashboard table to include new “ML Pipeline” entries. |
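The three standalone layer examples each pair the on-device kernel with a host-side check. Their semantics reduce to simple NumPy expressions; the sketch below is illustrative only (shapes and data are hypothetical, not the examples' exact sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 10, 500  # illustrative shapes only

x = rng.standard_normal((M, N)).astype(np.float32)
bias = rng.standard_normal(M).astype(np.float32)

# Broadcast bias add: per-row bias (axis=0), GGML-style layout
y = x + bias[:, None]

# 2D ReLU written as a compare/select, mirroring the bf16 cmp/sel kernel path
relu = np.where(y > 0.0, y, 0.0)
assert np.array_equal(relu, np.maximum(y, 0.0))

# Row-wise argmax: reduce along axis=0 to one index per column
idx = np.argmax(y, axis=0)

# Sampling-based verification: spot-check a handful of random columns
for j in rng.integers(0, N, size=8):
    assert y[idx[j], j] == y[:, j].max()
```

This is only a golden-reference sketch of the layer math; the actual examples tile the data for L1 memory and run on NPU2.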
```python
l1_tile_in = AllocOp(l1TileTy_f32, [], [])
l1_tile_out = AllocOp(l1TileTy_f32, [], [])
l1_tile_bf16 = AllocOp(l1TileTy_bf16, [], [])
l1_tile_relu_bf16 = AllocOp(l1TileTy_bf16, [], [])
```
l1_tile_relu_bf16 is allocated but never used (and sub_relu_bf16 is created but unused). This is dead code and wastes L1 memory; either remove these allocations/subviews or actually write the ReLU bf16 result into them if the intent was to materialize the intermediate.
```diff
-l1_tile_relu_bf16 = AllocOp(l1TileTy_bf16, [], [])
```
```python
# MNIST-FC inference integration test.
# Chains: matmul1(500x500x784) -> bias_add+relu -> bias_add2 -> argmax
# in a single multi-launch module on NPU2.
#
# Data layout: test 54's matmul outputs (M, N) with N contiguous.
# Bias is per-row (axis=0). Argmax reduces along axis=0.
#
# Pipeline (4 launches):
#   Launch 1: matmul1   W1[K,M1] x X[K,N1] -> C1[M1,N1] (784,500 x 784,500 -> 500,500)
#   Launch 2: bias_add2 C3[i,j] = matmul2_out[i,j] + bias2[i] (10x500)
#   Launch 3: argmax    out[j] = argmax_i(C3[i,j]) (10x500 -> 500)
#   Launch 4: bias+relu C2[i,j] = max(C1[i,j]+bias1[i], 0) (500x500)
```
The file-level pipeline description says the integration test chains matmul1 -> bias_add+relu -> bias_add2 -> argmax, but the module currently does not compute matmul2 and bias_add2/argmax operate on a host-provided matmul2_out buffer. This makes the “end-to-end MNIST-FC” claim misleading; either add matmul2 into the module (and feed bias_add2 from the relu output) or update the description/PR summary to clearly state what is (and isn’t) validated.
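As the comment points out, only part of the pipeline runs on-device. A host-side golden reference for what the module actually validates could look like the following NumPy sketch (hypothetical random data; note that matmul2_out is supplied by the host rather than computed by a second on-device matmul):

```python
import numpy as np

rng = np.random.default_rng(0)
K, M1, N1, M2 = 784, 500, 500, 10

W1 = rng.standard_normal((K, M1)).astype(np.float32)
X = rng.standard_normal((K, N1)).astype(np.float32)
bias1 = rng.standard_normal(M1).astype(np.float32)

# Launch 1: matmul1  W1[K,M1] x X[K,N1] -> C1[M1,N1]
C1 = W1.T @ X

# Launch 4: bias+relu  C2[i,j] = max(C1[i,j] + bias1[i], 0)
C2 = np.maximum(C1 + bias1[:, None], 0.0)

# matmul2_out is host-provided in the current module, NOT fed from C2
matmul2_out = rng.standard_normal((M2, N1)).astype(np.float32)
bias2 = rng.standard_normal(M2).astype(np.float32)

# Launch 2: bias_add2  C3[i,j] = matmul2_out[i,j] + bias2[i]
C3 = matmul2_out + bias2[:, None]

# Launch 3: argmax along axis=0 -> one class index per column
pred = np.argmax(C3, axis=0)
assert pred.shape == (N1,)
```

The sketch makes the gap concrete: nothing connects C2 to matmul2_out, which is exactly the missing matmul2 stage the review comment describes.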
```python
truncf_op,
block_matmul,
```
truncf_op and block_matmul are imported from test54 but never referenced in this file. If they aren’t required for side effects, remove the unused imports to avoid confusion about dependencies.
```diff
-truncf_op,
-block_matmul,
```
```python
from ml_dtypes import bfloat16
```
bfloat16 is imported but never used in this example. Please remove the unused import to keep the script minimal.
```diff
-from ml_dtypes import bfloat16
```
```python
# ── Phase 2: serialize vectorized matmul, extend with element-wise ──
# Get the matmul function's IR, then rebuild a module that includes
# both the matmul launch and the element-wise launches.
matmul_ir = str(matmul_module)

# Parse the vectorized matmul module, then add element-wise launches
# by modifying the function to accept additional arguments.
# Strategy: re-parse the matmul IR, add extra func arguments and
# append element-wise launches into the function body.

# We need to extend the function signature. The matmul func has:
#   func @matmul_f32(%A, %B, %C) -> ()
# We need to change it to:
#   func @mnist_fc(%A, %B, %C, %bias1, %relu_out, %mat2_out, %bias2, %bias2_out, %argmax_out) -> ()
# And add the element-wise launches after the matmul launch.

# Simplest approach: rebuild from scratch with the matmul IR as a string
# embedded in the new module. But this is fragile.

# Better approach: use the matmul module directly, modify its function
# to add more arguments and more launches.
```
matmul_ir = str(matmul_module) is assigned but never used (the subsequent comments also describe approaches that aren’t implemented). Please remove the unused variable/commented-out approach or complete the implementation so the code reflects the chosen strategy.
```diff
-# ── Phase 2: serialize vectorized matmul, extend with element-wise ──
-# Get the matmul function's IR, then rebuild a module that includes
-# both the matmul launch and the element-wise launches.
-matmul_ir = str(matmul_module)
-
-# Parse the vectorized matmul module, then add element-wise launches
-# by modifying the function to accept additional arguments.
-# Strategy: re-parse the matmul IR, add extra func arguments and
-# append element-wise launches into the function body.
-
-# We need to extend the function signature. The matmul func has:
-#   func @matmul_f32(%A, %B, %C) -> ()
-# We need to change it to:
-#   func @mnist_fc(%A, %B, %C, %bias1, %relu_out, %mat2_out, %bias2, %bias2_out, %argmax_out) -> ()
-# And add the element-wise launches after the matmul launch.
-
-# Simplest approach: rebuild from scratch with the matmul IR as a string
-# embedded in the new module. But this is fragile.
-
-# Better approach: use the matmul module directly, modify its function
-# to add more arguments and more launches.
+# ── Phase 2: extend vectorized matmul with element-wise launches ──
+# We need to extend the function signature. The matmul func has:
+#   func @matmul_f32(%A, %B, %C) -> ()
+# We need to change it to:
+#   func @mnist_fc(%A, %B, %C, %bias1, %relu_out, %mat2_out, %bias2, %bias2_out, %argmax_out) -> ()
+# And add the element-wise launches after the matmul launch.
+# Use the matmul module directly, modifying its function to add more
+# arguments and additional launches instead of reparsing IR as a string.
```
- Remove unused l1_tile_relu_bf16 allocation and sub_relu_bf16 subview in relu
- Remove unused truncf_op, block_matmul imports in integration
- Remove unused bfloat16 import in argmax
- Remove unused matmul_ir variable and stale comments in integration
- Clarify integration description: matmul2 output is host-provided

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Summary

Add programming_examples/mnist_fc/ with standalone examples for GGML MNIST-FC inference pipeline layers: broadcast bias add, ReLU (f32 via bf16 cmp/sel), and argmax (scalar reduction).

Test plan

- All examples build and run on NPU2 (make run in each directory)
- LIT runners gated on ryzen_ai_npu2, peano

🤖 Generated with Claude Code