This example demonstrates how to use the `wasi-nn` crate to run a classification using the
[ONNX Runtime](https://onnxruntime.ai/) backend from a WebAssembly component.

It supports CPU and GPU (Nvidia CUDA) execution targets.

**Note:**
The GPU execution target currently supports only Nvidia CUDA (`onnx-cuda`) as its execution provider (EP).
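
Inside the component, the guest uses the `wasi-nn` API to load the model, bind an input tensor, and run inference. Below is a minimal sketch of that flow using the high-level `GraphBuilder` API from the `wasi-nn` crate; the tensor shape, output size, and error handling are illustrative assumptions, not the example's exact code:

```rust
// Sketch only: load an ONNX model and classify one preprocessed image.
use wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn classify(model: &[u8], input: &[f32]) -> Result<Vec<f32>, wasi_nn::Error> {
    // Load the ONNX graph for CPU execution (swap in ExecutionTarget::GPU for CUDA).
    let graph = GraphBuilder::new(GraphEncoding::Onnx, ExecutionTarget::CPU)
        .build_from_bytes([model])?;
    let mut ctx = graph.init_execution_context()?;

    // Assumed input shape for an ImageNet-style classifier: NCHW, 1x3x224x224.
    ctx.set_input(0, TensorType::F32, &[1, 3, 224, 224], input)?;
    ctx.compute()?;

    // 1000 f32 class scores (the "output data with length: 4000" bytes shown below).
    let mut scores = vec![0f32; 1000];
    ctx.get_output(0, &mut scores)?;
    Ok(scores)
}
```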

## Build

In this directory, run one of the following commands to build the WebAssembly component:
```console
# build component for target wasm32-wasip1
cargo component build

# build component for target wasm32-wasip2
cargo component build --target wasm32-wasip2
```

## Running the Example

From the Wasmtime root directory, run the following commands to build the Wasmtime CLI and then run the WebAssembly component.

### Building Wasmtime

#### For CPU-only execution:
```sh
cargo build --features component-model,wasi-nn,wasmtime-wasi-nn/onnx-download
```

#### For GPU (Nvidia CUDA) support:
```sh
cargo build --features component-model,wasi-nn,wasmtime-wasi-nn/onnx-cuda,wasmtime-wasi-nn/onnx-download
```

### Running with Different Execution Targets

The execution target is controlled by passing a single argument to the Wasm component (see the parsing sketch after the argument list below).

Arguments:
- No argument or `cpu` - use CPU execution
- `gpu` or `cuda` - use GPU/CUDA execution
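
The component might map this argument to a `wasi-nn` execution target roughly as follows; this is a hypothetical sketch of the parsing logic, not the example's exact code:

```rust
// Hypothetical sketch: map the CLI argument to a wasi-nn ExecutionTarget.
use wasi_nn::ExecutionTarget;

fn parse_target(arg: Option<&str>) -> ExecutionTarget {
    match arg {
        Some("gpu") | Some("cuda") => ExecutionTarget::GPU,
        Some("cpu") | None => ExecutionTarget::CPU,
        Some(other) => {
            eprintln!("Unknown execution target `{other}`, defaulting to CPU");
            ExecutionTarget::CPU
        }
    }
}
```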

#### CPU Execution (default):
```sh
./target/debug/wasmtime run \
    -Snn \
    --dir ./crates/wasi-nn/examples/classification-component-onnx/fixture/::fixture \
    ./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip2/debug/classification-component-onnx.wasm
```

#### GPU (CUDA) Execution:
```sh
# path to `libonnxruntime_providers_cuda.so` downloaded by `ort-sys`
export LD_LIBRARY_PATH={wasmtime_workspace}/target/debug

./target/debug/wasmtime run \
    -Snn \
    --dir ./crates/wasi-nn/examples/classification-component-onnx/fixture/::fixture \
    ./crates/wasi-nn/examples/classification-component-onnx/target/wasm32-wasip2/debug/classification-component-onnx.wasm \
    gpu
```

## Expected Output

You should get output similar to:
```txt
No execution target specified, defaulting to CPU
Read ONNX model, size in bytes: 4956208
Loaded graph into wasi-nn with Cpu target
Created wasi-nn execution context.
Read ONNX Labels, # of labels: 1000
Executed graph inference
Retrieved output data with length: 4000
Index: n02099601 golden retriever - Probability: 0.9948673
Index: n02088094 Afghan hound, Afghan - Probability: 0.002528982
Index: n02102318 cocker spaniel, English cocker spaniel, cocker - Probability: 0.0010986356
```

When using the GPU target, the first line of the output will indicate the selected execution target.
You can monitor GPU usage with `watch -n 1 nvidia-smi`.
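
The three ranked `Index:` lines come from sorting the 1000 class scores and printing the top three. A minimal sketch of that step (`top3` is a hypothetical helper, not the example's exact code):

```rust
// Hypothetical sketch: pick the top-3 classes from the raw output scores.
// `labels` holds the 1000 class names read from the ONNX labels file.
fn top3(scores: &[f32], labels: &[String]) -> Vec<(String, f32)> {
    let mut indexed: Vec<(usize, f32)> = scores.iter().copied().enumerate().collect();
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending by probability
    indexed
        .into_iter()
        .take(3)
        .map(|(i, p)| (labels[i].clone(), p))
        .collect()
}
```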

To see trace logs from `wasmtime_wasi_nn` or `ort`, run Wasmtime with `WASMTIME_LOG` enabled, e.g.:

```sh
WASMTIME_LOG=wasmtime_wasi_nn=warn ./target/debug/wasmtime run ...
WASMTIME_LOG=ort=warn ./target/debug/wasmtime run ...
```

## Prerequisites for GPU (CUDA) Support
- NVIDIA GPU with CUDA support
- CUDA Toolkit 12.x with cuDNN 9.x
- Wasmtime built with the `wasmtime-wasi-nn/onnx-cuda` feature

## ONNX Runtime's Fallback Behavior

If the GPU execution provider is requested (by passing `gpu`) but the device does not have a GPU or the necessary CUDA drivers are missing, ONNX Runtime will **silently fall back** to the CPU execution provider. The application will continue to run, but inference will happen on the CPU.
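
For context, the host's ONNX backend builds on the `ort` crate, whose execution-provider registration is best-effort by default. The following is an illustrative sketch of that pattern, assuming the `ort` 2.x API (this is not Wasmtime's actual code); `error_on_failure()` turns the silent fallback into a hard error:

```rust
// Illustrative sketch only, assuming the `ort` 2.x API.
use ort::execution_providers::CUDAExecutionProvider;
use ort::session::Session;

fn build_session(model_path: &str) -> ort::Result<Session> {
    Session::builder()?
        .with_execution_providers([
            // By default, a failed CUDA registration just logs a warning and the
            // session falls back to CPU; error_on_failure() returns an error instead.
            CUDAExecutionProvider::default().build().error_on_failure(),
        ])?
        .commit_from_file(model_path)
}
```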

To verify whether fallback is happening, you can enable ONNX Runtime logging:

1. Build Wasmtime with the additional `wasmtime-wasi-nn/ort-tracing` feature:
   ```sh
   cargo build --features component-model,wasi-nn,wasmtime-wasi-nn/onnx-cuda,wasmtime-wasi-nn/ort-tracing
   ```

2. Run Wasmtime with `WASMTIME_LOG` enabled to see `ort` warnings:
   ```sh
   WASMTIME_LOG=ort=warn ./target/debug/wasmtime run ...
   ```
   You should see a warning like: `No execution providers from session options registered successfully; may fall back to CPU.`