
Commit 83bb521

dm4 and juntao authored
[Examples] Add wasmedge-ggml-llama examples (#28)
* [Examples] Add wasmedge-ggml-llama examples
  Signed-off-by: dm4 <[email protected]>
* Create llama.yml (add a CI check)
* Update llama.yml
* Update llama.yml
* Update llama.yml

Signed-off-by: dm4 <[email protected]>
Co-authored-by: Michael Yuan <[email protected]>
1 parent ca0bf34 commit 83bb521

File tree: 5 files changed, +141 -0 lines changed


.github/workflows/llama.yml

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
name: Build and Test llama2 examples

on:
  workflow_dispatch:
    inputs:
      logLevel:
        description: 'Log level'
        required: true
        default: 'info'
  push:
    branches: [ '*' ]
  pull_request:
    branches: [ '*' ]

jobs:
  build:

    runs-on: ubuntu-20.04

    steps:
    - uses: actions/checkout@v2

    - name: Install apt-get packages
      run: |
        sudo ACCEPT_EULA=Y apt-get update
        sudo ACCEPT_EULA=Y apt-get upgrade
        sudo apt-get install wget git curl software-properties-common build-essential

    - name: Install Rust target for wasm
      run: |
        rustup target add wasm32-wasi

    - name: Install WasmEdge + WASI-NN + GGML
      run: |
        VERSION=0.13.4
        curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | sudo bash -s -- -v $VERSION --plugins wasi_nn-ggml -p /usr/local

    - name: Example
      run: |
        cd wasmedge-ggml-llama
        curl -LO https://huggingface.co/TheBloke/orca_mini_3B-GGML/resolve/main/orca-mini-3b.ggmlv3.q4_0.bin
        cargo build --target wasm32-wasi --release
        wasmedge compile target/wasm32-wasi/release/wasmedge-ggml-llama.wasm wasmedge-ggml-llama.wasm
        wasmedge --dir .:. --nn-preload default:GGML:CPU:orca-mini-3b.ggmlv3.q4_0.bin wasmedge-ggml-llama.wasm default 'Once upon a time, '

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -8,5 +8,7 @@ openvino-mobilenet-raw/mobilenet.bin
 openvino-mobilenet-raw/mobilenet.xml
 openvino-mobilenet-raw/tensor-1x224x224x3-f32.bgr

+wasmedge-ggml-llama/llama-2-7b-chat.ggmlv3.q4_0.bin
+
 .DS_Store
 Cargo.lock

wasmedge-ggml-llama/Cargo.toml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
[package]
name = "wasmedge-ggml-llama"
version = "0.1.0"
edition = "2021"

[dependencies]
wasi-nn = { git = "https://github.com/second-state/wasmedge-wasi-nn", branch = "dm4/ggml" }

wasmedge-ggml-llama/README.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# Llama Example For WASI-NN with GGML Backend

## Dependencies

Install the latest WasmEdge with plugins:

```bash
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasi_nn-ggml
```

## Build

Compile the application to WebAssembly:

```bash
cargo build --target wasm32-wasi --release
```

The output WASM file will be at `target/wasm32-wasi/release/`.
To speed up the execution, we can enable AOT mode in WasmEdge with:

```bash
wasmedgec target/wasm32-wasi/release/wasmedge-ggml-llama.wasm wasmedge-ggml-llama-aot.wasm
```

## Get Model

Download the llama model:

```bash
curl -LO https://huggingface.co/localmodels/Llama-2-7B-Chat-ggml/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin
```

### Execute

Execute the WASM with `wasmedge`, using the named model feature to preload the large model. The `--nn-preload` flag binds the alias `default` to the GGML backend on the CPU with the downloaded model file; the program then looks up that alias at run time:

```bash
wasmedge --dir .:. \
  --nn-preload default:GGML:CPU:llama-2-7b-chat.ggmlv3.q4_0.bin \
  wasmedge-ggml-llama-aot.wasm default 'Once upon a time, '
```

After executing the command, it may take some time for the output to appear.
Once the execution is complete, the following output will be generated:

```console
Loaded model into wasi-nn with ID: 0
Created wasi-nn execution context with ID: 0
Read input tensor, size in bytes: 18
Executed model inference
Output: Once upon a time, 100 years ago, there was a small village nestled in the rolling hills of the countryside. Unterscheidung between the two is not always clear-cut, and both terms are often used interchangeably. The village was home to a small community of people who lived simple lives, relying on the land for their livelihood. The villagers were known for their kindness, generosity, and strong sense of community. They worked together to cultivate the land, grow their own food, and raise their children. The village was a peaceful place, where everyone knew and looked out for each other.

However, as time passed, the village began to change. New technologies and innovations emerged, and the villagers found themselves adapting to a rapidly changing world. Some embraced the changes, while others resisted them. The village became more connected to the outside world, and the villagers began to interact with people from other places. The village was no longer isolated, and the villagers were
```
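As a cross-check on this log, the `Read input tensor, size in bytes: 18` line matches the prompt exactly: `'Once upon a time, '` is 18 bytes of ASCII, and `src/main.rs` (below) passes those bytes to the backend unchanged. A tiny self-contained check:

```rust
fn main() {
    // The prompt is pure ASCII, so its byte length equals its character count.
    let prompt = "Once upon a time, ";
    assert_eq!(prompt.as_bytes().len(), 18);
    println!("prompt is {} bytes", prompt.as_bytes().len());
}
```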

wasmedge-ggml-llama/src/main.rs

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
use std::env;
use wasi_nn;

fn main() {
    let args: Vec<String> = env::args().collect();
    let model_name: &str = &args[1];
    let prompt: &str = &args[2];

    // Load the model registered under `model_name` (e.g. "default") via `--nn-preload`.
    let graph =
        wasi_nn::GraphBuilder::new(wasi_nn::GraphEncoding::Ggml, wasi_nn::ExecutionTarget::CPU)
            .build_from_cache(model_name)
            .unwrap();
    println!("Loaded model into wasi-nn with ID: {:?}", graph);

    let mut context = graph.init_execution_context().unwrap();
    println!("Created wasi-nn execution context with ID: {:?}", context);

    // Pass the prompt to the backend as a raw UTF-8 byte tensor.
    let tensor_data = prompt.as_bytes().to_vec();
    println!("Read input tensor, size in bytes: {}", tensor_data.len());
    context
        .set_input(0, wasi_nn::TensorType::U8, &[1], &tensor_data)
        .unwrap();

    // Execute the inference.
    context.compute().unwrap();
    println!("Executed model inference");

    // Retrieve the output into a fixed-size, zero-initialized buffer.
    let mut output_buffer = vec![0u8; 1000];
    context.get_output(0, &mut output_buffer).unwrap();
    let output = String::from_utf8(output_buffer.clone()).unwrap();
    println!("Output: {}", output);
}
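One caveat in the listing above: `output_buffer` is a fixed 1,000-byte, zero-initialized buffer, so the decoded `String` carries any trailing NUL padding along (and longer completions are silently truncated). A minimal sketch of stripping that padding using only the standard library; `trim_output` is a hypothetical helper, not part of this commit:

```rust
fn trim_output(buffer: &[u8]) -> String {
    // Cut at the first NUL byte left over from the zero-initialized buffer,
    // then decode, replacing any invalid UTF-8 sequences with U+FFFD.
    let end = buffer.iter().position(|&b| b == 0).unwrap_or(buffer.len());
    String::from_utf8_lossy(&buffer[..end]).into_owned()
}
```

With such a helper, the decoding line in `main` could read `let output = trim_output(&output_buffer);` before printing.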
