Add ChatTTS example #149

Merged 5 commits on Aug 9, 2024
2 changes: 2 additions & 0 deletions wasmedge-chatTTS/.gitignore
@@ -0,0 +1,2 @@
asset
config
9 changes: 9 additions & 0 deletions wasmedge-chatTTS/Cargo.toml
@@ -0,0 +1,9 @@
[package]
name = "wasmedge-chattts"
version = "0.1.0"
edition = "2021"

[dependencies]
serde_json = "1.0"
wasmedge-wasi-nn = {path = "../../wasmedge-wasi-nn/rust", version = "0.8.0"}
hound = "3.4"
69 changes: 69 additions & 0 deletions wasmedge-chatTTS/README.md
@@ -0,0 +1,69 @@
# ChatTTS example with WasmEdge WASI-NN ChatTTS plugin
This example demonstrates how to use the WasmEdge WASI-NN ChatTTS plugin to generate speech from text. ChatTTS is a text-to-speech model designed for dialogue scenarios such as LLM assistants. The example runs ChatTTS through the WasmEdge WASI-NN ChatTTS plugin to synthesize speech from an input sentence.

## Install WasmEdge with WASI-NN ChatTTS plugin
The ChatTTS backend relies on Python and the ChatTTS library. We recommend the following commands to install the dependencies:
``` bash
sudo apt update
sudo apt upgrade
sudo apt install python3-dev
pip install chattts==0.1.1
```

Then build and install WasmEdge from source.

``` bash
cd <path/to/your/wasmedge/source/folder>

cmake -GNinja -Bbuild -DCMAKE_BUILD_TYPE=Release -DWASMEDGE_PLUGIN_WASI_NN_BACKEND="chatTTS"
cmake --build build

# Installing is required so that the WASI-NN plug-in is placed where the runtime can load it.
cmake --install build
```

After installation you will have the `wasmedge` runtime executable under `/usr/local/bin` and the WASI-NN plug-in with the ChatTTS backend at `/usr/local/lib/wasmedge/libwasmedgePluginWasiNN.so`.
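
As a quick sanity check (a minimal sketch, assuming the default `/usr/local` install prefix used above), you can confirm that both artifacts are in place:

```bash
# The runtime should be on PATH and report its version
wasmedge --version

# The WASI-NN plug-in shared library should exist
ls -l /usr/local/lib/wasmedge/libwasmedgePluginWasiNN.so
```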

## Build wasm

Run the following command to build the wasm module. The output WASM file will be at `target/wasm32-wasi/release/`:

```bash
cargo build --target wasm32-wasi --release
```
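
If the `wasm32-wasi` target has not been added to your Rust toolchain yet, add it first (assuming the toolchain is managed with `rustup`):

```bash
rustup target add wasm32-wasi
```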

## Execute

Execute the WASM module with `wasmedge`:

``` bash
wasmedge --dir .:. ./target/wasm32-wasi/release/wasmedge-chattts.wasm
```

This generates an `output1.wav` file containing the synthesized speech for the input text.
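
Any WAV-capable player can be used to listen to the result, for example (assuming `ffplay` from FFmpeg is available):

```bash
ffplay -autoexit output1.wav
```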

## Advanced Options

The `config_data` JSON adjusts the ChatTTS configuration. It supports the following options:
- `prompt`: Special control tokens applied to the text to synthesize.
- `spk_emb`: Sampled speaker embedding (use `random` for a random speaker).
- `temperature`: Sampling temperature.
- `top_k`: Top-K decoding.
- `top_p`: Top-P decoding.

``` rust
let config_data = serde_json::to_string(&json!({"prompt": "[oral_2][laugh_0][break_6]", "spk_emb": "random", "temperature": 0.5, "top_k": 0, "top_p": 0.9}))
    .unwrap()
    .as_bytes()
    .to_vec();
```
<table>
<tr>
<td>

[demo.webm](https://github.com/user-attachments/assets/377e0487-9107-41db-9c22-31962ce53f88)

</td>
</tr>
</table>
Binary file added wasmedge-chatTTS/assets/demo.wav
Binary file not shown.
Binary file added wasmedge-chatTTS/assets/demo.webm
Binary file not shown.
55 changes: 55 additions & 0 deletions wasmedge-chatTTS/src/main.rs
@@ -0,0 +1,55 @@
use hound;
use serde_json::json;
use wasmedge_wasi_nn::{
    self, ExecutionTarget, GraphBuilder, GraphEncoding, GraphExecutionContext, TensorType,
};

/// Copy the output tensor at `index` from the execution context into a byte vector.
fn get_data_from_context(context: &GraphExecutionContext, index: usize) -> Vec<u8> {
    const MAX_OUTPUT_BUFFER_SIZE: usize = 4096 * 4096;
    let mut output_buffer = vec![0u8; MAX_OUTPUT_BUFFER_SIZE];
    let bytes_written = context
        .get_output(index, &mut output_buffer)
        .expect("Failed to get output");

    output_buffer[..bytes_written].to_vec()
}

fn main() {
    // The text to synthesize; `[uv_break]` is a ChatTTS control token that inserts a pause.
    let prompt = "It is test sentence [uv_break] for chat T T S.";
    let tensor_data = prompt.as_bytes().to_vec();
    // Generation options (see the Advanced Options section of the README).
    let config_data = serde_json::to_string(&json!({"prompt": "[oral_2][laugh_0][break_6]", "spk_emb": "random", "temperature": 0.5, "top_k": 0, "top_p": 0.9}))
        .unwrap()
        .as_bytes()
        .to_vec();
    // No model bytes are passed here; the ChatTTS backend loads the model via the Python library.
    let empty_vec: Vec<Vec<u8>> = Vec::new();
    let graph = GraphBuilder::new(GraphEncoding::ChatTTS, ExecutionTarget::CPU)
        .build_from_bytes(empty_vec)
        .expect("Failed to build graph");
    let mut context = graph
        .init_execution_context()
        .expect("Failed to init context");
    // Input 0 is the text to synthesize, input 1 is the JSON configuration.
    context
        .set_input(0, TensorType::U8, &[1], &tensor_data)
        .expect("Failed to set input");
    context
        .set_input(1, TensorType::U8, &[1], &config_data)
        .expect("Failed to set input");
    context.compute().expect("Failed to compute");
    // The output is raw 32-bit little-endian float PCM, mono, at 24 kHz.
    let output_bytes = get_data_from_context(&context, 0);
    let spec = hound::WavSpec {
        channels: 1,
        sample_rate: 24000,
        bits_per_sample: 32,
        sample_format: hound::SampleFormat::Float,
    };
    let mut writer = hound::WavWriter::create("output1.wav", spec).unwrap();
    let samples: Vec<f32> = output_bytes
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect();
    for sample in samples {
        writer.write_sample(sample).unwrap();
    }
    writer.finalize().unwrap();
    graph.unload().expect("Failed to free resource");
}
Binary file added wasmedge-chatTTS/wasmedge-chattts.wasm
Binary file not shown.