pip install orpheus-cpp
You also need to install the llama-cpp-python package separately, because llama-cpp-python does not ship pre-built wheels on PyPI. Don't worry, you can just run one of the following commands (the first installs a CPU-only build, the second a Metal build for macOS):
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
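If you have an NVIDIA GPU, llama-cpp-python also publishes CUDA wheels on the same index; the cu121 URL below is just one example, so check the llama-cpp-python installation docs for the index matching your CUDA version:
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121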
After installing orpheus-cpp, install fastrtc and run the following commands:
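pip install fastrtc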
python -m orpheus_cpp
Then go to http://localhost:7860 and you should see the demo.
import numpy as np  # used by the streaming examples below
from orpheus_cpp import OrpheusCpp
from scipy.io.wavfile import write
orpheus = OrpheusCpp()
text = "I really hope the project deadline doesn't get moved up again."
# output is a tuple of (sample_rate (24_000), samples (numpy int16 array))
sample_rate, samples = orpheus.tts(text, options={"voice_id": "tara"})
write("output.wav", sample_rate, samples.squeeze())
You can also stream the audio in chunks as it is generated. Each chunk is a separate slice of audio, so collect the chunks and write the file once at the end instead of overwriting it on every iteration:
chunks = []
for sample_rate, samples in orpheus.stream_tts_sync(text, options={"voice_id": "tara"}):
    chunks.append(samples.squeeze())
write("output.wav", sample_rate, np.concatenate(chunks))
An async variant, stream_tts, is also available. As above, collect the chunks and write once; note that the async for loop must run inside an async function (event loop):
chunks = []
async for sample_rate, samples in orpheus.stream_tts(text, options={"voice_id": "tara"}):
    chunks.append(samples.squeeze())
write("output.wav", sample_rate, np.concatenate(chunks))
By default, we wait until 1.5 seconds of audio have been generated before yielding the first chunk. This ensures smooth streaming at the cost of a longer time to first audio. Depending on your hardware, you can reduce pre_buffer_size to get the first chunk sooner:
chunks = []
async for sample_rate, samples in orpheus.stream_tts(text, options={"voice_id": "tara", "pre_buffer_size": 0.5}):
    chunks.append(samples.squeeze())
write("output.wav", sample_rate, np.concatenate(chunks))
orpheus-cpp is distributed under the terms of the MIT license.