This is the basis of an upcoming paper (to be published in RBVM 2025). The main goal is to design a chat application that combines speech synthesis and recognition with an LLM-based completion device. It is meant to leverage the new YARP interfaces (#31). Components:
- Speech synthesis (TTS): yarp::dev::ISpeechSynthesizer implemented with Piper
- Speech recognition (ASR): yarp::dev::ISpeechTranscription implemented with Vosk
  - includes wake word detection implemented with openWakeWord
- Large language model (LLM): yarp::dev::ILLM implemented with llama.cpp
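To make the intended data flow concrete, here is a minimal sketch of one chat turn: wake word gate, then transcription, then LLM completion, then synthesis. Everything in it is an assumption for illustration only: the `Audio` struct, the four abstract classes, the stubs and `chatTurn` are hypothetical stand-ins and are not the YARP interfaces listed above, whose actual signatures should be taken from the YARP headers. How the wake word gate is wired to the ASR stage is also assumed here.

```cpp
#include <string>

// Hypothetical placeholder for an audio buffer; the text field stands in
// for the samples so the sketch can run without any audio backend.
struct Audio { std::string spokenText; };

// Hypothetical role interfaces (NOT yarp::dev::*), one per component.
class WakeWordDetector {
public:
    virtual ~WakeWordDetector() = default;
    virtual bool detected(const Audio& audio) = 0;
};
class Transcriber {
public:
    virtual ~Transcriber() = default;
    virtual std::string transcribe(const Audio& audio) = 0;
};
class LanguageModel {
public:
    virtual ~LanguageModel() = default;
    virtual std::string ask(const std::string& prompt) = 0;
};
class Synthesizer {
public:
    virtual ~Synthesizer() = default;
    virtual Audio synthesize(const std::string& text) = 0;
};

// One chat turn. Returns false (and leaves reply untouched) when the wake
// word is missing or nothing was transcribed.
bool chatTurn(const Audio& input, WakeWordDetector& wake, Transcriber& asr,
              LanguageModel& llm, Synthesizer& tts, Audio& reply) {
    if (!wake.detected(input)) {
        return false;
    }
    const std::string question = asr.transcribe(input);
    if (question.empty()) {
        return false;
    }
    reply = tts.synthesize(llm.ask(question));
    return true;
}

// Trivial stubs so the flow can be exercised offline.
class StubWakeWord : public WakeWordDetector {
public:
    bool detected(const Audio& audio) override {
        // True only when the utterance starts with the assumed wake phrase.
        return audio.spokenText.rfind("hey robot", 0) == 0;
    }
};
class StubTranscriber : public Transcriber {
public:
    std::string transcribe(const Audio& audio) override { return audio.spokenText; }
};
class EchoModel : public LanguageModel {
public:
    std::string ask(const std::string& prompt) override { return "You said: " + prompt; }
};
class StubSynthesizer : public Synthesizer {
public:
    Audio synthesize(const std::string& text) override { return Audio{text}; }
};
```

In the real application each stub would presumably be replaced by a thin adapter around the corresponding device (Piper, Vosk with openWakeWord, llama.cpp), so that the turn logic stays independent of the backends.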
A very similar idea has been presented in robotology/yarp-devices-llm, which uses the OpenAI API through Azure servers (its TTS/ASR may also be online). This app, however, is meant to run fully offline. Its main motivation is to introduce the recent DeepSeek models in the LLM component.