
Turn Management

sagatake edited this page Apr 9, 2025 · 5 revisions

This module controls turn-taking behaviors based on audio-visual signals from both the user and the agent. It is implemented with a voice activity detection (VAD) model and a voice activity projection (VAP) model.
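Greta's actual VAD and VAP components are learned models; the following is only a toy sketch illustrating the kind of frame-level decision a VAD makes, using simple energy thresholding on 16-bit PCM audio. The function names and the threshold value are illustrative assumptions, not part of Greta.

```python
import struct

def frame_energy(frame: bytes) -> float:
    """Mean absolute amplitude of a 16-bit little-endian PCM frame."""
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    return sum(abs(s) for s in samples) / max(len(samples), 1)

def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Classify a frame as speech if its mean energy exceeds the threshold."""
    return frame_energy(frame) >= threshold

# A silent frame (all-zero samples) vs. a loud constant-amplitude frame,
# each 160 samples long (10 ms at 16 kHz).
silence = struct.pack("<160h", *([0] * 160))
loud = struct.pack("<160h", *([8000] * 160))
print(is_speech(silence), is_speech(loud))  # -> False True
```

A real VAD (and especially a VAP model, which predicts upcoming voice activity) replaces this threshold with a trained classifier over acoustic features, but the input/output contract is the same: audio frames in, speech/no-speech decisions out.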

Two sub-modules are implemented in this module.

Requirement

  • We strongly recommend an NVIDIA GPU. The models may run without one, but inference will likely be extremely slow.
  • You need to install and set up the LLM and ASR modules (e.g. API keys).

Common installation

  • Install Conda or Anaconda from https://www.anaconda.com/
  • Install Python 3 (usually bundled with Anaconda, but in some cases it is not usable out of the box [e.g. the path to "python.exe" is not set globally])
  • You can test the installation by loading Greta - Microphone - backchannel.xml from Modular.jar. If everything is installed correctly, Greta will nod in response to your utterances.

Default server setup

  • Microphone stream server: TCP at port 9000 of localhost
  • Feedback server from Greta: TCP at port 5960
  • Main management server from Greta: TCP at port 5961
  • You can change the microphone port number by entering your preferred number in the Microphone module in Modular.jar and pressing the update button
  • If you change the port number of the microphone streaming server in Modular.jar, you must also update it in this module
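The module reads audio from the microphone stream server over plain TCP. Below is a minimal Python sketch of a client that streams raw audio frames to such a server; the frame format (raw PCM bytes) is an assumption, since the wire protocol is not specified on this page. So that the demo runs without Greta, it stands in a local byte-counting sink on an ephemeral port instead of the real server on port 9000.

```python
import socket
import threading

HOST, MIC_PORT = "localhost", 9000  # default microphone stream server (see above)

def stream_frames(frames, host=HOST, port=MIC_PORT):
    """Send raw audio frames to the microphone stream server over TCP."""
    with socket.create_connection((host, port)) as sock:
        for frame in frames:
            sock.sendall(frame)

# Demo: a local sink that just counts the bytes it receives, standing in
# for the real server so the sketch is self-contained.
received = bytearray()
server = socket.socket()
server.bind(("localhost", 0))           # ephemeral port instead of 9000
server.listen(1)
demo_port = server.getsockname()[1]

def sink():
    conn, _ = server.accept()
    while chunk := conn.recv(4096):
        received.extend(chunk)
    conn.close()

t = threading.Thread(target=sink)
t.start()
stream_frames([b"\x00\x01" * 160] * 10, port=demo_port)  # ten 320-byte frames
t.join()
server.close()
print(len(received))  # -> 3200
```

Pointing `port` back at 9000 (and matching whatever value you set in Modular.jar) connects the same client to the actual microphone stream server.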
