tts.say() doesn't execute #867

Open
agilebean opened this issue Dec 16, 2024 · 0 comments

Description

Bug

Environment

  • pipecat-ai version: 0.0.47
  • python version: 3.12
  • OS: MacOS 14.2

Issue description

tts.say() doesn't work, i.e. it doesn't push any audio.

Repro steps

import os

from pipecat.services.playht import PlayHTTTSService

tts = PlayHTTTSService(
    user_id=os.getenv("PLAYHT_USER_ID"),
    api_key=os.getenv("PLAYHT_API_KEY"),
    voice_url=voice_url,
)
# say() is a coroutine, so it is awaited from within the running event loop:
await tts.say("This is a text message to test the TTS say method.")

Potential root cause

The say() method flushes the audio as its last statement:

async def say(self, text: str):
    # Temporarily disable sentence aggregation so the text is synthesized immediately.
    aggregate_sentences = self._aggregate_sentences
    self._aggregate_sentences = False
    await self.process_frame(TextFrame(text=text), FrameDirection.DOWNSTREAM)
    self._aggregate_sentences = aggregate_sentences
    # Last statement: flush any buffered audio.
    await self.flush_audio()

However, flush_audio() is an abstract method defined in TTSService:

@abstractmethod
async def flush_audio(self):
    pass

In several TTS classes, this method is not implemented, e.g. in PlayHTTTSService.

Therefore, the await self.flush_audio() call is a no-op. Consequently, process_frame() might never finish because the TTSStoppedFrame here is never processed:

async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:
    try:
        yield TTSStartedFrame()
        # ... process audio chunks ...
        yield TTSAudioRawFrame(chunk, self._settings["sample_rate"], 1)
        # ... more processing ...
        yield TTSStoppedFrame()  # This frame might never be processed

The above is just speculation, as I lack a full understanding of the underlying frame-processing mechanism.
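
If this analysis is right, I would expect something like the following override to unblock say(). This is only an untested sketch: the subclass name is mine, and the assumption that pushing a TTSStoppedFrame from flush_audio is sufficient may well be wrong.

from pipecat.frames.frames import TTSStoppedFrame
from pipecat.processors.frame_processor import FrameDirection
from pipecat.services.playht import PlayHTTTSService


class PatchedPlayHTTTSService(PlayHTTTSService):
    # Untested sketch: give flush_audio a body instead of the base class no-op.
    async def flush_audio(self):
        # Assumption: signalling the end of speech downstream is enough for
        # process_frame() to complete; the real fix may need to flush the
        # PlayHT stream itself.
        await self.push_frame(TTSStoppedFrame(), FrameDirection.DOWNSTREAM)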

Solution

  1. Is there a generic solution for flush_audio that works for all TTS services?
  2. If not, what would be a workaround? A workaround would also be very helpful for other methods such as push_frame(TTSSpeakFrame()) or task.queue_frames, to ensure completion; currently, they all only seem to queue the frames (see the sketch below for the kind of usage I mean).
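
For context, this is the kind of pipeline-based usage I have in mind for the workaround. It is an untested sketch: transport stands in for whatever output transport is configured, tts is the PlayHTTTSService from the repro above, and the import paths reflect pipecat 0.0.47 as far as I can tell.

import asyncio

from pipecat.frames.frames import EndFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask


async def main():
    # tts and transport are assumed to be created elsewhere (see repro above).
    pipeline = Pipeline([tts, transport.output()])
    task = PipelineTask(pipeline)

    # Queue the text to speak, then an EndFrame so the runner can terminate.
    await task.queue_frames([TTSSpeakFrame("This is a test."), EndFrame()])
    await PipelineRunner().run(task)


asyncio.run(main())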
agilebean changed the title from "tts.say()" to "tts.say() doesn't execute" on Dec 16, 2024