feat(server): add OpenAI-compatible endpoint #421
Conversation
- Add OpenAI-compatible v1/audio/speech endpoint to server.py (an example request is sketched after this list)
- Add lowvram command line argument to server.py (if running on CUDA, switches the model to CPU when idle)
- Allow the OpenAI server to use built-in speaker voices, a cloning wav, or a directory of wavs as the voice
- Allow language_id as a server.py command line argument as an alternative to the request parameter
- Split long input text using the built-in segmenter
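For orientation, a request against the new endpoint could look roughly like the sketch below; the port (coqui's usual 5002 default) and the voice value are assumptions for illustration, not something this PR pins down.

```python
import requests

# Rough sketch of a request to the new OpenAI-compatible endpoint.
# The port and the voice value are illustrative assumptions.
resp = requests.post(
    "http://localhost:5002/v1/audio/speech",
    json={
        "model": "tts-1",  # part of the OpenAI spec; the local server presumably uses its loaded model
        "input": "Hello from the OpenAI-compatible endpoint.",
        "voice": "Claribel Dervla",  # built-in speaker name, a wav to clone, or a directory of wavs
    },
)
resp.raise_for_status()
with open("speech.wav", "wb") as f:
    f.write(resp.content)
```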
Cool, thank you! Can you share a link to the relevant OpenAI API spec so I can check everything works as expected? Could you also add some information and examples for this to https://github.com/idiap/coqui-ai-TTS/blob/dev/docs/source/server.md?
Yes, here you go: https://platform.openai.com/docs/guides/text-to-speech. I'm going to try to do extensive testing, though I can confirm the code already works with a project that uses the OpenAI TTS spec extensively (WingmanAI). I think the most testing is needed for models other than xtts2, as I have no idea how to use the other models coqui-tts supports.
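For anyone checking against that spec, a client written for the official API can be pointed at the local server roughly like this; the base_url port and the dummy api_key are assumptions, and the voice value is just one of OpenAI's built-in names.

```python
from openai import OpenAI

# Sketch of a spec-compliant client aimed at the local server; the port and the
# placeholder api_key are assumptions (the local server would not check the key).
client = OpenAI(base_url="http://localhost:5002/v1", api_key="not-needed")

with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Testing the OpenAI-compatible text-to-speech endpoint.",
) as response:
    response.stream_to_file("speech.wav")
```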
- Eliminate low VRAM mode (this results in approx. 1.5 GB more VRAM use, which creeps up over time, versus the low VRAM variant)
- Eliminate the models and voices endpoints, since they are not currently part of the OpenAI spec
- Eliminate the split-sentences code, relying on coqui-tts' already-used split_sentences functionality in the API (a sketch follows below)
Since low VRAM mode is no longer used, the gc import is now unnecessary.
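To illustrate what the endpoint now relies on instead of custom splitting, here is a minimal sketch using the Python API's split_sentences flag; the model name and inputs are examples, not anything fixed by this PR.

```python
from TTS.api import TTS

# Minimal sketch: rely on the API's built-in sentence splitting instead of
# custom server-side splitting code. Model name and inputs are examples.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="A long input text that would otherwise exceed the model's per-call limit.",
    speaker="Claribel Dervla",
    language="en",
    split_sentences=True,  # let coqui-tts segment the text into sentences
    file_path="out.wav",
)
```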
- Add usage examples to server.md
- Fix a bug with an elif statement in the last changes
- Default to using speaker_idx, if one was specified at server launch, when no voice parameter is passed to the OpenAI server (roughly the fallback sketched below)
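The fallback behaves roughly like this sketch; the function and argument names are hypothetical and do not mirror server.py.

```python
# Hypothetical sketch of the voice fallback described above; the names here are
# made up for illustration and are not the ones used in server.py.
def resolve_voice(request_voice: str | None, launch_speaker_idx: str | None) -> str | None:
    """Prefer the request's voice; otherwise fall back to --speaker_idx from launch."""
    if request_voice:  # voice supplied in the OpenAI-style request body
        return request_voice
    return launch_speaker_idx  # may be None if neither was provided
```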
OK, I think this is ready for your testing.
Thank you again! I just fixed up a few small things and checked that it works as expected. Feel free to open any follow-up PRs/issues if you spot anything else!
Thank you for merging! I'll try to get to that underlying synthesizer issue in the coming days, to make sure the proper segmenter is always used for each language.
- Add OpenAI-compatible v1/audio/speech endpoint to server.py
- Add lowvram command line argument to server.py (if running on CUDA, switches the model to CPU when idle; sketched after this list)
- Allow the OpenAI server to use built-in speaker voices, a cloning wav, or a directory of wavs as the voice
- Allow language_id as a server.py command line argument as an alternative to the request parameter
- Split long input text using the built-in segmenter (e.g. to get around hard input limits in xtts2)
- Tested on Windows only (I do not have Mac or Linux)
- Tested with xtts2, but the changes in theory should not impact other models
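For context on the lowvram option (later dropped in this PR), the idea is roughly the sketch below: keep the model on the CPU while idle and move it to CUDA only for synthesis. The function and the .to() call are assumptions for illustration, not the actual server.py code.

```python
import torch

# Hedged sketch of the lowvram idea: park the model on the CPU while idle and
# move it to CUDA only for the duration of a request. Names and the .to() call
# are illustrative assumptions, not the exact code from this PR.
def synthesize_lowvram(synthesizer, text, **kwargs):
    if torch.cuda.is_available():
        synthesizer.to("cuda")  # bring the model onto the GPU for this request
    try:
        return synthesizer.tts(text, **kwargs)
    finally:
        if torch.cuda.is_available():
            synthesizer.to("cpu")  # free VRAM while idle
            torch.cuda.empty_cache()
```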