Transcription API #57194
Conversation
Code Review
This pull request introduces a new transcription API, following the OpenAI specification. The changes are well-structured, touching the necessary model definitions, LLM server, vLLM engine, and router components. The implementation largely follows existing patterns in the codebase. However, I've identified a couple of critical issues that would cause runtime errors, such as a missing comma in a type hint and a method name mismatch between the server and the engine. There are also some minor maintainability issues like a copy-pasted comment and a typo in a docstring. Addressing these points will make the PR ready for merging.
Pull Request Overview
This PR introduces support for a transcription API to vLLM's OpenAI-compatible interface, following the OpenAI audio/transcriptions API specification. The implementation adds the necessary request/response models, router endpoints, and engine integration to handle audio transcription requests.
- Adds TranscriptionRequest, TranscriptionResponse, and TranscriptionStreamResponse models
- Implements the /v1/audio/transcriptions endpoint in the router
- Integrates transcription support into the vLLM engine with proper error handling
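For a concrete sense of the surface being added, here is a minimal client sketch against the new endpoint. The base URL, API key, model id, and audio file below are illustrative assumptions, not values taken from this PR:

```python
# Sketch: exercising the OpenAI-compatible /v1/audio/transcriptions endpoint.
# base_url, api_key, model id, and the audio path are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

with open("sample.wav", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="openai/whisper-tiny",  # hypothetical model id served by the app
        file=audio_file,
    )

print(transcription.text)
```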
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| python/ray/serve/llm/openai_api_models.py | Adds public API models for transcription request/response types |
| python/ray/llm/_internal/serve/deployments/routers/router.py | Implements transcription endpoint and updates request processing logic |
| python/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py | Adds transcription engine integration with vLLM OpenAI serving |
| python/ray/llm/_internal/serve/deployments/llm/llm_server.py | Adds transcription method to LLM server with async generator interface |
| python/ray/llm/_internal/serve/configs/openai_api_models.py | Defines internal transcription models and response type unions |
async for response in transcription_response:
    if not isinstance(response, str):
        raise ValueError(
            f"Expected create_transcription to return a stream of strings, got and item with type {type(response)}"
Copilot AI (Oct 4, 2025)
Corrected 'and item' to 'an item' in the error message.
f"Expected create_transcription to return a stream of strings, got and item with type {type(response)}" | |
f"Expected create_transcription to return a stream of strings, got an item with type {type(response)}" |
LLMTranscriptionResponse = Union[
    AsyncGenerator[
        Union[TranscriptionStreamResponse, TranscriptionResponse, ErrorResponse], None
Copilot AI (Oct 4, 2025)
The LLMTranscriptionResponse type definition is missing the string type in the Union. Based on the vLLM engine implementation, transcription streaming can yield strings, so this should include 'str' in the Union type like the other response types.
Suggested change:
- Union[TranscriptionStreamResponse, TranscriptionResponse, ErrorResponse], None
+ Union[str, TranscriptionStreamResponse, TranscriptionResponse, ErrorResponse], None
@Blaze-DSP bumping this
Nice. I think the basic feature looks good. We just need to add CI tests and some release tests as well.
For CI please take a look at existing tests for the endpoints at engine and router levels. Here are some I found:
- https://github.com/ray-project/ray/blob/master/python/ray/llm/tests/serve/cpu/deployments/llm/test_llm_engine.py: includes tests for engine interfaces
- https://github.com/ray-project/ray/blob/master/python/ray/llm/tests/serve/cpu/deployments/llm/test_llm_server.py: includes tests for LLMServer interfaces.
You would need to create a mock engine with some reasonable transcription behavior.
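As a very rough sketch of what that mock could look like (the method name is taken from the error message quoted earlier; names, signatures, and the test harness are assumptions and should be adapted to the existing mock engine used in those test files):

```python
# Sketch of a mock transcription behavior for CI; requires pytest-asyncio.
import pytest


class MockTranscriptionEngine:
    async def create_transcription(self, request):
        # Stream a couple of canned chunks, mimicking a streaming transcription.
        for chunk in ["hello ", "world"]:
            yield chunk


@pytest.mark.asyncio
async def test_mock_transcription_streams_strings():
    engine = MockTranscriptionEngine()
    chunks = [c async for c in engine.create_transcription(request=None)]
    assert all(isinstance(c, str) for c in chunks)
    assert "".join(chunks) == "hello world"
```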
Let's keep the translation for another PR after we cover everything for this new endpoint.
For the release test, could you share the serve run script that you used to validate the behavior, along with the client code and expected output? We can turn that into a GPU release test with a real model (maybe whisper-tiny) so that it is continuously tested.
@kouroshHakha CI tests have been written and the docs have also been updated. Please check and verify. If we are going to adopt vllm==v0.11.0, note that V0 has been entirely deprecated and all models are supported via the V1 engine, so we'll need to make the appropriate changes to the docs, etc. (e.g., embeddings).
Adding @eicherseiji for review.
Recommend `pip install pre-commit && pre-commit install` before a lint commit to satisfy the CI.
For a release test, recommend following `def test_llm_serve_correctness(` and the corresponding entry in `ray/release/release_tests.yaml` (line 3781 in 067c02a: `- name: llm_serve_correctness`).
Looks like we're in pretty good shape though. Just a few comments + release test and we should be good. Thanks!
doc/source/serve/llm/quick-start.rst
.. code-block:: python

    from ray import serve
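The quoted snippet above is truncated; as a rough illustration, a transcription quick-start deployment might look something like the following. The model id, engine_kwargs (including task="transcription"), and autoscaling values are assumptions to verify against the actual docs change:

```python
# Sketch only: model ids, engine_kwargs, and autoscaling values are illustrative.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="whisper-tiny",             # name exposed on the OpenAI API
        model_source="openai/whisper-tiny",  # hypothetical HF model source
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=1),
    ),
    engine_kwargs=dict(task="transcription"),  # assumption: vLLM task flag for audio models
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```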
Let's make sure this example is tested by hand before resolving this comment and merging
Ideally we would add a docs test for it if you're up for it @Blaze-DSP :)
Yeah sure. Can you provide any references for how/where to write these tests? Also, for transcription, do I write a separate release test?
Here's an example that converts a docs code snippet to CI test: #54763
And actually for now, you can just add to the existing Serve LLM integration release test file and save some CI cost by not launching a separate job.
@eicherseiji Updated docs for CI tests and added a release test.
Let me know when changes have settled and I'll kick off the release test as well.
Have made the changes. Please take a look.
Nice. @Blaze-DSP I am kicking off release tests now (edit: in progress here https://buildkite.com/ray-project/release/builds/62950).
Last things:
- Resolve remaining Cursor comments
- Add the config yaml to the bazel BUILD file as a data dependency to pass CI. You can see an example at line 351 in c5e6647: data = ["source/llm/doc_code/serve/qwen/llm_config_example.yaml"],
- Resolve lint errors via ./ci/env/lint.sh pre_commit or by using pre-commit hooks. Let me know if you have specific questions on this via Slack.
Thanks!
AsyncGenerator[
-    Union[CompletionStreamResponse, CompletionResponse, ErrorResponse], None
+    Union[str, CompletionStreamResponse, CompletionResponse, ErrorResponse], None
],
Bug: Async Generators Yield Incorrect Types
The type hints for LLMChatResponse and LLMCompletionsResponse are inaccurate. While ChatCompletionStreamResponse and CompletionStreamResponse are used elsewhere, the async generators for these responses actually yield str for streaming and the non-streaming Response object, not the StreamResponse objects directly.
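For illustration, a self-contained sketch of an alias shaped the way this comment describes. The placeholder classes below stand in for the real response models; this is not the repo's actual definition:

```python
from typing import AsyncGenerator, Union


class CompletionResponse:  # placeholder for the real pydantic model
    ...


class ErrorResponse:  # placeholder for the real pydantic model
    ...


# The generator yields raw SSE strings while streaming, the full response
# object when not streaming, and an error object on failure.
LLMCompletionsResponse = AsyncGenerator[
    Union[str, CompletionResponse, ErrorResponse], None
]
```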
Why are these changes needed?
Expose a transcription API like https://platform.openai.com/docs/api-reference/audio using vLLM.