Transcription API #57194
Conversation
Code Review
This pull request introduces a new transcription API, following the OpenAI specification. The changes are well-structured, touching the necessary model definitions, LLM server, vLLM engine, and router components. The implementation largely follows existing patterns in the codebase. However, I've identified a couple of critical issues that would cause runtime errors, such as a missing comma in a type hint and a method name mismatch between the server and the engine. There are also some minor maintainability issues like a copy-pasted comment and a typo in a docstring. Addressing these points will make the PR ready for merging.
Pull Request Overview
This PR introduces support for a transcription API to vLLM's OpenAI-compatible interface, following the OpenAI audio/transcriptions API specification. The implementation adds the necessary request/response models, router endpoints, and engine integration to handle audio transcription requests.
- Adds TranscriptionRequest, TranscriptionResponse, and TranscriptionStreamResponse models
- Implements the /v1/audio/transcriptions endpoint in the router
- Integrates transcription support into the vLLM engine with proper error handling
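For a concrete sense of the surface being added, here is a minimal client sketch against the new endpoint. The base URL, API key, model id, and audio file below are illustrative assumptions, not values taken from this PR:

```python
# Sketch: exercising the OpenAI-compatible /v1/audio/transcriptions endpoint.
# base_url, api_key, model id, and the audio path are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

with open("sample.wav", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="openai/whisper-tiny",  # hypothetical model id served by the app
        file=audio_file,
    )

print(transcription.text)
```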
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| python/ray/serve/llm/openai_api_models.py | Adds public API models for transcription request/response types |
| python/ray/llm/_internal/serve/deployments/routers/router.py | Implements transcription endpoint and updates request processing logic |
| python/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py | Adds transcription engine integration with vLLM OpenAI serving |
| python/ray/llm/_internal/serve/deployments/llm/llm_server.py | Adds transcription method to LLM server with async generator interface |
| python/ray/llm/_internal/serve/configs/openai_api_models.py | Defines internal transcription models and response type unions |
async for response in transcription_response:
    if not isinstance(response, str):
        raise ValueError(
            f"Expected create_transcription to return a stream of strings, got and item with type {type(response)}"
Copilot AI (Oct 4, 2025)
Corrected 'and item' to 'an item' in the error message.
f"Expected create_transcription to return a stream of strings, got and item with type {type(response)}" | |
f"Expected create_transcription to return a stream of strings, got an item with type {type(response)}" |
LLMTranscriptionResponse = Union[
    AsyncGenerator[
        Union[TranscriptionStreamResponse, TranscriptionResponse, ErrorResponse], None
Copilot AI (Oct 4, 2025)
The LLMTranscriptionResponse type definition is missing the string type in the Union. Based on the vLLM engine implementation, transcription streaming can yield strings, so this should include 'str' in the Union type like the other response types.
Suggested change:
- Union[TranscriptionStreamResponse, TranscriptionResponse, ErrorResponse], None
+ Union[str, TranscriptionStreamResponse, TranscriptionResponse, ErrorResponse], None
@Blaze-DSP bumping this
Nice. I think the basic feature looks good. We just need to add CI tests and some release tests as well.
For CI please take a look at existing tests for the endpoints at engine and router levels. Here are some I found:
- https://github.com/ray-project/ray/blob/master/python/ray/llm/tests/serve/cpu/deployments/llm/test_llm_engine.py: includes tests for engine interfaces
- https://github.com/ray-project/ray/blob/master/python/ray/llm/tests/serve/cpu/deployments/llm/test_llm_server.py: includes tests for LLMServer interfaces.
You would need to create a mock engine with some reasonable transcription behavior.
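As a very rough sketch of what that mock could look like (the method name is taken from the error message quoted earlier; names, signatures, and the test harness are assumptions and should be adapted to the existing mock engine used in those test files):

```python
# Sketch of a mock transcription behavior for CI; requires pytest-asyncio.
import pytest


class MockTranscriptionEngine:
    async def create_transcription(self, request):
        # Stream a couple of canned chunks, mimicking a streaming transcription.
        for chunk in ["hello ", "world"]:
            yield chunk


@pytest.mark.asyncio
async def test_mock_transcription_streams_strings():
    engine = MockTranscriptionEngine()
    chunks = [c async for c in engine.create_transcription(request=None)]
    assert all(isinstance(c, str) for c in chunks)
    assert "".join(chunks) == "hello world"
```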
Let's keep the translation for another PR after we cover everything for this new endpoint.
For the release test, could you share the serve run script that you used to validate the behavior, along with the client code and expected output? We can turn that into a GPU release test with a real model (maybe whisper-tiny) so that it is continuously tested.
@kouroshHakha CI tests have been written and the docs have also been updated. Please check and verify. If we are going to adopt vllm==v0.11.0, note that V0 has been entirely deprecated and all models are supported via the V1 engine, so we'll need to make the appropriate changes to the docs, etc. (e.g., embeddings).
Adding @eicherseiji for review.
Recommend `pip install pre-commit && pre-commit install` before a lint commit to satisfy the CI.
For a release test, recommend following `def test_llm_serve_correctness(` and the corresponding entry in `ray/release/release_tests.yaml` (line 3781 in 067c02a: `- name: llm_serve_correctness`).
Looks like we're in pretty good shape though. Just a few comments + release test and we should be good. Thanks!
doc/source/serve/llm/quick-start.rst
.. code-block:: python

    from ray import serve
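The quoted snippet above is truncated; as a rough illustration, a transcription quick-start deployment might look something like the following. The model id, engine_kwargs (including task="transcription"), and autoscaling values are assumptions to verify against the actual docs change:

```python
# Sketch only: model ids, engine_kwargs, and autoscaling values are illustrative.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="whisper-tiny",             # name exposed on the OpenAI API
        model_source="openai/whisper-tiny",  # hypothetical HF model source
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=1),
    ),
    engine_kwargs=dict(task="transcription"),  # assumption: vLLM task flag for audio models
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```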
Let's make sure this example is tested by hand before resolving this comment and merging
Ideally we would add a docs test for it if you're up for it @Blaze-DSP :)
Yeah sure. Can you provide any references for how/where to write these tests? Also, for transcription, do I write a separate release test?
Here's an example that converts a docs code snippet to CI test: #54763
And actually for now, you can just add to the existing Serve LLM integration release test file and save some CI cost by not launching a separate job.
@eicherseiji Updated docs for CI tests and added a release test.
Let me know when changes have settled and I'll kick off the release test as well.
Have made the changes. Please take a look.
Nice. @Blaze-DSP I am kicking off release tests now (edit: in progress here https://buildkite.com/ray-project/release/builds/62950).
Last things:
- Resolve remaining Cursor comments
- Add the config yaml to the bazel BUILD file as a data dependency to pass CI. You can see an example at line 351 in c5e6647: data = ["source/llm/doc_code/serve/qwen/llm_config_example.yaml"],
- Resolve lint errors via ./ci/env/lint.sh pre_commit or by using pre-commit hooks. Let me know if you have specific questions on this via Slack.
Thanks!
AsyncGenerator[
-    Union[CompletionStreamResponse, CompletionResponse, ErrorResponse], None
+    Union[str, CompletionStreamResponse, CompletionResponse, ErrorResponse], None
],
Bug: Async Generators Yield Incorrect Types
The type hints for LLMChatResponse and LLMCompletionsResponse are inaccurate. While ChatCompletionStreamResponse and CompletionStreamResponse are used elsewhere, the async generators for these responses actually yield str for streaming and the non-streaming Response object, not the StreamResponse objects directly.
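For illustration, a self-contained sketch of an alias shaped the way this comment describes. The placeholder classes below stand in for the real response models; this is not the repo's actual definition:

```python
from typing import AsyncGenerator, Union


class CompletionResponse:  # placeholder for the real pydantic model
    ...


class ErrorResponse:  # placeholder for the real pydantic model
    ...


# The generator yields raw SSE strings while streaming, the full response
# object when not streaming, and an error object on failure.
LLMCompletionsResponse = AsyncGenerator[
    Union[str, CompletionResponse, ErrorResponse], None
]
```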
Why are these changes needed?
Expose a transcription API like https://platform.openai.com/docs/api-reference/audio using vLLM.