
Conversation

Contributor

@Blaze-DSP Blaze-DSP commented Oct 4, 2025

Why are these changes needed?

Expose a transcriptions API like https://platform.openai.com/docs/api-reference/audio using vLLM.
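For reference, a minimal sketch of how a client could call the new endpoint once deployed; the base URL, API key, model id, and audio file are placeholder assumptions, not values from this PR:

```python
# Hypothetical client call; base URL, API key, model id, and audio file are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

with open("sample.wav", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="openai/whisper-tiny",  # assumed model id
        file=audio_file,
    )

print(transcription.text)
```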

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run pre-commit jobs to lint the changes in this PR. (pre-commit setup)
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@Blaze-DSP Blaze-DSP requested review from a team as code owners October 4, 2025 20:49
@Blaze-DSP Blaze-DSP requested a review from a team as a code owner October 4, 2025 20:50
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new transcription API, following the OpenAI specification. The changes are well-structured, touching the necessary model definitions, LLM server, vLLM engine, and router components. The implementation largely follows existing patterns in the codebase. However, I've identified a couple of critical issues that would cause runtime errors, such as a missing comma in a type hint and a method name mismatch between the server and the engine. There are also some minor maintainability issues like a copy-pasted comment and a typo in a docstring. Addressing these points will make the PR ready for merging.

Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces support for a transcription API to vLLM's OpenAI-compatible interface, following the OpenAI audio/transcriptions API specification. The implementation adds the necessary request/response models, router endpoints, and engine integration to handle audio transcription requests.

  • Adds TranscriptionRequest, TranscriptionResponse, and TranscriptionStreamResponse models
  • Implements /v1/audio/transcriptions endpoint in the router
  • Integrates transcription support into the vLLM engine with proper error handling

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

File summary:
  python/ray/serve/llm/openai_api_models.py — Adds public API models for transcription request/response types
  python/ray/llm/_internal/serve/deployments/routers/router.py — Implements transcription endpoint and updates request processing logic
  python/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py — Adds transcription engine integration with vLLM OpenAI serving
  python/ray/llm/_internal/serve/deployments/llm/llm_server.py — Adds transcription method to LLM server with async generator interface
  python/ray/llm/_internal/serve/configs/openai_api_models.py — Defines internal transcription models and response type unions
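Based only on the file summary above, here is a rough, hypothetical sketch of how the transcription models and router endpoint could be wired together following the OpenAI audio/transcriptions spec; the field names, route signature, and placeholder response are assumptions rather than the PR's actual code:

```python
# Hypothetical sketch only: field names, route signature, and the placeholder
# response are assumptions, not the PR's actual implementation.
from typing import Optional

from fastapi import APIRouter, File, Form, UploadFile
from pydantic import BaseModel

router = APIRouter()


class TranscriptionResponse(BaseModel):
    text: str


class TranscriptionStreamResponse(BaseModel):
    # Streaming responses would carry incremental text deltas.
    delta: str


@router.post("/v1/audio/transcriptions")
async def create_transcription(
    file: UploadFile = File(...),
    model: str = Form(...),
    language: Optional[str] = Form(None),
) -> TranscriptionResponse:
    audio_bytes = await file.read()
    # A real implementation would forward the audio and options to the vLLM engine;
    # the placeholder below just keeps the sketch self-contained.
    return TranscriptionResponse(text=f"<{len(audio_bytes)} bytes transcribed by {model}>")
```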


async for response in transcription_response:
    if not isinstance(response, str):
        raise ValueError(
            f"Expected create_transcription to return a stream of strings, got and item with type {type(response)}"

Copilot AI Oct 4, 2025

Corrected 'and item' to 'an item' in the error message.

Suggested change
- f"Expected create_transcription to return a stream of strings, got and item with type {type(response)}"
+ f"Expected create_transcription to return a stream of strings, got an item with type {type(response)}"



LLMTranscriptionResponse = Union[
    AsyncGenerator[
        Union[TranscriptionStreamResponse, TranscriptionResponse, ErrorResponse], None

Copilot AI Oct 4, 2025

The LLMTranscriptionResponse type definition is missing the string type in the Union. Based on the vLLM engine implementation, transcription streaming can yield strings, so this should include 'str' in the Union type like the other response types.

Suggested change
-         Union[TranscriptionStreamResponse, TranscriptionResponse, ErrorResponse], None
+         Union[str, TranscriptionStreamResponse, TranscriptionResponse, ErrorResponse], None


Contributor

@Blaze-DSP bumping this

Contributor

@kouroshHakha kouroshHakha left a comment

Nice. I think the basic feature looks good. We just need to add CI tests and some release tests as well.

For CI, please take a look at the existing tests for the endpoints at the engine and router levels. Here are some I found:

You would need to create a mock engine with some reasonable transcription behavior.

Let's keep the translations endpoint for another PR after we cover everything for this new endpoint.

For the release test, could you share the serve run script that you used to validate the behavior, along with the client code and expected output? We can turn that into a GPU release test with a real model (maybe whisper-tiny) so that it is continuously tested.
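Not the author's actual script, but a hypothetical sketch of what such a serve run script could look like with whisper-tiny (config fields and the model id are assumptions; the client side could reuse the OpenAI SDK call shown near the top of this PR description):

```python
# Hypothetical serve run script; config values and the model id are assumptions.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="whisper-tiny",
        model_source="openai/whisper-tiny",
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=1),
    ),
)

# Build the OpenAI-compatible app and serve it; the transcription endpoint would
# then be reachable at http://localhost:8000/v1/audio/transcriptions.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)
```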

@ray-gardener ray-gardener bot added the labels serve (Ray Serve Related Issue), docs (An issue or change related to documentation), llm, and community-contribution (Contributed by the community) on Oct 5, 2025
@Blaze-DSP Blaze-DSP requested a review from a team as a code owner October 7, 2025 20:32

@Blaze-DSP
Contributor Author

@kouroshHakha CI tests have been written and docs have also been updated. Pls check and verify.

If we are going to adopt vllm==v0.11.0, then the v0 engine has been entirely deprecated and all models are supported via the v1 engine. We'd need to make the appropriate changes for docs, etc. (e.g., embeddings).

Contributor

@kouroshHakha kouroshHakha left a comment


Adding @eicherseiji for review.

Contributor

@eicherseiji eicherseiji left a comment


Recommend `pip install pre-commit && pre-commit install` before a lint commit to satisfy the CI.

For a release test, see `def test_llm_serve_correctness(` and `- name: llm_serve_correctness` for examples.

Looks like we're in pretty good shape though. Just a few comments + a release test and we should be good. Thanks!


.. code-block:: python

    from ray import serve
Contributor

Let's make sure this example is tested by hand before resolving this comment and merging

Contributor

Ideally we would add a docs test for it if you're up for it @Blaze-DSP :)

Contributor Author

Yeah sure. Can you provide any references for how/where to write these tests? Also, for transcription, do I write a separate release test?

Contributor

Here's an example that converts a docs code snippet to CI test: #54763

And actually for now, you can just add to the existing Serve LLM integration release test file and save some CI cost by not launching a separate job.
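As one illustration of that pattern (not the PR's actual test), a doc-code test can simply execute the snippet file from pytest; the snippet path and test name below are hypothetical:

```python
# Hypothetical doc-code test; the snippet path is a placeholder, not a file in this PR.
import sys

import pytest
import runpy


def test_transcription_doc_example():
    # Execute the documentation snippet as a script; it should complete without raising.
    runpy.run_path(
        "doc/source/llm/doc_code/serve/transcription_example.py", run_name="__main__"
    )


if __name__ == "__main__":
    sys.exit(pytest.main(["-v", __file__]))
```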

Contributor Author

@eicherseiji Updated the docs for CI tests and added a release test.


@eicherseiji
Contributor

Let me know when changes have settled and I'll kick off the release test as well.

@Blaze-DSP
Contributor Author

Have made the changes. Pls take a look.

Contributor

@eicherseiji eicherseiji left a comment

Nice. @Blaze-DSP I am kicking off release tests now (edit: in progress here https://buildkite.com/ray-project/release/builds/62950).

Last things:

  1. Resolve remaining Cursor comments
  2. Add the config YAML to the Bazel BUILD file as a data dependency to pass CI. You can see an example here:
    data = ["source/llm/doc_code/serve/qwen/llm_config_example.yaml"],
  3. Resolve lint errors via ./ci/env/lint.sh pre_commit or by using pre-commit hooks. Let me know if you have specific questions on this via Slack.

Thanks!


DPatel_7 and others added 15 commits October 12, 2025 00:01

DPatel_7 and others added 2 commits October 12, 2025 00:21

DPatel_7 and others added 2 commits October 12, 2025 16:55
    AsyncGenerator[
-         Union[CompletionStreamResponse, CompletionResponse, ErrorResponse], None
+         Union[str, CompletionStreamResponse, CompletionResponse, ErrorResponse], None
    ],

Bug: Async Generators Yield Incorrect Types

The type hints for LLMChatResponse and LLMCompletionsResponse are inaccurate. While ChatCompletionStreamResponse and CompletionStreamResponse are used elsewhere, the async generators for these responses actually yield str for streaming and the non-streaming Response object, not the StreamResponse objects directly.
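For illustration, a hedged sketch of what the corrected alias could look like if `str` is added to the yielded union, as suggested; the stand-in model classes and the second member of the outer Union are assumptions, not the repository's actual definitions:

```python
# Hypothetical corrected alias; the model classes below are simplified stand-ins,
# and the exact shape of the outer Union is an assumption.
from typing import AsyncGenerator, Union

from pydantic import BaseModel


class ErrorResponse(BaseModel):
    message: str


class CompletionResponse(BaseModel):
    text: str


class CompletionStreamResponse(BaseModel):
    delta: str


# Streaming yields raw `str` chunks; non-streaming yields the full response object.
LLMCompletionsResponse = Union[
    AsyncGenerator[
        Union[str, CompletionStreamResponse, CompletionResponse, ErrorResponse], None
    ],
    CompletionResponse,
]
```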

