
[WIP] [V1] TPU support #11936

Open · wants to merge 18 commits into main

Conversation

alexm-neuralmagic (Collaborator) commented Jan 10, 2025

This PR is a rebase and modification of @robertgshaw2-neuralmagic's original TPU-support PR from 1.5 months ago (#10241).

TODOs:

  1. Verify correctness.
  2. Refactor the code to remove duplication.
  3. Tweak attention parameters for best performance.


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀


mergify bot commented Jan 10, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @alexm-neuralmagic.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Jan 10, 2025
Alexander Matveev added 2 commits January 10, 2025 16:10
Comment on lines +81 to +85
def __init__(
    self,
    vllm_config: VllmConfig,
    device: torch.device,
):
Contributor

The function implementation is almost identical to gpu_model_runner.py; it would be better to build a ModelRunnerBase class and derive from it instead of duplicating the code.
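
A hedged sketch of the suggested refactor, assuming a hypothetical ModelRunnerBase; class and attribute names here are illustrative, not the actual vLLM API:

import torch


class ModelRunnerBase:
    """Setup shared by device-specific runners (illustrative)."""

    def __init__(self, vllm_config: "VllmConfig", device: torch.device):
        self.vllm_config = vllm_config
        self.device = device
        # ...initialization common to GPU and TPU runners goes here...


class TPUModelRunner(ModelRunnerBase):
    """Only TPU-specific initialization remains in the subclass."""

    def __init__(self, vllm_config: "VllmConfig", device: torch.device):
        super().__init__(vllm_config, device)
        # ...TPU-only setup (e.g., XLA-related state)...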

Comment on lines +382 to +388
return PrefillInputData(
    request_ids=prefill_request_ids,
    prompt_lens=prefill_prompt_lens,
    token_ids=prefill_token_ids,
    position_ids=prefill_position_ids,
    attn_metadata=prefill_attn_metadata,
)
Contributor

Remove the PrefillInputData data structure and make it consistent with gpu_model_runner?
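
For context, a minimal sketch of what a container like PrefillInputData might hold, inferred from the constructor call quoted above; the field types are assumptions, not the PR's actual definitions:

from dataclasses import dataclass
from typing import Any, List

import torch


@dataclass
class PrefillInputData:
    # One entry per prefill request in the batch (types assumed).
    request_ids: List[str]
    prompt_lens: List[int]
    token_ids: List[torch.Tensor]
    position_ids: List[torch.Tensor]
    attn_metadata: List[Any]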

Comment on lines +318 to +321
def _prepare_prefill_inputs(
    self,
    num_scheduled_tokens: List[int],
) -> PrefillInputData:
Contributor

Do we still need _prepare_prefill_inputs in V1?
(I'm assuming this can already be handled by _prepare_inputs.)

Member

We need to run separate prefill and decode for TPU since we don't have the attention kernel support yet. That support is on the way, so we hope to remove this split soon.
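
For illustration only, a minimal sketch of the split execution flow described above, under the assumption that prefills run one at a time and decodes run as one batched step; the run_prefill and run_decode callables are hypothetical, not the PR's actual methods:

from typing import Any, Callable, List, Optional


def execute_split(prefill_inputs: List[Any],
                  decode_inputs: Optional[Any],
                  run_prefill: Callable[[Any], Any],
                  run_decode: Callable[[Any], Any]) -> List[Any]:
    # Without a unified attention kernel on TPU, each prefill request is
    # executed on its own (with padded, pre-compiled shapes)...
    outputs = [run_prefill(inp) for inp in prefill_inputs]
    # ...while all decode requests run together as a single batched step.
    if decode_inputs is not None:
        outputs.append(run_decode(decode_inputs))
    return outputs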

    attn_metadata=prefill_attn_metadata,
)

def _prepare_decode_inputs(self) -> DecodeInputData:
Contributor

Again, do we really need _prepare_decode_inputs in the V1 architecture?
(I'm assuming this can already be handled by _prepare_inputs.)

    effective_query_lens=None,
))

def _prepare_inputs(self, scheduler_output: "SchedulerOutput"):
Contributor

This is almost identical to the current gpu_model_runner implementation; consider reusing it instead of duplicating?

Alexander Matveev added 2 commits January 11, 2025 03:14