[Tracking] PT model support follow up #217

Open
@masahi

Description

@masahi

#207 is only the first cut. Many TODO items remain:

  • Fix memory profiling (see Enable running PyTorch models #207 (comment))
  • Make single-gpu performance at parity with the MLC model
  • Make multi-gpu performance sane
  • Consider using CUDA graphs if we decide to keep the 2D padded input representation
  • Or consider reverting the 2D input change
  • Revisit the custom changes in our vllm fork (https://github.com/octoml/vllm/tree/for-mlc-serve) and minimize them
  • Figure out how to support other models besides the ones in vllm
  • Support parallel-sampling eviction by recompute (requires model change)
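
For context on the two input-representation items above, here is a minimal sketch of the trade-off. It is not code from #207 or from the vllm fork. The function names and the `pad_id` default are hypothetical, and plain Python lists stand in for tensors. A padded 2D layout has a rectangular shape fixed by batch size and maximum length, which is the kind of static shape CUDA graph capture generally relies on. A flattened 1D layout avoids spending compute on padding tokens but has a shape that changes with the total token count.

```python
from typing import List, Tuple


def pad_to_2d(seqs: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Pad variable-length token ID lists into a rectangular [batch, max_len] layout.

    Returns the padded rows plus the original lengths, so that padding
    positions can be masked out later. `pad_id` is an assumed placeholder value.
    """
    if not seqs:
        return [], []
    lengths = [len(s) for s in seqs]
    max_len = max(lengths)
    padded = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    return padded, lengths


def flatten_to_1d(seqs: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate all sequences into one flat token list.

    Returns the flat list plus cumulative offsets; sequence i occupies
    flat[offsets[i]:offsets[i + 1]].
    """
    flat: List[int] = []
    offsets = [0]
    for s in seqs:
        flat.extend(s)
        offsets.append(len(flat))
    return flat, offsets


if __name__ == "__main__":
    batch = [[1, 2, 3], [4]]
    print(pad_to_2d(batch))      # rectangular rows and original lengths
    print(flatten_to_1d(batch))  # flat tokens and cumulative offsets
```

With the padded layout, the number of wasted positions grows with the spread in sequence lengths within a batch, which is one factor to weigh when deciding between keeping the 2D representation and reverting it.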
