
Milestones


  • Due by August 14, 2025
    2/2 issues closed
  • Due by October 3, 2025
    28/28 issues closed
  • Due by June 30, 2025
    46/46 issues closed
  • Model-centric API; offline inference performance improvement; prefix cache, heterogeneous routing, and fairness request routing

    Due by April 30, 2025
    50/50 issues closed
  • Due by October 31, 2024
    13/13 issues closed
  • Due by October 19, 2024
    8/8 issues closed
  • 1. Stability improvement with all bug fixes. 2. Model-centric deployment.

    Due by October 4, 2024
    31/31 issues closed
  • Focus on advanced features: distributed and disaggregated inference; distributed KV cache; cost-efficient heterogeneous placement and routing; cost-efficient heterogeneous serving; v0.1.0 feature quality improvement.

    Due by January 31, 2025
    98/99 issues closed
  • 1. Heterogeneous hardware support in request routing. 2. Advanced autoscaling algorithm support (KPA and our own solutions). 3. LoRA feature improvements such as an artifact registry, adapter unloading, etc. 4. A CLI that simplifies setup and lowers the entry barrier for users. 5. Multi-host inference support through RayClusterGroup. 6. GPU streaming acceleration support.

    Due by September 24, 2024
    22/22 issues closed
  • Due by September 8, 2024
    24/24 issues closed
  • Fully deliver all stories and RFCs proposed in v0.1.0.

    Due by October 31, 2024
    12/12 issues closed
  • 1. Support the LoRA model adapter workflow. 2. Support HPA through a customized Pod Autoscaler. 3. Support Envoy Gateway-based service routing, model selection, and metadata registration. 4. Clear user installation instructions and tutorials. 5. CI: automated testing and continuous builds.

    Due by August 2, 2024
    15/15 issues closed