- Due by August 14, 2025•2/2 issues closed
- Due by October 3, 2025•28/28 issues closed
- Due by June 30, 2025•46/46 issues closed
- Model Centric API
- Offline Inference Performance Improvement
- Prefix cache + Heterogeneous Routing + Fairness request routing
Due by April 30, 2025•50/50 issues closed
- Due by October 31, 2024•13/13 issues closed
- Due by October 19, 2024•8/8 issues closed
1. Stability improvement with all bug fixes
2. Model-centric deployment
Due by October 4, 2024•31/31 issues closed
Focus more on the advanced features:
- Distributed and Disaggregated Inference
- Distributed KV Cache
- Cost-efficient Heterogeneous placement and routing
- Cost-efficient Heterogeneous Serving
- v0.1.0 feature quality improvement
Due by January 31, 2025•98/99 issues closed
1. Heterogeneous hardware support in request routing
2. Advanced autoscaling algorithms support (KPA and our own solutions)
3. LoRA feature improvements such as artifact registry and unload
4. Create a CLI to lower the entry barrier for users
5. Support multi-host inference through RayClusterGroup
6. Support GPU streaming acceleration
Due by September 24, 2024•22/22 issues closed
- Due by September 8, 2024•24/24 issues closed
All stories and RFCs proposed in v0.1.0 could be fully delivered
Due by October 31, 2024•12/12 issues closed
1. Support LoRA Model Adapter Workflow
2. Support HPA through a customized Pod Autoscaler
3. Support Envoy-Gateway based service routing, model selection and meta information registration
4. Clear user installation instructions and tutorials
5. CI related: automated testing, continuous building
Due by August 2, 2024•15/15 issues closed