Releases: pacoxu/AI-Infra
v0.0.1
What's Changed
- Pod Lifecycle (AI): Pod startup speed optimization, cold-start, sleep mode, and offloading.
- DRA updates: NVIDIA GPU Operator and DRA Driver, NRI
- Workload solutions (P/D disaggregation): LWS, SGLang RBG, AIBrix StormService, Kthena, KServe, Dynamo, vLLM Production Stack, OME.
- KV Cache comparison: NIXL, LMCache, Mooncake
- Scheduling: Volcano, NVIDIA Grove, Kueue, Godel, Koordinator, HAMi, KAI Scheduler.
- Gateway: Envoy AI Gateway, Semantic Router, KGateway, Kong.
- Performance testing and benchmarking tools
- Community Update: AI Conformance, Kubernetes workgroups and CNCF tags/initiatives.
More
- Large Scale Experts (MoE)
- AIConfigurator
- Observability
- Training on Kubernetes: Kubeflow Trainer V2 and Argo CD; GPU checkpoint/restore
- Serverless, Knative
- AI workload isolation
- Parallelism
- Pre-training
Full Changelog: https://github.com/pacoxu/AI-Infra/commits/v0.0.1
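Taking initial shape
- Some foundational AI model knowledge is still missing
- Training content is also relatively thin
- There is no Chinese-language content yet
- The landscape is still fairly rough
However,
- AI workload orchestration and management
- Existing related projects (excluding higher-level agent content)
- A primarily cloud-native focus
are largely covered.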