Releases: pacoxu/AI-Infra

v0.0.1

05 Nov 07:23
ddb38ea

What's Changed

  • Pod Lifecycle (AI): Pod startup speed optimization, cold start, sleep mode, and offloading.
  • DRA updates: NVIDIA GPU Operator and DRA Driver, NRI
  • Workload solutions (P/D disaggregation): LWS, SGLang RBG, AIBrix StormService, Kthena, KServe, Dynamo, vLLM Production Stack, OME.
  • KV Cache comparison: NIXL, LMCache, Mooncake
  • Scheduling: Volcano, NVIDIA Grove, Kueue, Godel, Koordinator, HAMi, KAI Scheduler.
  • Gateway: Envoy AI Gateway, Semantic Router, KGateway, Kong.
  • Performance testing and benchmarking tools
  • Community Update: AI Conformance, Kubernetes workgroups and CNCF tags/initiatives.

More

  • Large Scale Experts (MoE)
  • AIConfigurator
  • Observability
  • Training on Kubernetes: Kubeflow Trainer V2 and ArgoCD; GPU checkpoint/restore
  • Serverless, Knative
  • AI workload isolation
  • Parallelism
  • Pre-training

Full Changelog: https://github.com/pacoxu/AI-Infra/commits/v0.0.1

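Taking initial shape

  • Some foundational AI and model knowledge is still missing
  • Training content is probably relatively thin
  • There is no Chinese version yet
  • The landscape is still fairly rough

However,

  • AI workload orchestration and management
  • Existing related projects (excluding higher-level agent content)
  • A primarily cloud-native focus

are largely covered.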