llm-d incubation


7 repositories

workload-variant-autoscaler
Public
Variant optimization autoscaler for distributed inference workloads
Go • Apache License 2.0 • 18 • 21 • 63 • 6 • Updated Nov 15, 2025
llm-d-fast-model-actuation
Public
Go • Apache License 2.0 • 7 • 7 • 38 • 6 • Updated Nov 14, 2025
llm-d-modelservice
Public
Helm charts for deploying models with llm-d
Smarty • 35 • 23 • 8 • 6 • Updated Nov 12, 2025
hermes
Public
Hermes is a cluster configuration scanning and self-test generation tool for llm-d inference workloads
Rust • 0 • 0 • 0 • 0 • Updated Nov 12, 2025
llm-d-infra
Public
llm-d Helm charts and deployment examples
Shell • Apache License 2.0 • 45 • 46 • 15 • 19 • Updated Oct 2, 2025
llm-d-ci
Public
Shell • 2 • 2 • 0 • 0 • Updated Aug 6, 2025
ig-wva
Public
Workload Variant Autoscaler is a service that computes the cost-optimal provisioning of heterogeneous accelerators for inference workloads with varying request latency objectives
Jupyter Notebook • 1 • 1 • 0 • 1 • Updated Jul 11, 2025