Releases: llm-d-incubation/workload-variant-autoscaler
v0.0.3
What's Changed
- Refactoring deployment scripts for Kubernetes and Kind by @WheelyMcBones in #226
- Add prom queries for avg input tokens and ttft by @vishakha-ramani in #194
- Enhance Helm chart security configuration and add development values by @mamy-CS in #242
- Refactoring deployment scripts and integrating llm-d-inference-sim in emulated deployment environment by @WheelyMcBones in #237
Full Changelog: v0.0.2...v0.0.3
v0.0.2
What's Changed
- Update crd docs by @asm582 in #184
- Update README.md to include ref to new home for container image by @clubanderson in #183
- Update ci-release.yaml to do multi arch by @clubanderson in #182
- Removing unsupported architectures from image multi-arch build by @WheelyMcBones in #192
- Setting minNumReplicas to 1 by default by @WheelyMcBones in #189
- Update ci-release.yaml to include 'latest' tag by @clubanderson in #193
- Update QuickStart documentation to disable Rosetta on Apple Silicon by @myechuri in #195
- Fix zero divide in queue analyzer by @atantawi in #190
- add helm chart for ocp, supporting files, and instructions to install wva, prom adaptor, and llm-d in simple form by @clubanderson in #200
- removed 'oc adm policy' (replaced with clusterrolebinding) and rollout commands - no longer needed by @clubanderson in #207
- Deployment script and readme for wva + llmd on openshift by @mamy-CS in #203
- updated files to consume llmd model name and modelID by @clubanderson in #209
- Fixing tolerance function by @WheelyMcBones in #213
- Remove resource accounting logic and disable limited mode by @mamy-CS in #210
- E2es on openshift using sharegpt data by @mamy-CS in #201
- remove epp from wva helm chart and instructions to delete it after wv… by @clubanderson in #215
- Adding unit tests for the internal optimizer code by @WheelyMcBones in #204
- Add metrics validation and health monitoring system with Kubernetes conditions by @mamy-CS in #214
- refactor: Reorganize repository structure and documentation by @mamy-CS in #216
- Changing OCP deployment script to use the Helm chart by @WheelyMcBones in #212
- Documentation update by @mamy-CS in #223
- Parameterizing OCP script by @WheelyMcBones in #218
- Optimize VariantAutoscaling's owner setting by @learner0810 in #219
- refactor: Externalize metric names and labels to constants package by @ev-shindin in #228
- Enhancements to OC E2E Testing by @Vezio in #220
- Helm Chart Refactoring by @Vezio in #222
New Contributors
- @myechuri made their first contribution in #195
- @learner0810 made their first contribution in #219
- @Vezio made their first contribution in #220
Full Changelog: v0.0.1...v0.0.2
v0.0.1
What's Changed
- Add link to prerequisites in readme by @atantawi in #3
- Fix references in docs and readme files by @atantawi in #8
- Have sample demo data in a common repo by @atantawi in #9
- Remove control loop by @atantawi in #10
- Changes to move to new design by @asm582 in #11
- change interface of optimizer by @asm582 in #13
- add logger with additional code changes by @asm582 in #15
- Initial integration of inferno model analyzer and optimizer functionality by @atantawi in #17
- feat: install cluster with multiple nodes, gpus by @haroldship in #21
- Fix prometheus address by @haroldship in #25
- Inferno emulator mode by @mamy-CS in #18
- Revert "Inferno emulator mode" by @mamy-CS in #27
- Vendor vllme (new-metric branch) into inferno-autoscaler by @mamy-CS in #24
- automated inferno deployment for dev by @mamy-CS in #28
- add license file by @mamy-CS in #31
- Scale down variant to one replica when no traffic by @atantawi in #33
- Fix setupwithmanager in controller by @asm582 in #29
- use ctrl runtime backed cache for listing nodes by @asm582 in #26
- resolve merge issue by @asm582 in #34
- Variant to keep accelerator by @atantawi in #30
- add retries by @asm582 in #35
- enable HA in controller by @asm582 in #36
- improve error handling and retries for current design by @asm582 in #37
- update reconciler to smaller, composable helper functions and reduce inline logic by @asm582 in #38
- Align vllme metrics with vllm for autoscaler compatibility by @vishakha-ramani in #32
- rem modelservice requeue for reconcile by @asm582 in #40
- Support configured maximum batch size by @atantawi in #41
- Remove hard-coded accelerator name by @atantawi in #43
- llm-d integration by @WheelyMcBones in #42
- Actuator Emit custom metrics to Prometheus by @mamy-CS in #39
- Remove requeue when optimization fails by @asm582 in #49
- return errors for cm config by @asm582 in #53
- Deployment and test env cleanup by @mamy-CS in #50
- update readme by @mamy-CS in #54
- simplify watchandrun loop by @asm582 in #56
- Fix llmd integration by @WheelyMcBones in #58
- move to openAI modelid format by @asm582 in #57
- Use reconciler to run periodically by @asm582 in #59
- Update readme and install cm in inferno ns by @asm582 in #61
- multi-arch build by @mamy-CS in #62
- Modified vllme load generator by @vishakha-ramani in #60
- loadgen deterministic mode doc update by @mamy-CS in #63
- fixed waiting for Gateway and EPP deployments by @WheelyMcBones in #64
- Add documentation for modeling and analysis by @atantawi in #67
- Report allocatable resources by @asm582 in #66
- Update modelling documentation by @atantawi in #69
- enabling installation for amd64 arch by @WheelyMcBones in #68
- x86 llmd infra installation fixes by @mamy-CS in #72
- improve logging readability by @WheelyMcBones in #77
- refactoring backoff logic into global backoff by @WheelyMcBones in #89
- align slo names to community terms by @asm582 in #101
- first basic E2E tests by @WheelyMcBones in #90
- fix lint errors by @asm582 in #105
- E2E scaling tests by @WheelyMcBones in #104
- fix make test by @asm582 in #108
- Add optimize section by @asm582 in #109
- Remove dummy analyzer code by @asm582 in #110
- add crd api docs by @asm582 in #111
- fixes for crd and docs by @asm582 in #113
- fix shortname issue by @asm582 in #115
- add tls configuration by @mamy-CS in #103
- Testing continuous generated load and multiple VAs scenarios by @WheelyMcBones in #112
- add gha workflows by @clubanderson in #121
- remove precommit by @clubanderson in #124
- Fix llm-d deploy and modify tests to align with community feedback by @WheelyMcBones in #122
- add make test-e2e by @clubanderson in #126
- ignore sync errors from zap logger by @asm582 in #125
- Handle infeasible optimization solution by @atantawi in #102
- Fix E2E test execution on CI by @WheelyMcBones in #130
- Remove manual trigger logic by @asm582 in #128
- Add unit tests to Inferno-autoscaler components by @WheelyMcBones in #133
- Changing E2E to check emitted Inferno metrics by @WheelyMcBones in #135
- HPA integration by @WheelyMcBones in #137
- Integrating llm-d infra into E2E tests by @WheelyMcBones in #138
- Add api unit tests by @asm582 in #140
- Include HPA in README config by @WheelyMcBones in #142
- consolidate yaml samples in markdown file by @vishakha-ramani in #141
- Computing scaling decision based on ratio metric for HPA by @WheelyMcBones in #143
- Changes to the collector: token query by @vishakha-ramani in #146
- Upgrade optimizer-light to v0.5.0 by @atanta...