Blog: Inside vLLM’s New Offloading Connector #149
002e08e to 07b2642
* CPU block size: 16 tokens
* De/Tokenization disabled
Our benchmark code can be found [here](https://github.com/orozery/playground/blob/kv-offloading-blog-dec-2025/kvcache/kv_offload_benchmark.py).
Maybe we can add this benchmark script, with small changes, to https://github.com/vllm-project/vllm/tree/main/benchmarks.
Good idea! I think this script is useful and would get more exposure there.
However, it will take some time to get it merged there, and I think publishing the blog post now is a higher priority.
Thanks for the great contribution! Along with the comments above, it might be helpful to add a few more use cases for the offloading connector at the end of the blog post.
07b2642 to da93e1f
da93e1f to fe7d325
Thanks for reviewing!
LGTM! Thanks @orozery
Signed-off-by: Or Ozeri <[email protected]>
fe7d325 to 9059fb9
@@ -0,0 +1,234 @@
---
layout: post
title: "Inside vLLM’s New Offloading Connector: Smarter Memory Transfer for Maximizing Inference Throughput"
I feel the title should say "KV Offloading", since otherwise it isn't clear whether we are talking about model offloading.
Agreed. This applies to the blog name as well.
@orozery Could we update this in another PR before we promote it on social media?