
Conversation

@orozery (Contributor) commented Dec 29, 2025

No description provided.

@chatgpt-codex-connector commented:

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. Credits must be used to enable repository-wide code reviews.

> * CPU block size: 16 tokens
> * De/tokenization disabled
>
> Our benchmark code can be found [here](https://github.com/orozery/playground/blob/kv-offloading-blog-dec-2025/kvcache/kv_offload_benchmark.py).
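For context on the settings quoted above, here is a minimal sketch of how one might enable CPU KV-cache offloading through vLLM's KV-transfer config. The connector name and the extra-config keys are assumptions for illustration only; the linked benchmark script is the authoritative reference for the exact configuration used.

```python
# Minimal sketch (not the benchmark itself): enabling CPU KV-cache
# offloading in vLLM via its KV-transfer config. The connector name
# and the extra-config keys below are illustrative assumptions; see
# the linked benchmark script for the exact settings used.
from vllm import LLM
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any model; illustrative choice
    kv_transfer_config=KVTransferConfig(
        kv_connector="OffloadingConnector",  # assumed connector name
        kv_role="kv_both",                   # offload and reload on the same worker
        kv_connector_extra_config={
            "block_size": 16,         # CPU block size in tokens, per the settings above
            "num_cpu_blocks": 10000,  # illustrative CPU pool size (assumption)
        },
    ),
)

# Usage: KV blocks evicted from GPU memory can then be served from the
# CPU pool instead of being recomputed on later requests.
outputs = llm.generate(["Hello, world!"])
print(outputs[0].outputs[0].text)
```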
Member:

Maybe we can drop this benchmark script, with small changes, into https://github.com/vllm-project/vllm/tree/main/benchmarks.

@orozery (Contributor, Author):

Good idea! I think this script is useful and would get more exposure there.
However, it will take some time to get it in, and I think getting the blog post out now is a higher priority.

@esmeetu (Member) commented Jan 7, 2026

Thanks for the great contribution! Along with the comments above, it might be helpful to add a few more offloading connector use cases at the end of the blog.
Adding a Slack channel could also help the community collaborate and iterate on this over time.

@orozery (Contributor, Author) commented Jan 7, 2026

Thanks for reviewing!
I don't have any more use cases to elaborate on at this point.
I've added a section at the end to encourage community discussion (including a Slack channel).

@esmeetu (Member) commented Jan 8, 2026

LGTM! Thanks @orozery

@esmeetu force-pushed the offloading-connector-dec-2025 branch from fe7d325 to 9059fb9 on January 8, 2026.
@esmeetu merged commit 7c10446 into vllm-project:main on Jan 8, 2026 (2 checks passed).
Review thread on the new post's front matter:

> ---
> layout: post
> title: "Inside vLLM’s New Offloading Connector: Smarter Memory Transfer for Maximizing Inference Throughput"
Member:

I feel like the title should say "KV Offloading", since it isn't clear whether we are talking about model offloading.

@esmeetu (Member) Jan 8, 2026:

Agreed. This applies to the blog name as well.
@orozery Could we update this in another PR before we promote it on social media?
