Is there a way to make LLM requests more efficient? #34788
MrTact started this conversation in Feature Ideas / Enhancements
I had an experience this past weekend where I created a project using Claude Opus and built out the first couple of tasks. I wrote a project requirements doc, which amounts to about 6500 tokens and should get sent at the start of each conversation. With only a few direct prompts, though, I see lots of requests in the 65K-token range, including multiple turns per user prompt. This very quickly burned in excess of 4M tokens!
It definitely seems like this could benefit from Anthropic's prompt caching; it feels like the LLM interaction is just re-sending the entire context on every request. However, that's only one provider, and it's hard to imagine how this could be abstracted to work across lots of different providers. Is this something that is worth discussing further?
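For reference, here's roughly what opting into Anthropic's prompt caching looks like at the API level. This is a minimal sketch, not how the project actually structures its requests; the model id, file name, and prompt text are placeholders. The idea is that the large, unchanging requirements doc sits behind a cache breakpoint, so repeated turns can read it from cache instead of paying full input-token price each time.

```python
# Minimal sketch of Anthropic prompt caching: the static requirements doc is
# marked with cache_control so later requests with the same prefix hit the cache.
# Model id, file path, and prompts below are illustrative, not the real ones.
import anthropic

client = anthropic.Anthropic()

with open("project_requirements.md") as f:
    requirements_doc = f.read()  # the ~6500-token static context

response = client.messages.create(
    model="claude-opus-4-20250514",  # illustrative model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are helping build out tasks for this project.",
        },
        {
            "type": "text",
            "text": requirements_doc,
            # Cache breakpoint: everything up to and including this block is
            # cached, so subsequent requests reuse it at reduced cost.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[
        {"role": "user", "content": "Implement the next task from the plan."},
    ],
)

# Usage reports cache_creation_input_tokens / cache_read_input_tokens,
# which is how you can tell whether the cached prefix is actually being reused.
print(response.usage)
```

That only covers Anthropic, of course; an abstraction would need to fall back gracefully for providers without an equivalent, which is the harder part of the question.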