Is there a way to make LLM requests more efficient? #34788
MrTact started this conversation in Feature Ideas / Enhancements
I had an experience this past weekend where I created a project using Claude Opus and built out the first couple of tasks. I wrote a project requirements doc, which amounts to about 6500 tokens and should get sent at the start of each conversation. With only a few direct prompts, though, I see lots of requests in the 65K-token range, including multiple turns per user prompt. This very quickly burned in excess of 4M tokens!
It definitely seems like this could benefit from Anthropic's prompt caching; it feels like the LLM interaction is just re-sending the entire context on every request. However, that's only one provider, and it's hard to imagine how this could be abstracted to work across lots of different providers. Is this something that is worth discussing further?
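For reference, here's roughly what opting into Anthropic's prompt caching looks like at the API level. This is a minimal sketch, not how the project actually structures its requests; the model id, file name, and prompt text are placeholders. The idea is that the large, unchanging requirements doc sits behind a cache breakpoint, so repeated turns can read it from cache instead of paying full input-token price each time.

```python
# Minimal sketch of Anthropic prompt caching: the static requirements doc is
# marked with cache_control so later requests with the same prefix hit the cache.
# Model id, file path, and prompts below are illustrative, not the real ones.
import anthropic

client = anthropic.Anthropic()

with open("project_requirements.md") as f:
    requirements_doc = f.read()  # the ~6500-token static context

response = client.messages.create(
    model="claude-opus-4-20250514",  # illustrative model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are helping build out tasks for this project.",
        },
        {
            "type": "text",
            "text": requirements_doc,
            # Cache breakpoint: everything up to and including this block is
            # cached, so subsequent requests reuse it at reduced cost.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[
        {"role": "user", "content": "Implement the next task from the plan."},
    ],
)

# Usage reports cache_creation_input_tokens / cache_read_input_tokens,
# which is how you can tell whether the cached prefix is actually being reused.
print(response.usage)
```

That only covers Anthropic, of course; an abstraction would need to fall back gracefully for providers without an equivalent, which is the harder part of the question.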