[Feature Request] Support TextBlock-level Prompt Caching in Bedrock Converse Integration #19153

Open
@mthooyavan

Description

Feature Description

Amazon Bedrock now offers prompt caching at the text block level, enabling significant reductions in response latency (up to 85%) and inference costs (up to 90%) for repetitive, long-context prompts (see AWS blog post).

However, LlamaIndex’s Bedrock Converse integration currently lacks native support for passing the cache_control parameter at the TextBlock level. This prevents users from marking static prompt segments (like instructions or documents) for caching, as recommended by AWS.

This feature request is to add support for cache_control directly on TextBlock objects in LlamaIndex’s Bedrock Converse integration. With this, developers can place Bedrock cache checkpoints after individual static text blocks, unlocking improved performance for document Q&A and agentic workflows.
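A minimal sketch of what this could look like. The TextBlock class and cache_control field below are a stand-in for the proposed LlamaIndex API (LlamaIndex’s current TextBlock does not accept this parameter), and the converter shows how such blocks could map onto the Bedrock Converse API’s cachePoint content blocks:

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical: a TextBlock carrying an optional cache_control marker,
# mirroring what this feature request asks LlamaIndex's TextBlock to accept.
@dataclass
class TextBlock:
    text: str
    cache_control: Optional[dict] = None  # e.g. {"type": "default"}


def to_converse_content(blocks: list[TextBlock]) -> list[dict[str, Any]]:
    """Translate blocks into Bedrock Converse-style content, emitting a
    cachePoint entry after each block marked for caching."""
    content: list[dict[str, Any]] = []
    for block in blocks:
        content.append({"text": block.text})
        if block.cache_control is not None:
            # The Converse API marks a cache checkpoint with a cachePoint block.
            content.append(
                {"cachePoint": {"type": block.cache_control.get("type", "default")}}
            )
    return content


blocks = [
    TextBlock(
        text="<long static system instructions>",
        cache_control={"type": "default"},
    ),
    TextBlock(text="What changed in the latest report?"),
]
print(to_converse_content(blocks))
```

With this shape, only blocks explicitly marked via cache_control produce a checkpoint, so static instructions can be cached while per-request text stays dynamic.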

References:

Reason

Without support for text-block-level caching in LlamaIndex core, prompt caching controls cannot be propagated to downstream agent and workflow components, notably FunctionAgent and the other agent types. This leads to poor performance and higher costs for every use case that reuses large static prompt segments (e.g., static system instructions that also contain dynamic context). Existing approaches only allow caching at the chat-message level, which is insufficient for agentic or multi-block workflows where only parts of the prompt are static and reusable. As a result, we (and many others) cannot fully leverage Bedrock’s caching capabilities to speed up agentic flows or reduce costs.
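The limitation can be illustrated with a small self-contained sketch (assumed semantics, following AWS’s description of cache checkpoints: a checkpoint caches everything before it). With message-level control the checkpoint can only follow the whole message, so any dynamic text in it changes the cached prefix on every request; with block-level control the checkpoint can sit between the static and dynamic parts:

```python
def cached_prefix(content: list[dict]) -> str:
    """Return the text covered by the last cachePoint in a Converse-style
    content list (assumed semantics: a checkpoint caches all prior blocks)."""
    prefix: list[str] = []
    covered = ""
    for block in content:
        if "text" in block:
            prefix.append(block["text"])
        elif "cachePoint" in block:
            covered = "".join(prefix)
    return covered


static = "<static system instructions>"
dynamic = "<per-request context>"

# Message-level control: the checkpoint can only follow the whole message,
# so the ever-changing dynamic suffix is part of the cached prefix and the
# cache never matches across requests.
message_level = [{"text": static + dynamic}, {"cachePoint": {"type": "default"}}]

# Block-level control (this request): the checkpoint sits between the static
# and dynamic blocks, so the static prefix stays reusable across requests.
block_level = [
    {"text": static},
    {"cachePoint": {"type": "default"}},
    {"text": dynamic},
]

print(cached_prefix(message_level))  # includes the dynamic text
print(cached_prefix(block_level))    # static text only
```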

Value of Feature

  • Performance: Dramatically lowers time-to-first-token on repeated agentic and document Q&A flows by skipping redundant computation for static prompt blocks.
  • Cost: Reduces inference costs for workloads with repeated context, thanks to Bedrock’s per-token cache read discounts.
  • Developer Experience: Enables fine-grained caching control, letting users optimize which blocks are cached and which remain dynamic—just as AWS recommends.
  • Scalability: Unlocks efficient scaling for applications with high prompt reuse (e.g., support bots, coding assistants, multi-turn chat) without architectural workarounds.
  • Ecosystem Alignment: Brings LlamaIndex up to date with AWS Bedrock’s latest best practices, ensuring users get the full benefit of new platform features.

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), triage (Issue needs to be triaged/prioritized)
