[Feature Request] Support TextBlock-level Prompt Caching in Bedrock Converse Integration #19153

Open
@mthooyavan

Description

Feature Description

Amazon Bedrock now offers prompt caching at the text block level, enabling significant reductions in response latency (up to 85%) and inference costs (up to 90%) for repetitive, long-context prompts (see AWS blog post).

However, LlamaIndex’s Bedrock Converse integration currently lacks native support for passing the cache_control parameter at the TextBlock level. This prevents users from marking static prompt segments (like instructions or documents) for caching, as recommended by AWS.

This feature request is to add support for cache_control directly on TextBlock objects in LlamaIndex’s Bedrock Converse integration. With this, developers can place Bedrock cache checkpoints after individual static text blocks, unlocking improved performance for document Q&A and agentic workflows.
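A minimal sketch of what this could look like. The TextBlock class and cache_control field below are a stand-in for the proposed LlamaIndex API (LlamaIndex’s current TextBlock does not accept this parameter), and the converter shows how such blocks could map onto the Bedrock Converse API’s cachePoint content blocks:

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical: a TextBlock carrying an optional cache_control marker,
# mirroring what this feature request asks LlamaIndex's TextBlock to accept.
@dataclass
class TextBlock:
    text: str
    cache_control: Optional[dict] = None  # e.g. {"type": "default"}


def to_converse_content(blocks: list[TextBlock]) -> list[dict[str, Any]]:
    """Translate blocks into Bedrock Converse-style content, emitting a
    cachePoint entry after each block marked for caching."""
    content: list[dict[str, Any]] = []
    for block in blocks:
        content.append({"text": block.text})
        if block.cache_control is not None:
            # The Converse API marks a cache checkpoint with a cachePoint block.
            content.append(
                {"cachePoint": {"type": block.cache_control.get("type", "default")}}
            )
    return content


blocks = [
    TextBlock(
        text="<long static system instructions>",
        cache_control={"type": "default"},
    ),
    TextBlock(text="What changed in the latest report?"),
]
print(to_converse_content(blocks))
```

With this shape, only blocks explicitly marked via cache_control produce a checkpoint, so static instructions can be cached while per-request text stays dynamic.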

References:

Reason

Without support for text-block-level caching in LlamaIndex core, prompt caching controls cannot be propagated to downstream agent and workflow components, notably FunctionAgent and the other agent types. This leads to poor performance and higher costs for every use case that reuses large static prompt segments (e.g., static system instructions that also contain dynamic context). Existing approaches only allow caching at the chat-message level, which is insufficient for agentic or multi-block workflows where only parts of the prompt are static and reusable. As a result, we (and many others) cannot fully leverage Bedrock’s caching capabilities to speed up agentic flows or reduce costs.
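The limitation can be illustrated with a small self-contained sketch (assumed semantics, following AWS’s description of cache checkpoints: a checkpoint caches everything before it). With message-level control the checkpoint can only follow the whole message, so any dynamic text in it changes the cached prefix on every request; with block-level control the checkpoint can sit between the static and dynamic parts:

```python
def cached_prefix(content: list[dict]) -> str:
    """Return the text covered by the last cachePoint in a Converse-style
    content list (assumed semantics: a checkpoint caches all prior blocks)."""
    prefix: list[str] = []
    covered = ""
    for block in content:
        if "text" in block:
            prefix.append(block["text"])
        elif "cachePoint" in block:
            covered = "".join(prefix)
    return covered


static = "<static system instructions>"
dynamic = "<per-request context>"

# Message-level control: the checkpoint can only follow the whole message,
# so the ever-changing dynamic suffix is part of the cached prefix and the
# cache never matches across requests.
message_level = [{"text": static + dynamic}, {"cachePoint": {"type": "default"}}]

# Block-level control (this request): the checkpoint sits between the static
# and dynamic blocks, so the static prefix stays reusable across requests.
block_level = [
    {"text": static},
    {"cachePoint": {"type": "default"}},
    {"text": dynamic},
]

print(cached_prefix(message_level))  # includes the dynamic text
print(cached_prefix(block_level))    # static text only
```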

Value of Feature

  • Performance: Dramatically lowers time-to-first-token on repeated agentic and document Q&A flows by skipping redundant computation for static prompt blocks.
  • Cost: Reduces inference costs for workloads with repeated context, thanks to Bedrock’s per-token cache read discounts.
  • Developer Experience: Enables fine-grained caching control, letting users optimize which blocks are cached and which remain dynamic—just as AWS recommends.
  • Scalability: Unlocks efficient scaling for applications with high prompt reuse (e.g., support bots, coding assistants, multi-turn chat) without architectural workarounds.
  • Ecosystem Alignment: Brings LlamaIndex up to date with AWS Bedrock’s latest best practices, ensuring users get the full benefit of new platform features.

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), triage (Issue needs to be triaged/prioritized)
