
Optimize for OpenAI Prompt Caching: Restructure entity extraction prompts for 50% cost reduction and faster indexing #2355

@adorosario

Description

Summary

OpenAI introduced automatic prompt caching in October 2024 for GPT-4o, GPT-4o-mini, o1-preview, and o1-mini models. This feature provides a 50% discount on cached prompt tokens and faster processing times for prompts longer than 1024 tokens.

However, LightRAG's current prompt structure prevents effective caching during indexing, missing a significant opportunity to reduce costs and improve indexing latency.

The Problem

Current Prompt Structure

In lightrag/operate.py:2807-2820, the entity extraction system prompt embeds variable content (input_text) directly into the system message:

# `content` is the text of the chunk currently being indexed, so the formatted
# system prompt string is different for every chunk
entity_extraction_system_prompt = PROMPTS[
    "entity_extraction_system_prompt"
].format(**{**context_base, "input_text": content})

This creates a system prompt that looks like:

---Role--- (static, ~100 tokens)
---Instructions--- (static, ~400 tokens)  
---Examples--- (static, ~800 tokens)
---Real Data to be Processed---
<Input>
Entity_types: [static during indexing run]
Text:

{input_text} ← THIS CHANGES FOR EVERY CHUNK ❌


### Why This Prevents Caching

OpenAI's prompt caching works by caching the **longest shared prefix** of prompts. Since `input_text` is embedded at the end of the system prompt, every chunk creates a completely different system prompt string. There is no shared prefix across chunks, so **nothing gets cached**.

### Reference

From the prompt template in `lightrag/prompt.py:11-69`:

```python
PROMPTS["entity_extraction_system_prompt"] = """---Role---
...
---Real Data to be Processed---
<Input>
Entity_types: [{entity_types}]
Text:

{input_text} # Variable content embedded in system prompt

"""

The Solution

Restructure Prompts for Caching

To leverage OpenAI's automatic prompt caching, the prompts should be restructured:

Optimal structure:

  • System message: Static instructions + examples + entity types (~1300 tokens, cacheable!)
  • User message: Just the variable input_text (~150 tokens per chunk)

This would allow the ~1300 token system message to be cached and reused for ALL chunks during an indexing run, with only the small user message varying.
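
Concretely, each per-chunk request would then take the following shape (a minimal sketch; the variable names are illustrative placeholders, not LightRAG's actual code):

```python
# Shape of one per-chunk Chat Completions request after restructuring.
# The system message is byte-identical for every chunk, so it forms the shared
# prefix that OpenAI's automatic prompt caching can reuse once it crosses the
# 1,024-token minimum; only the short user message changes from chunk to chunk.
STATIC_SYSTEM_PROMPT = "---Role---\n...instructions, examples, entity types..."  # ~1,300 tokens, identical every call
chunk_text = "...the current chunk's text..."                                    # ~150 tokens, different every call

messages = [
    {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # cacheable prefix
    {"role": "user", "content": chunk_text},              # per-chunk suffix
]
```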

Proposed Changes

  1. Split the system prompt template (lightrag/prompt.py):

    • Remove {input_text} from entity_extraction_system_prompt
    • Keep only the static instructions, examples, and entity types
  2. Modify the user prompt template:

    • Make entity_extraction_user_prompt contain the variable input_text
  3. Update the extraction logic (lightrag/operate.py):

    • Format system prompt once (without input_text)
    • Format user prompt with input_text for each chunk (see the sketch after the restructured templates below)

Example Restructured Template

PROMPTS["entity_extraction_system_prompt"] = """---Role---
You are a Knowledge Graph Specialist responsible for extracting entities and relationships from the input text.

---Instructions---
[... all the static instructions ...]

---Examples---
[... all the examples ...]

---Entity Types---
Entity_types: [{entity_types}]
"""

PROMPTS["entity_extraction_user_prompt"] = """---Task---
Extract entities and relationships from the following input text.

---Input Text---

{input_text}


---Output---
"""

Expected Impact

Cost Savings

For a typical indexing run of 8,000 chunks:

  • Current: ~1,450 tokens × 8,000 chunks = ~11.6M prompt tokens (all counted as new)
  • With caching: ~1,450 tokens (first chunk) + ~150 tokens × 7,999 chunks = ~1.2M new prompt tokens + ~10.4M cached tokens (billed at a 50% discount)
  • Result: ~45% cost reduction on prompt tokens during indexing (worked through in the sketch below)
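
A quick back-of-the-envelope check of those numbers, using the token estimates from this issue (~1,300 static tokens + ~150 variable tokens per chunk, 8,000 chunks, cached tokens billed at 50%):

```python
# Back-of-the-envelope check of the prompt-token savings estimated above.
chunks = 8_000
static_tokens = 1_300     # instructions + examples + entity types (estimate)
variable_tokens = 150     # per-chunk input text (estimate)

# Current structure: every prompt token is billed as new.
current = (static_tokens + variable_tokens) * chunks                              # ~11.6M

# Restructured: after the first request, the static prefix is served from cache.
new_tokens = (static_tokens + variable_tokens) + variable_tokens * (chunks - 1)   # ~1.2M
cached_tokens = static_tokens * (chunks - 1)                                      # ~10.4M
with_caching = new_tokens + 0.5 * cached_tokens  # cached tokens billed at 50%

print(f"current billed tokens:        {current:,.0f}")
print(f"with caching (billed-equiv.): {with_caching:,.0f}")
print(f"prompt-token cost reduction:  {1 - with_caching / current:.0%}")          # ~45%
```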

Latency Improvements

  • Cached prompt tokens process significantly faster than new tokens
  • Reduces overall indexing time, especially for large document collections
  • More responsive during bulk upload operations

Automatic Activation

OpenAI's prompt caching is automatic for prompts > 1024 tokens:

  • No API changes required beyond restructuring prompts (see the verification sketch below for confirming cache hits)
  • Works with existing GPT-4o, GPT-4o-mini, o1-preview, o1-mini models
  • Cache persists 5-10 minutes (max 1 hour), perfect for batch indexing
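
To confirm that caching is actually kicking in during indexing, the cached token count can be read from the Chat Completions usage stats. A sketch using the official openai Python client; the model choice and prompt contents are placeholders:

```python
from openai import OpenAI

# Placeholders standing in for the real prompts built during indexing.
static_system_prompt = "..."  # the ~1,300-token static instructions/examples/entity types
chunk_text = "..."            # one chunk's text

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": static_system_prompt},  # identical across chunks
        {"role": "user", "content": chunk_text},              # varies per chunk
    ],
)

# From the second request onward (same >=1,024-token prefix, within the cache
# retention window), cached_tokens should be > 0, reported in 128-token increments.
details = resp.usage.prompt_tokens_details
print("prompt tokens:", resp.usage.prompt_tokens)
print("cached prompt tokens:", details.cached_tokens if details else 0)
```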

Additional Benefits

This optimization would:

  • ✅ Reduce prompt-token costs during indexing by ~45% for OpenAI users
  • ✅ Improve indexing latency significantly
  • ✅ Make LightRAG more cost-effective for large-scale deployments
  • ✅ Require minimal code changes
  • ✅ Work automatically without user configuration

Affected Files

  • lightrag/prompt.py - Prompt templates
  • lightrag/operate.py - Entity extraction logic (lines ~2807-2850)

Thank you for considering this optimization! Happy to provide more details or assist with implementation if helpful.
