## Summary
OpenAI introduced automatic prompt caching in October 2024 for GPT-4o, GPT-4o-mini, o1-preview, and o1-mini models. This feature provides a 50% discount on cached prompt tokens and faster processing times for prompts longer than 1024 tokens.
However, LightRAG's current prompt structure prevents effective caching during indexing, missing a significant opportunity to reduce costs and improve indexing latency.
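For reference, cache usage is visible directly in the API response. A minimal sketch, assuming the official `openai` Python SDK and a caching-capable model; the placeholder message contents are illustrative:

```python
from openai import OpenAI

client = OpenAI()

messages = [
    # A static system message longer than 1024 tokens is the part OpenAI can cache.
    {"role": "system", "content": "<static instructions, examples, entity types ...>"},
    # Only this short user message would change between chunks.
    {"role": "user", "content": "<text of one chunk>"},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)

# 0 on a cold cache; > 0 once the shared prompt prefix has been served from cache.
print(response.usage.prompt_tokens_details.cached_tokens)
```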
## The Problem

### Current Prompt Structure

In `lightrag/operate.py:2807-2820`, the entity extraction system prompt embeds variable content (`input_text`) directly into the system message:
```python
entity_extraction_system_prompt = PROMPTS[
    "entity_extraction_system_prompt"
].format(**{**context_base, "input_text": content})
```

This creates a system prompt that looks like:

```
---Role--- (static, ~100 tokens)
---Instructions--- (static, ~400 tokens)
---Examples--- (static, ~800 tokens)
---Real Data to be Processed---
<Input>
Entity_types: [static during indexing run]
Text:
{input_text} ← THIS CHANGES FOR EVERY CHUNK ❌
```
### Why This Prevents Caching
OpenAI's prompt caching works by caching the **longest shared prefix** of prompts. Since `input_text` is embedded at the end of the system prompt, every chunk creates a completely different system prompt string. There is no shared prefix across chunks, so **nothing gets cached**.
### Reference
From the prompt template in `lightrag/prompt.py:11-69`:
```python
PROMPTS["entity_extraction_system_prompt"] = """---Role---
...
---Real Data to be Processed---
<Input>
Entity_types: [{entity_types}]
Text:
{input_text} # Variable content embedded in system prompt
"""
## The Solution

### Restructure Prompts for Caching
To leverage OpenAI's automatic prompt caching, the prompts should be restructured:
Optimal structure:
- System message: Static instructions + examples + entity types (~1300 tokens, cacheable!)
- User message: just the variable `input_text` (~150 tokens per chunk)
This would allow the ~1300 token system message to be cached and reused for ALL chunks during an indexing run, with only the small user message varying.
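As a sketch of the resulting per-chunk request shape (variable names here are illustrative, not LightRAG's actual code):

```python
# Built once per indexing run: static instructions + examples + entity types.
static_system_prompt = "<~1300 tokens of instructions, examples, entity types>"

def build_messages(chunk_text: str) -> list[dict]:
    """Chat messages for one chunk: a cacheable system prefix plus a small user message."""
    return [
        {"role": "system", "content": static_system_prompt},  # identical for every chunk → cacheable
        {"role": "user", "content": chunk_text},               # ~150 tokens, varies per chunk
    ]
```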
### Proposed Changes
1. Split the system prompt template (`lightrag/prompt.py`):
   - Remove `{input_text}` from `entity_extraction_system_prompt`
   - Keep only the static instructions, examples, and entity types
2. Modify the user prompt template:
   - Make `entity_extraction_user_prompt` contain the variable `input_text`
3. Update the extraction logic (`lightrag/operate.py`):
   - Format the system prompt once (without `input_text`)
   - Format the user prompt with `input_text` for each chunk
### Example Restructured Template
PROMPTS["entity_extraction_system_prompt"] = """---Role---
You are a Knowledge Graph Specialist responsible for extracting entities and relationships from the input text.
---Instructions---
[... all the static instructions ...]
---Examples---
[... all the examples ...]
---Entity Types---
Entity_types: [{entity_types}]
"""
PROMPTS["entity_extraction_user_prompt"] = """---Task---
Extract entities and relationships from the following input text.
---Input Text---{input_text}
---Output---
"""
## Expected Impact

### Cost Savings
For a typical indexing run of 8,000 chunks:
- Current: ~1,450 tokens × 8,000 chunks = ~11.6M prompt tokens (all counted as new)
- With caching: ~1,450 tokens (first chunk) + ~150 tokens × 7,999 chunks = ~1.3M new prompt tokens + ~10.4M cached tokens (50% discount)
- Result: ~45% cost reduction on prompt tokens during indexing
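The estimate can be reproduced with a quick back-of-the-envelope calculation (token counts only, independent of the per-token price):

```python
chunks = 8_000
static_tokens = 1_300    # cacheable system prompt
variable_tokens = 150    # per-chunk user message

# Current structure: every prompt token is billed at the full rate.
current = (static_tokens + variable_tokens) * chunks                      # ~11.6M

# Restructured: after the first chunk the static prefix is cached at a 50% discount.
new = (static_tokens + variable_tokens) + variable_tokens * (chunks - 1)  # ~1.2M
cached = static_tokens * (chunks - 1)                                     # ~10.4M
effective = new + 0.5 * cached                                            # ~6.4M token-equivalents

print(f"prompt-token cost reduction: {1 - effective / current:.0%}")      # ~45%
```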
### Latency Improvements
- Cached prompt tokens process significantly faster than new tokens
- Reduces overall indexing time, especially for large document collections
- More responsive during bulk upload operations
### Automatic Activation
OpenAI's prompt caching is automatic for prompts > 1024 tokens:
- No API changes required beyond restructuring prompts
- Works with existing GPT-4o, GPT-4o-mini, o1-preview, o1-mini models
- Cache persists 5-10 minutes (max 1 hour), perfect for batch indexing
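To confirm the restructured system prompt clears the 1024-token threshold on its own, something like the following could be used. This is a sketch assuming the `tiktoken` package and the split template proposed above (the current LightRAG template takes additional format fields); the entity types are placeholders:

```python
import tiktoken

from lightrag.prompt import PROMPTS

enc = tiktoken.get_encoding("o200k_base")  # encoding used by the GPT-4o model family

system_prompt = PROMPTS["entity_extraction_system_prompt"].format(
    entity_types="organization, person, geo, event, category",
)
token_count = len(enc.encode(system_prompt))
print(token_count, "tokens; cacheable prefix?", token_count >= 1024)
```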
## References
- OpenAI Prompt Caching Documentation
- OpenAI Announcement: API Prompt Caching
- Prompt Caching 101 - OpenAI Cookbook
## Additional Benefits
This optimization would:
- ✅ Reduce indexing costs by ~45% for OpenAI users
- ✅ Improve indexing latency significantly
- ✅ Make LightRAG more cost-effective for large-scale deployments
- ✅ Require minimal code changes
- ✅ Work automatically without user configuration
## Affected Files

- `lightrag/prompt.py` - Prompt templates
- `lightrag/operate.py` - Entity extraction logic (lines ~2807-2850)
Thank you for considering this optimization! Happy to provide more details or assist with implementation if helpful.