Architecture Discussion: Economy Mode and Vector Database Full-Text Search Capabilities · Suggestions for New Features #22955

yunqiqiliang · 2025-07-25T06:39:00Z

Self Checks

I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

Hi Dify community!

I'd like to start a discussion about a fundamental architecture issue I discovered while integrating ClickZetta vector database with Dify. This affects all vector databases with full-text search capabilities, not just ClickZetta.

🔍 The Problem

Dify currently supports three search modes when you choose High Quality indexing:

✅ Vector search: Semantic similarity using embeddings
✅ Full-text search: Using vector database's native capabilities
✅ Hybrid search: Combining both approaches

However, if you want to use Economy Mode to save costs:

❌ You get zero vector database functionality, not even full-text search
❌ Everything falls back to PostgreSQL + jieba keyword matching
❌ You lose access to professional search engines' optimizations

The core issue: You're forced to choose High Quality mode (with expensive vector embeddings) just to access full-text search capabilities that are actually cost-efficient.

Current forced choice:

High Quality: Vector embeddings (expensive) + Full-text search (cheap) ✅
Economy: PostgreSQL + jieba (cheap) ❌ No vector database at all

What users actually want:

Full-text Only: No vector embeddings (cheap) + Vector DB full-text search (cheap) ❌ Not possible
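
One way to read the gap above is as a capability matrix. The sketch below is illustrative only: the `IndexingMode` dataclass and the `full_text_only` entry are hypothetical names I made up for this post, not part of Dify's code. The first two rows simply restate the behavior described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexingMode:
    """Hypothetical summary of what each mode does; not a Dify class."""
    name: str
    computes_embeddings: bool   # the expensive step
    writes_to_vector_db: bool   # whether the vector database is used at all
    full_text_backend: str      # where full-text/keyword queries run


MODES = [
    # Current behavior as described in this post.
    IndexingMode("high_quality", True, True, "vector database"),
    IndexingMode("economy", False, False, "PostgreSQL + jieba"),
    # Proposed mode (does not exist today).
    IndexingMode("full_text_only", False, True, "vector database"),
]


def cheap_modes_using_vector_db(modes):
    """Names of modes that skip embeddings but still use the vector database."""
    return [
        m.name
        for m in modes
        if not m.computes_embeddings and m.writes_to_vector_db
    ]


if __name__ == "__main__":
    print(cheap_modes_using_vector_db(MODES[:2]))  # current modes only
    print(cheap_modes_using_vector_db(MODES))      # including the proposed mode
```

With only the two current modes, no entry is both cheap and backed by the vector database; the proposed mode is the only one that fills that cell.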

📊 Impact Analysis

I checked Dify's codebase and found that ALL 43 vector database implementations support the search_by_full_text() method:

Vector Database	Full-Text Capability	Economy Mode Status
Elasticsearch	Native inverted index	❌ Completely bypassed
OpenSearch	Native inverted index	❌ Completely bypassed
ClickZetta	Inverted index	❌ Completely bypassed
Weaviate	BM25F algorithm	❌ Completely bypassed
PGVector	PostgreSQL FTS	❌ Completely bypassed
Milvus	Scalar field search	❌ Completely bypassed
Qdrant	Full-text search	❌ Completely bypassed
... and 36 others	Various FTS methods	❌ All bypassed

🚨 Real-World Customer Issue

This issue was discovered by an actual customer who reported it to me today. They were trying to use ClickZetta (which has excellent inverted index capabilities) with Dify's Economy Mode to save costs while still getting professional search quality.

Customer's expectation:

Documents stored in ClickZetta tables
ClickZetta's inverted index used for fast full-text search
Cost savings from skipping expensive vector embedding computation
Still benefit from ClickZetta's search optimizations

What actually happened:

ClickZetta tables never created ❌
No data written to ClickZetta at all ❌
Fallback to PostgreSQL + jieba keyword matching ❌
Complete waste of their ClickZetta infrastructure investment ❌

Customer impact:

Cannot utilize their existing ClickZetta investment in economy mode
Forced to choose between cost savings OR search quality
No way to get professional full-text search without paying for vector embeddings

🔧 Code Evidence

Looking at the architecture:

BaseVector abstract class requires both methods:

@abstractmethod
def search_by_vector(self, query_vector: list[float], **kwargs) -> list[Document]:
    ...

@abstractmethod
def search_by_full_text(self, query: str, **kwargs) -> list[Document]:
    ...
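
To illustrate why the full-text path does not depend on embeddings, here is a minimal self-contained sketch. `Document`, `BaseVector`, and `InMemoryStore` below are simplified stand-ins written for this post, not Dify's actual classes, and the term-overlap scoring is a toy rather than any database's real ranking.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Document:
    """Stand-in for Dify's Document; only the text is modeled here."""
    page_content: str


class BaseVector(ABC):
    """Simplified stand-in mirroring the two abstract methods quoted above."""

    @abstractmethod
    def search_by_vector(self, query_vector: list[float], **kwargs) -> list[Document]:
        ...

    @abstractmethod
    def search_by_full_text(self, query: str, **kwargs) -> list[Document]:
        ...


class InMemoryStore(BaseVector):
    """Toy backend (hypothetical): stores raw text, no embeddings at all."""

    def __init__(self) -> None:
        self._docs: list[Document] = []

    def add_texts(self, docs: list[Document]) -> None:
        # No embedding is computed or required to store the text.
        self._docs.extend(docs)

    def search_by_vector(self, query_vector: list[float], **kwargs) -> list[Document]:
        raise RuntimeError("no embeddings were stored in this sketch")

    def search_by_full_text(self, query: str, **kwargs) -> list[Document]:
        top_k = kwargs.get("top_k", 4)
        terms = set(query.lower().split())
        scored = []
        for doc in self._docs:
            overlap = len(terms & set(doc.page_content.lower().split()))
            if overlap:
                scored.append((overlap, doc))
        # Highest term overlap first; ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]
```

The point is only that `search_by_full_text` can be served from stored text alone, which is the property economy mode currently never exercises.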

Economy mode logic in indexing_runner.py (lines 533-540):

if dataset.indexing_technique == "economy":
    # Only creates jieba keyword index, completely skips vector database
    ...

All vector databases implement full-text search, but economy mode never uses it.

💡 Proposed Solutions

Option 1: Redesign Economy Mode (Recommended)

✅ Create vector database tables and write data
✅ Skip expensive vector embedding computation
✅ Use vector database's native full-text search
✅ Maintain data consistency and user choice
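
A rough sketch of what Option 1 could look like at indexing time. Everything here is hypothetical (`index_documents`, `FakeVectorStore`, `fake_embed`); it only illustrates the control flow of writing text to the vector database while skipping the embedding call in economy mode, and it is not based on the actual `indexing_runner.py` code.

```python
from typing import Callable, Optional


class FakeVectorStore:
    """Hypothetical store that accepts rows with or without embeddings."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, Optional[list[float]]]] = []

    def add(self, text: str, embedding: Optional[list[float]]) -> None:
        self.rows.append((text, embedding))


def index_documents(
    texts: list[str],
    indexing_technique: str,
    store: FakeVectorStore,
    embed: Callable[[str], list[float]],
) -> int:
    """Write every text to the store; embed only in high-quality mode.

    Returns the number of embedding calls made (the costly part).
    """
    embed_calls = 0
    for text in texts:
        embedding = None
        if indexing_technique == "high_quality":
            embedding = embed(text)
            embed_calls += 1
        # In both modes the text reaches the vector database, so its
        # native full-text index can be built over it.
        store.add(text, embedding)
    return embed_calls


def fake_embed(text: str) -> list[float]:
    """Placeholder for a real (paid) embedding model call."""
    return [float(len(text))]
```

In this sketch the only difference between the two modes is whether `embed` is called; storage is identical, which is what would keep data consistent if a dataset were later upgraded to High Quality.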

Option 2: Improve Current Modes

Current state (already supported):

High Quality: Vector search + Full-text search + Hybrid search ✅
Economy: PostgreSQL + jieba (bypasses vector databases) ❌

The issue: You MUST choose High Quality mode to get any vector database functionality, even if you only want full-text search.

Option 3: Storage Backend Choice

Let users explicitly choose where to store their data, regardless of search strategy.
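
As a sketch of how Option 3 might be modeled, storage and search strategy could be two independent settings with a small validation step. The names below (`StorageBackend`, `SearchStrategy`, `DatasetConfig`) are hypothetical and do not correspond to existing Dify settings.

```python
from dataclasses import dataclass
from enum import Enum


class StorageBackend(Enum):
    POSTGRES_KEYWORD = "postgres_keyword"   # today's economy-mode storage
    VECTOR_DATABASE = "vector_database"     # e.g. ClickZetta, Elasticsearch


class SearchStrategy(Enum):
    KEYWORD = "keyword"
    FULL_TEXT = "full_text"
    VECTOR = "vector"
    HYBRID = "hybrid"


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical per-dataset settings: where data lives vs. how it is searched."""
    storage: StorageBackend
    search: SearchStrategy

    def needs_embeddings(self) -> bool:
        # Only strategies that include vector similarity require embeddings.
        return self.search in (SearchStrategy.VECTOR, SearchStrategy.HYBRID)

    def validate(self) -> None:
        vector_db_only = (
            SearchStrategy.FULL_TEXT,
            SearchStrategy.VECTOR,
            SearchStrategy.HYBRID,
        )
        if self.storage is StorageBackend.POSTGRES_KEYWORD and self.search in vector_db_only:
            raise ValueError(
                f"{self.search.value} search requires a vector database backend"
            )
```

Under this split, "vector database storage with full-text search" is just one valid combination, and the embedding cost follows from the search strategy rather than from the storage choice.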

🤔 Discussion Questions

Cost vs. Functionality: Should users be able to use vector database full-text search without paying for vector embeddings?
Architecture Design: Would it make sense to add a "Full-text Only" mode that uses vector databases but skips embeddings?
User Choice: Should Economy mode be redesigned to allow vector database storage with keyword-only search?
Performance Trade-offs: What are the real performance differences between:
- PostgreSQL + jieba (current economy mode)
- Vector database full-text search (currently requires high quality mode)
- Professional search engines like Elasticsearch (currently underutilized in economy mode)
Hybrid Architecture: How can we better separate storage decisions from search strategy decisions?
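
For the performance question, a small timing harness would let each backend be compared on the same queries. This is only a sketch: `time_backend` is a helper I made up, and the callable passed in would need to wrap the real search path of each backend. I have not run any such comparison, so no numbers are claimed here.

```python
import statistics
import time
from typing import Callable, Sequence


def time_backend(
    search: Callable[[str], list],
    queries: Sequence[str],
    repeats: int = 3,
) -> dict:
    """Return median latency (seconds), total hits, and call count for one backend."""
    latencies = []
    hits = 0
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            results = search(query)
            latencies.append(time.perf_counter() - start)
            hits += len(results)
    return {
        "median_s": statistics.median(latencies),
        "hits": hits,
        "calls": len(latencies),
    }
```

Running the same query set through each of the three options listed above would give a like-for-like latency and recall comparison to anchor the discussion.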

🎯 Why This Matters

For Real Customers (like the one who reported this):

Infrastructure investment waste: Can't use their existing vector database investment in cost-saving mode
False economy: Forced to choose expensive vector embeddings just to access cheap full-text search
Performance limitations: Stuck with jieba keyword matching instead of professional search engines
Search quality degradation: Miss out on advanced full-text algorithms and optimizations

For Dify:

Architecture flexibility to support diverse enterprise needs
Competitive advantage through better vector database integration
Future-proofing as vector databases continue to evolve

For the Ecosystem:

Best practices for RAG system architecture
Proper utilization of modern vector database capabilities
Technical leadership in the AI application space

🚀 Call to Action

I believe this deserves community-wide discussion because:

It affects every vector database integration in Dify
It impacts enterprise adoption and real-world performance
It's about architectural philosophy, not just a single feature
The solution will benefit the entire Dify ecosystem

🤝 What's Next?

I'd love to hear from:

Dify core team: What are your thoughts on the current design decisions?
Vector database users: Have you encountered this limitation?
Enterprise users: How important is full-text search performance to you?
Contributors: Who's interested in working on potential solutions?

Let's discuss! I'm happy to contribute code once we align on the direction.

Tags: #architecture #vector-database #search #economy-mode #full-text-search

What are your thoughts on this architectural challenge? 🤔

2. Additional context or comments

No response

3. Can you help us with this feature?

I am interested in contributing to this feature.

Replies: 0 comments
