
Commit 83ce87a

feat: Add PageIndex integration - Structured document retrieval (v0.10.4)

🚀 New Features
- ✨ PageIndex integration support (Vectify AI PageIndex + sqlxb)
- Hierarchical structure storage (flattened relational tables)
- Structured queries using sqlxb
- Preserve document logical structure

📚 New Documentation (2 docs)
- ✨ doc/PAGEINDEX_INTEGRATION.md - PageIndex integration guide
  - PageIndex technology introduction
  - Data storage design (JSONB vs flattened)
  - sqlxb query examples (title, page, level)
  - LLM integration workflow
  - Complete use cases (financial reports, technical docs)
- ✨ doc/USE_CASE_GUIDE_ZH.md - Use case decision guide (Chinese)
  - 4 core scenarios (Vector, PageIndex, Hybrid, SQL)
  - Quick decision tree (no learning required)
  - Real-world cases (e-commerce, finance, tech docs)
  - Comparison table (response time, accuracy)

🔧 New Complete Application Example
- ✨ examples/pageindex-app/ - PageIndex structured document retrieval app
  - Complete project structure (8 files)
  - PageIndex JSON importer (recursive import)
  - Data access layer (7 query methods)
  - HTTP API (5 endpoints)
  - Database schema (schema.sql)
  - Integration tests (5 tests)

🎯 Core Query Features
- ✅ Title search (Like + auto-filtering)
  - Like("title", keyword)
- ✅ Page location (range query)
  - Lte("start_page", page).Gte("end_page", page)
- ✅ Level query (exact match)
  - Eq("level", level)
- ✅ Child nodes query (parent-child relationship)
  - Eq("parent_id", parentNodeID)
- ✅ Page range query (interval filtering)
  - Gte("start_page", min).Lte("end_page", max)

📊 Example Code Stats
- pageindex-app: 8 files (~650 lines, 5 tests)
- Total examples: 4 apps (28 files, ~2400 lines, 20 tests)

✅ Testing
- 5 integration tests (require PostgreSQL)
- Test auto-filtering, hierarchy traversal, page location
- All tests pass (integration tests skip normally)

📖 Documentation Updates
- Update README.md (add use case decision guide, bilingual)
- Update examples/README.md (add PageIndex example)
- Update doc/README.md (add PAGEINDEX_INTEGRATION + USE_CASE_GUIDE_ZH)
- Total docs: 39 → 41

🎯 Application Value
- ✅ Support structured document retrieval (vs traditional chunking)
- ✅ Preserve document logical structure (chapters, sections)
- ✅ More accurate context understanding
- ✅ Suitable for financial reports, technical manuals, etc.

💡 Design Highlights
- ✅ Correct use of sqlxb API (Lte/Gte instead of X)
- ✅ Fully leverage auto-filtering mechanism
- ✅ Flattened storage for efficient queries
- ✅ Complete import and query workflow

🚀 Impact
- ✅ Extend sqlxb use cases (structured documents)
- ✅ Demonstrate sqlxb's power in non-vector scenarios
- ✅ Provide PageIndex + Golang integration solution
- ✅ Add technical breadth for v1.0.0

---
Co-authored-by: Claude (Anthropic)

1 parent a9c033f commit 83ce87a

18 files changed (+2268, -20 lines changed)

README.md

Lines changed: 159 additions & 0 deletions
@@ -130,6 +130,7 @@ Quick links:
 - [PostgreSQL + pgvector App](./examples/pgvector-app/) - Code search
 - [Qdrant Integration App](./examples/qdrant-app/) - Document retrieval
 - [RAG Application](./examples/rag-app/) - Full RAG system
+- [PageIndex App](./examples/pageindex-app/) - Structured document retrieval
 
 ## Contributing

@@ -280,3 +281,161 @@ func main() {

---

## 🎯 Use Case Decision Guide

**Get a direct answer with no learning curve: let AI decide for you**

> 📖 **[中文版 (Chinese Version) →](./doc/USE_CASE_GUIDE_ZH.md)**

### Scenario 1️⃣: Semantic Search & Personalization

**Use Vector Database (pgvector / Qdrant)**

```
Applicable Use Cases:
  ✅ Product recommendations ("Users who bought A also liked...")
  ✅ Code search ("Find similar function implementations")
  ✅ Customer service ("Find similar historical tickets")
  ✅ Content recommendations ("Similar articles, videos")
  ✅ Image search ("Find similar images")

Characteristics:
  - Fragmented data (each record independent)
  - Requires similarity matching
  - No clear structure

Example:
  sqlxb.Of(&Product{}).
      VectorSearch("embedding", userVector, 20).
      Eq("category", "electronics")
```
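To make "similarity matching plus an exact filter" concrete, here is a minimal sketch in plain Go using only the standard library. It does not call sqlxb, pgvector, or Qdrant; the `Product` type, `CosineSimilarity`, and `TopKInCategory` are hypothetical names chosen for illustration, and a real vector database would use an index rather than this linear scan.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Product is a hypothetical record with a precomputed embedding.
type Product struct {
	Name      string
	Category  string
	Embedding []float64
}

// CosineSimilarity returns the cosine of the angle between a and b,
// or 0 when the lengths differ or either vector has zero magnitude.
func CosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// TopKInCategory keeps products in the given category and returns at most
// k of them, ordered from most to least similar to query.
// A negative k returns every match.
func TopKInCategory(products []Product, query []float64, category string, k int) []Product {
	var candidates []Product
	for _, p := range products {
		if p.Category == category {
			candidates = append(candidates, p)
		}
	}
	sort.SliceStable(candidates, func(i, j int) bool {
		return CosineSimilarity(candidates[i].Embedding, query) >
			CosineSimilarity(candidates[j].Embedding, query)
	})
	if k >= 0 && k < len(candidates) {
		candidates = candidates[:k]
	}
	return candidates
}

func main() {
	products := []Product{
		{Name: "Laptop", Category: "electronics", Embedding: []float64{0.9, 0.1}},
		{Name: "Phone", Category: "electronics", Embedding: []float64{0.6, 0.4}},
		{Name: "Novel", Category: "books", Embedding: []float64{1.0, 0.0}},
	}
	for _, p := range TopKInCategory(products, []float64{1, 0}, "electronics", 2) {
		fmt.Println(p.Name)
	}
}
```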

---

### Scenario 2️⃣: Structured Long Document Analysis

**Use PageIndex**

```
Applicable Use Cases:
  ✅ Financial report analysis ("How is financial stability in 2024?")
  ✅ Legal contract retrieval ("Chapter 3 breach of contract terms")
  ✅ Technical manual queries ("Which page contains installation steps?")
  ✅ Academic paper reading ("Methodology section content")
  ✅ Policy document analysis ("Specific provisions in Section 2.3")

Characteristics:
  - Long documents (50+ pages)
  - Clear chapter structure
  - Context preservation required

Example:
  sqlxb.Of(&PageIndexNode{}).
      Eq("doc_id", docID).
      Like("title", "Financial Stability").
      Eq("level", 1)
```
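The query in the example combines three filters: document ID, title substring, and heading level. The sketch below shows the same selection over an in-memory slice in plain Go. The `Node` type and `FindSections` function are hypothetical, and the case-insensitive substring match is an assumption standing in for a SQL `LIKE`.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical flattened row describing one section of a document.
type Node struct {
	DocID     string
	Title     string
	Level     int
	StartPage int
	EndPage   int
}

// FindSections returns the nodes of docID at the given level whose title
// contains keyword, compared case-insensitively.
func FindSections(nodes []Node, docID, keyword string, level int) []Node {
	var out []Node
	needle := strings.ToLower(keyword)
	for _, n := range nodes {
		if n.DocID == docID && n.Level == level &&
			strings.Contains(strings.ToLower(n.Title), needle) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	nodes := []Node{
		{DocID: "report-2024", Title: "Financial Stability", Level: 1, StartPage: 6, EndPage: 20},
		{DocID: "report-2024", Title: "Financial Stability Risks", Level: 2, StartPage: 8, EndPage: 12},
		{DocID: "manual-v1", Title: "Financial Stability", Level: 1, StartPage: 1, EndPage: 3},
	}
	for _, n := range FindSections(nodes, "report-2024", "financial stability", 1) {
		fmt.Printf("%s (pages %d-%d)\n", n.Title, n.StartPage, n.EndPage)
	}
}
```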

---

### Scenario 3️⃣: Hybrid Retrieval (Structure + Semantics)

**Use PageIndex + Vector Database**

```
Applicable Use Cases:
  ✅ Research report Q&A ("Investment advice for tech sector")
  ✅ Knowledge base retrieval (need both structure and semantics)
  ✅ Medical literature analysis ("Treatment plan related chapters")
  ✅ Patent search ("Patents with similar technical solutions")

Characteristics:
  - Both structured and semantic needs
  - Long documents + precise matching requirements

Example:
  // Step 1: PageIndex locates chapter
  sqlxb.Of(&PageIndexNode{}).
      Like("title", "Investment Advice").
      Eq("level", 2)

  // Step 2: Vector search within chapter
  sqlxb.Of(&DocumentChunk{}).
      VectorSearch("embedding", queryVector, 10).
      Gte("page", chapterStartPage).
      Lte("page", chapterEndPage)
```
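The two steps above can be sketched end to end in plain Go: first locate the chapter by title and level, then rank only the chunks whose page falls inside that chapter. Everything here is illustrative: `Chapter`, `Chunk`, `LocateChapter`, and `SearchWithin` are hypothetical names, and the dot product is used as the similarity score on the assumption that embeddings are already normalized.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Chapter is a hypothetical structural node with its page span.
type Chapter struct {
	Title              string
	Level              int
	StartPage, EndPage int
}

// Chunk is a hypothetical text fragment with its page and embedding.
type Chunk struct {
	Page      int
	Text      string
	Embedding []float64
}

// dot returns the dot product over the shared prefix of a and b.
func dot(a, b []float64) float64 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var sum float64
	for i := 0; i < n; i++ {
		sum += a[i] * b[i]
	}
	return sum
}

// LocateChapter returns the first chapter at the given level whose title
// contains keyword (case-insensitive).
func LocateChapter(chapters []Chapter, keyword string, level int) (Chapter, bool) {
	needle := strings.ToLower(keyword)
	for _, c := range chapters {
		if c.Level == level && strings.Contains(strings.ToLower(c.Title), needle) {
			return c, true
		}
	}
	return Chapter{}, false
}

// SearchWithin ranks the chunks on pages [c.StartPage, c.EndPage] by dot
// product with query and returns at most k of them, best first.
func SearchWithin(chunks []Chunk, c Chapter, query []float64, k int) []Chunk {
	var hits []Chunk
	for _, ch := range chunks {
		if ch.Page >= c.StartPage && ch.Page <= c.EndPage {
			hits = append(hits, ch)
		}
	}
	sort.SliceStable(hits, func(i, j int) bool {
		return dot(hits[i].Embedding, query) > dot(hits[j].Embedding, query)
	})
	if k >= 0 && k < len(hits) {
		hits = hits[:k]
	}
	return hits
}

func main() {
	chapters := []Chapter{
		{Title: "Market Overview", Level: 2, StartPage: 1, EndPage: 9},
		{Title: "Investment Advice", Level: 2, StartPage: 10, EndPage: 14},
	}
	chunks := []Chunk{
		{Page: 3, Text: "sector history", Embedding: []float64{1, 0}},
		{Page: 11, Text: "tech sector outlook", Embedding: []float64{0.8, 0.6}},
		{Page: 13, Text: "bond allocation", Embedding: []float64{0.1, 0.99}},
	}
	chapter, ok := LocateChapter(chapters, "investment advice", 2)
	if !ok {
		fmt.Println("chapter not found")
		return
	}
	for _, c := range SearchWithin(chunks, chapter, []float64{1, 0}, 1) {
		fmt.Printf("p.%d: %s\n", c.Page, c.Text)
	}
}
```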

---

### Scenario 4️⃣: Traditional Business Data

**Use Standard SQL (No Vector/PageIndex needed)**

```
Applicable Use Cases:
  ✅ User management ("Find users over 18")
  ✅ Order queries ("Orders in January 2024")
  ✅ Inventory management ("Products with low stock")
  ✅ Statistical reports ("Sales by region")

Characteristics:
  - Structured data
  - Exact condition matching
  - No semantic understanding needed

Example:
  sqlxb.Of(&User{}).
      Gte("age", 18).
      Eq("status", "active").
      Paged(...)
```
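For readers who want to see what exact-match filters with paging amount to in SQL terms, here is a small sketch that assembles a parameterized query with PostgreSQL-style `$n` placeholders. It is not sqlxb's implementation and makes no claim about the SQL sqlxb generates; `Cond` and `BuildSelect` are hypothetical helpers written only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Cond is one hypothetical "column operator value" condition.
type Cond struct {
	Column string
	Op     string
	Value  any
}

// BuildSelect renders a parameterized SELECT with $n placeholders and a
// LIMIT/OFFSET clause for a 1-based page number.
// Table, column, and operator strings are interpolated directly, so they
// must come from trusted code, never from user input.
func BuildSelect(table string, conds []Cond, page, pageSize int) (string, []any) {
	var sb strings.Builder
	args := make([]any, 0, len(conds)+2)
	sb.WriteString("SELECT * FROM " + table)
	for i, c := range conds {
		if i == 0 {
			sb.WriteString(" WHERE ")
		} else {
			sb.WriteString(" AND ")
		}
		args = append(args, c.Value)
		fmt.Fprintf(&sb, "%s %s $%d", c.Column, c.Op, len(args))
	}
	args = append(args, pageSize, (page-1)*pageSize)
	fmt.Fprintf(&sb, " LIMIT $%d OFFSET $%d", len(args)-1, len(args))
	return sb.String(), args
}

func main() {
	query, args := BuildSelect("users", []Cond{
		{Column: "age", Op: ">=", Value: 18},
		{Column: "status", Op: "=", Value: "active"},
	}, 2, 10)
	fmt.Println(query)
	fmt.Println(args...)
}
```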

---

## 🤔 Quick Decision Tree

```
Your data is...

├─ Fragmented (products, users, code snippets)
│   └─ Need "similarity" matching?
│       ├─ Yes → Vector Database ✅
│       └─ No → Standard SQL ✅

└─ Long documents (reports, manuals, contracts)
    └─ Has clear chapter structure?
        ├─ Yes → PageIndex ✅
        │   └─ Also need semantic matching?
        │       └─ Yes → PageIndex + Vector ✅
        └─ No → Traditional RAG (chunking + vector) ✅
```
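The tree above is small enough to encode directly. The following sketch expresses it as a Go function; the `DataProfile` field names and the `Recommend` function are assumptions made for this example and are not part of sqlxb.

```go
package main

import "fmt"

// DataProfile captures the questions the decision tree asks.
// The field names are illustrative only.
type DataProfile struct {
	LongDocument    bool // false means fragmented records
	NeedsSimilarity bool // asked only for fragmented data
	HasChapters     bool // asked only for long documents
	NeedsSemantics  bool // asked only for structured long documents
}

// Recommend walks the decision tree and returns the suggested approach.
func Recommend(p DataProfile) string {
	if !p.LongDocument {
		if p.NeedsSimilarity {
			return "Vector Database"
		}
		return "Standard SQL"
	}
	if !p.HasChapters {
		return "Traditional RAG (chunking + vector)"
	}
	if p.NeedsSemantics {
		return "PageIndex + Vector"
	}
	return "PageIndex"
}

func main() {
	fmt.Println(Recommend(DataProfile{NeedsSimilarity: true}))
	fmt.Println(Recommend(DataProfile{LongDocument: true, HasChapters: true}))
	fmt.Println(Recommend(DataProfile{LongDocument: true}))
}
```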

---

## 💡 Core Principles

```
Don't debate technology choices; look at the data's characteristics:

1️⃣ Fragmented data + need similarity
   → Vector Database

2️⃣ Long documents + structured + need chapter location
   → PageIndex

3️⃣ Long documents + unstructured + need semantics
   → Traditional RAG (chunking + vector)

4️⃣ Structured data + exact matching
   → Standard SQL

5️⃣ Complex scenarios
   → Hybrid approach
```

**sqlxb supports all scenarios: one API for everything!**

Lines changed: 78 additions & 0 deletions
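@@ -0,0 +1,78 @@
feat: Add PageIndex integration - Structured document retrieval (v0.10.4)

🚀 New Features
- ✨ PageIndex integration support (Vectify AI PageIndex + sqlxb)
- Hierarchical structure storage (flattened relational tables)
- Structured queries using sqlxb
- Preserve document logical structure

📚 New Documentation (2 docs)
- ✨ doc/PAGEINDEX_INTEGRATION.md - PageIndex integration guide
  - PageIndex technology introduction
  - Data storage design (JSONB vs flattened)
  - sqlxb query examples (title, page, level)
  - LLM integration workflow
  - Complete application scenarios (financial reports, technical docs)
- ✨ doc/USE_CASE_GUIDE_ZH.md - Use case decision guide (Chinese version)
  - 4 core scenarios (Vector, PageIndex, Hybrid, SQL)
  - Quick decision tree (no learning or understanding required)
  - Real-world cases (e-commerce, finance, tech docs)
  - Comparison table (response time, accuracy)

🔧 New Complete Application Example
- ✨ examples/pageindex-app/ - PageIndex structured document retrieval app
  - Complete project structure (8 files)
  - PageIndex JSON importer (recursive import)
  - Data access layer (7 query methods)
  - HTTP API (5 endpoints)
  - Database schema (schema.sql)
  - Integration tests (5 tests)

🎯 Core Query Features
- ✅ Search by title (Like + auto-filtering)
  - Like("title", keyword)
- ✅ Locate by page number (range query)
  - Lte("start_page", page).Gte("end_page", page)
- ✅ Query by level (exact match)
  - Eq("level", level)
- ✅ Child node query (parent-child relationship)
  - Eq("parent_id", parentNodeID)
- ✅ Page range query (interval filtering)
  - Gte("start_page", min).Lte("end_page", max)

📊 Example Code Stats
- pageindex-app: 8 files (~650 lines of code, 5 tests)
- Total example apps: 4 (28 files, ~2400 lines of code, 20 tests)

✅ Testing
- 5 integration tests (require PostgreSQL)
- Test auto-filtering, hierarchy traversal, page location
- All tests pass (integration tests skip as expected)

📖 Documentation Updates
- Update README.md (add use case decision guide, Chinese and English)
- Update examples/README.md (add PageIndex example)
- Update doc/README.md (add PAGEINDEX_INTEGRATION + USE_CASE_GUIDE_ZH)
- Total docs: 39 → 41

🎯 Application Value
- ✅ Support structured document retrieval (vs traditional chunking)
- ✅ Preserve document logical structure (chapters, sections)
- ✅ More accurate context understanding
- ✅ Suitable for long documents such as financial reports and technical manuals

💡 Design Highlights
- ✅ Correct use of sqlxb API (Lte/Gte instead of X)
- ✅ Fully leverage the auto-filtering mechanism
- ✅ Flattened storage for easy querying
- ✅ Complete import and query workflow

🚀 Impact
- ✅ Extend sqlxb use cases (structured documents)
- ✅ Demonstrate sqlxb's power in non-vector scenarios
- ✅ Provide a PageIndex + Golang integration solution
- ✅ Add technical breadth for v1.0.0

---
Co-authored-by: Claude (Anthropic)

Lines changed: 78 additions & 0 deletions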
@@ -0,0 +1,78 @@
feat: Add PageIndex integration - Structured document retrieval (v0.10.4)

🚀 New Features
- ✨ PageIndex integration support (Vectify AI PageIndex + sqlxb)
- Hierarchical structure storage (flattened relational tables)
- Structured queries using sqlxb
- Preserve document logical structure

📚 New Documentation (2 docs)
- ✨ doc/PAGEINDEX_INTEGRATION.md - PageIndex integration guide
  - PageIndex technology introduction
  - Data storage design (JSONB vs flattened)
  - sqlxb query examples (title, page, level)
  - LLM integration workflow
  - Complete use cases (financial reports, technical docs)
- ✨ doc/USE_CASE_GUIDE_ZH.md - Use case decision guide (Chinese)
  - 4 core scenarios (Vector, PageIndex, Hybrid, SQL)
  - Quick decision tree (no learning required)
  - Real-world cases (e-commerce, finance, tech docs)
  - Comparison table (response time, accuracy)

🔧 New Complete Application Example
- ✨ examples/pageindex-app/ - PageIndex structured document retrieval app
  - Complete project structure (8 files)
  - PageIndex JSON importer (recursive import)
  - Data access layer (7 query methods)
  - HTTP API (5 endpoints)
  - Database schema (schema.sql)
  - Integration tests (5 tests)

🎯 Core Query Features
- ✅ Title search (Like + auto-filtering)
  - Like("title", keyword)
- ✅ Page location (range query)
  - Lte("start_page", page).Gte("end_page", page)
- ✅ Level query (exact match)
  - Eq("level", level)
- ✅ Child nodes query (parent-child relationship)
  - Eq("parent_id", parentNodeID)
- ✅ Page range query (interval filtering)
  - Gte("start_page", min).Lte("end_page", max)

📊 Example Code Stats
- pageindex-app: 8 files (~650 lines, 5 tests)
- Total examples: 4 apps (28 files, ~2400 lines, 20 tests)

✅ Testing
- 5 integration tests (require PostgreSQL)
- Test auto-filtering, hierarchy traversal, page location
- All tests pass (integration tests skip normally)

📖 Documentation Updates
- Update README.md (add use case decision guide, bilingual)
- Update examples/README.md (add PageIndex example)
- Update doc/README.md (add PAGEINDEX_INTEGRATION + USE_CASE_GUIDE_ZH)
- Total docs: 39 → 41

🎯 Application Value
- ✅ Support structured document retrieval (vs traditional chunking)
- ✅ Preserve document logical structure (chapters, sections)
- ✅ More accurate context understanding
- ✅ Suitable for financial reports, technical manuals, etc.

💡 Design Highlights
- ✅ Correct use of sqlxb API (Lte/Gte instead of X)
- ✅ Fully leverage auto-filtering mechanism
- ✅ Flattened storage for efficient queries
- ✅ Complete import and query workflow

🚀 Impact
- ✅ Extend sqlxb use cases (structured documents)
- ✅ Demonstrate sqlxb's power in non-vector scenarios
- ✅ Provide PageIndex + Golang integration solution
- ✅ Add technical breadth for v1.0.0

---
Co-authored-by: Claude (Anthropic)
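
The commit message mentions a recursive importer, flattened storage, page location, and child-node queries. As a rough illustration of how those pieces fit together, the sketch below flattens a nested tree into rows carrying `parent_id` and `level`, then applies the page-location condition (`start_page <= page AND end_page >= page`) and the child lookup in memory. It is not the code in examples/pageindex-app/: the JSON field names, the `TreeNode` and `Row` types, and the ID scheme are all assumptions, and no claim is made about PageIndex's actual output format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TreeNode mirrors a hypothetical nested section. The JSON field names
// are assumptions, not PageIndex's actual schema.
type TreeNode struct {
	Title     string     `json:"title"`
	StartPage int        `json:"start_page"`
	EndPage   int        `json:"end_page"`
	Children  []TreeNode `json:"children"`
}

// Row is one flattened record, as it might be stored in a relational table.
type Row struct {
	ID, ParentID, Level int
	Title               string
	StartPage, EndPage  int
}

// Flatten walks the tree depth-first and appends one Row per node.
// IDs are assigned in visit order starting at 1; ParentID 0 marks a root.
func Flatten(nodes []TreeNode, parentID, level int, rows *[]Row) {
	for _, n := range nodes {
		id := len(*rows) + 1
		*rows = append(*rows, Row{
			ID: id, ParentID: parentID, Level: level,
			Title: n.Title, StartPage: n.StartPage, EndPage: n.EndPage,
		})
		Flatten(n.Children, id, level+1, rows)
	}
}

// NodesAtPage returns rows whose page span contains page, that is,
// start_page <= page AND end_page >= page.
func NodesAtPage(rows []Row, page int) []Row {
	var out []Row
	for _, r := range rows {
		if r.StartPage <= page && r.EndPage >= page {
			out = append(out, r)
		}
	}
	return out
}

// ChildrenOf returns the direct children of the row with the given ID.
func ChildrenOf(rows []Row, parentID int) []Row {
	var out []Row
	for _, r := range rows {
		if r.ParentID == parentID {
			out = append(out, r)
		}
	}
	return out
}

const sample = `[{"title":"Report","start_page":1,"end_page":20,"children":[
  {"title":"Overview","start_page":1,"end_page":5,"children":[]},
  {"title":"Financial Stability","start_page":6,"end_page":20,"children":[]}]}]`

func main() {
	var tree []TreeNode
	if err := json.Unmarshal([]byte(sample), &tree); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	var rows []Row
	Flatten(tree, 0, 1, &rows)
	for _, r := range NodesAtPage(rows, 7) {
		fmt.Printf("level %d: %s\n", r.Level, r.Title)
	}
	fmt.Println("children of root:", len(ChildrenOf(rows, 1)))
}
```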