IPFS Datasets Python - Examples

This directory contains examples demonstrating how to integrate ipfs_datasets_py into your applications. These examples focus on package modules (not MCP server tools) to help you understand how to use the library programmatically.

📂 Directory Structure

examples/
├── README.md                    # This file
├── MIGRATION_GUIDE.md           # Help for existing users
├── REFACTORING_SUMMARY.md       # Refactoring overview
├── requirements.txt             # Optional dependencies
│
├── basic/                       # Essential examples (01-06)
│   ├── 01_getting_started.py
│   ├── 02_embeddings_basic.py
│   ├── 03_vector_search.py
│   ├── 04_file_conversion.py
│   ├── 05_knowledge_graphs_basic.py
│   └── 06_ipfs_storage.py
│
├── intermediate/                # Intermediate examples (07-14)
│   ├── 07_pdf_processing.py
│   ├── 08_multimedia_download.py
│   ├── 09_batch_processing.py
│   ├── 10_legal_data_scraping.py
│   ├── 11_web_archiving.py
│   ├── 12_graphrag_basic.py
│   ├── 13_logic_reasoning.py
│   ├── 14_cross_document_reasoning.py
│   └── [other specialized examples]
│
├── advanced/                    # Advanced examples (15-19)
│   ├── graphrag_optimizer_example.py
│   ├── query_optimization_example.py
│   └── [future advanced examples]
│
├── archived/                    # Old/deprecated examples
│   └── [MCP server and dashboard examples]
│
├── knowledge_graphs/            # KG-specific examples
├── neurosymbolic/               # Logic reasoning examples
├── external_provers/            # Theorem prover examples
└── processors/                  # Processor-specific examples

🚀 Quick Start

Choose an example based on your needs:

🌟 Basic Examples (Start Here - `basic/`)

01_getting_started.py - Verify installation and check available modules
02_embeddings_basic.py - Generate text embeddings and measure semantic similarity
03_vector_search.py - Store embeddings and perform similarity search with FAISS/Qdrant
04_file_conversion.py - Convert various file formats (PDF, DOCX, etc.) to text
05_knowledge_graphs_basic.py - Extract entities and relationships from text
06_ipfs_storage.py - Store and retrieve data on IPFS

📚 Intermediate Examples (`intermediate/`)

07_pdf_processing.py - Advanced PDF processing with OCR
08_multimedia_download.py - Download and process media with yt-dlp and FFmpeg
09_batch_processing.py - Process multiple files in parallel
10_legal_data_scraping.py - Scrape federal/state/municipal legal datasets
11_web_archiving.py - Archive and search web content
12_graphrag_basic.py - Knowledge graph-enhanced RAG
13_logic_reasoning.py - Formal logic and theorem proving
14_cross_document_reasoning.py - Multi-document entity linking

🔬 Advanced Examples (`advanced/` - Coming Soon)

15_graphrag_optimization.py - Ontology generation and optimization
16_logic_enhanced_rag.py - RAG with logic constraints
17_legal_knowledge_base.py - Complete legal research system
18_neural_symbolic_integration.py - Combine LLMs with theorem provers
19_distributed_processing.py - P2P networking and distributed compute

📋 Prerequisites

Installation

# 1. Install the package
cd /path/to/ipfs_datasets_py
pip install -e .

# 2. Install optional dependencies for examples
cd examples
pip install -r requirements.txt

# Or install specific features only
pip install transformers torch faiss-cpu  # For basic examples

Quick Setup Profiles

Beginner (examples 01-06):

pip install transformers torch faiss-cpu beautifulsoup4 requests ipfshttpclient

Intermediate (examples 07-14):

pip install -r examples/requirements.txt

All Features:

pip install -e ".[all]"

🎯 Package Features

Core Modules

Embeddings (ml.embeddings): Generate semantic embeddings from text
Vector Stores (vector_stores): FAISS, Qdrant, IPLD-based vector storage
Knowledge Graphs (knowledge_graphs): Extract and query structured knowledge
File Conversion (processors.file_converter): Convert 20+ file formats
PDF Processing (processors.specialized.pdf): Multi-engine OCR and extraction
Multimedia (processors.multimedia): yt-dlp, FFmpeg, Discord, email processing
Logic Module (logic): Formal logic, theorem proving, neural-symbolic integration
Legal Scrapers (processors.legal_scrapers): 21K+ entity knowledge base
Web Archiving (web_archiving): Common Crawl, Brave Search, web scraping
IPFS/IPLD: Content-addressed decentralized storage

Processor Architecture

The package uses a unified processor system:

UnifiedProcessor: Auto-detects input type and routes to appropriate handler
ProcessorRegistry: Plugin-based extensibility
Protocol-based design for consistency
Lazy loading and graceful degradation
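
The design points above can be illustrated with a small sketch. This is not the package's implementation: `Processor`, `TextProcessor`, `SketchRegistry`, and `make_broken_processor` are hypothetical names invented for illustration, and the real `UnifiedProcessor` and `ProcessorRegistry` APIs may look quite different. The sketch only shows how a protocol, a registry of factories (lazy loading), and an `ImportError` fallback (graceful degradation) can fit together.

```python
from typing import Callable, Dict, Optional, Protocol


class Processor(Protocol):
    """Hypothetical protocol: any object with these two methods qualifies."""

    def can_handle(self, source: str) -> bool: ...
    def process(self, source: str) -> str: ...


class TextProcessor:
    """Toy handler for .txt inputs (illustrative only)."""

    def can_handle(self, source: str) -> bool:
        return source.endswith(".txt")

    def process(self, source: str) -> str:
        return f"text:{source}"


class SketchRegistry:
    """Stores factories so a handler is built only when first needed."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Processor]] = {}
        self._instances: Dict[str, Processor] = {}

    def register(self, name: str, factory: Callable[[], Processor]) -> None:
        self._factories[name] = factory

    def _load(self, name: str) -> Optional[Processor]:
        if name not in self._instances:
            try:
                self._instances[name] = self._factories[name]()
            except ImportError:
                # Graceful degradation: skip handlers whose optional
                # dependencies are not installed.
                return None
        return self._instances[name]

    def route(self, source: str) -> Optional[str]:
        # Try handlers in registration order; the first match wins.
        for name in self._factories:
            handler = self._load(name)
            if handler is not None and handler.can_handle(source):
                return handler.process(source)
        return None


def make_broken_processor() -> Processor:
    """Simulates a handler whose optional dependency is missing."""
    raise ImportError("optional dependency missing")


registry = SketchRegistry()
registry.register("broken", make_broken_processor)
registry.register("text", TextProcessor)
print(registry.route("notes.txt"))  # handled by TextProcessor
print(registry.route("photo.png"))  # no handler matches, so None
```

Registering factories rather than instances means an optional dependency is only imported when its handler is first needed, and a missing dependency disables one handler instead of the whole registry.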

💡 Example Patterns

Basic Usage Pattern

# 1. Import the module
from ipfs_datasets_py.ml.embeddings import IPFSEmbeddings

# 2. Initialize
embedder = IPFSEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# 3. Use the functionality (await is only valid inside an async function;
#    see the Async/Await Pattern below)
texts = ["Sample text 1", "Sample text 2"]
embeddings = await embedder.generate_embeddings(texts)

Async/Await Pattern

Most examples use asyncio for async operations:

import asyncio

async def main():
    # Your async code here
    pass

if __name__ == "__main__":
    asyncio.run(main())
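
As a slightly fuller sketch of the same pattern, the snippet below runs several coroutines concurrently with `asyncio.gather`. `fake_embed` and `run_all` are hypothetical stand-ins for an async library call, not part of ipfs_datasets_py.

```python
import asyncio
from typing import List


async def fake_embed(text: str) -> int:
    """Hypothetical stand-in for an async library call."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return len(text)


async def run_all(texts: List[str]) -> List[int]:
    # gather() runs the coroutines concurrently and returns results
    # in the same order as the inputs.
    return await asyncio.gather(*(fake_embed(t) for t in texts))


results = asyncio.run(run_all(["alpha", "be", "gamma!"]))
print(results)  # [5, 2, 6]
```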

Error Handling Pattern

try:
    result = await some_operation()
    if result.success:
        print(f"✅ Success: {result.data}")
    else:
        print(f"❌ Failed: {result.error}")
except Exception as e:
    print(f"❌ Error: {e}")
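
The pattern above assumes a result object with `success`, `data`, and `error` fields. A self-contained sketch under that assumption follows. The `Result` dataclass and this `some_operation` are hypothetical placeholders, so the actual return types in the package may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Result:
    """Hypothetical result object with the fields used in the pattern."""
    success: bool
    data: Any = None
    error: Optional[str] = None


async def some_operation(value: int) -> Result:
    """Placeholder: reports failure for negatives, raises for zero."""
    if value == 0:
        raise ValueError("value must not be zero")
    if value < 0:
        return Result(success=False, error="negative input")
    return Result(success=True, data=value * 2)


async def run_safely(value: int) -> str:
    try:
        result = await some_operation(value)
        if result.success:
            return f"Success: {result.data}"
        return f"Failed: {result.error}"
    except Exception as e:
        # Unexpected exceptions are reported instead of crashing the demo.
        return f"Error: {e}"


messages = [asyncio.run(run_safely(v)) for v in (21, -1, 0)]
print(messages)
```

The three inputs exercise the three branches: a successful result, a reported failure, and a raised exception.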

📖 Running the Examples

From the Repository Root

# Run basic examples
python examples/basic/01_getting_started.py
python examples/basic/02_embeddings_basic.py

# Run intermediate examples
python examples/intermediate/07_pdf_processing.py
python examples/intermediate/12_graphrag_basic.py

With Environment Variables

# Enable debug logging
LOGLEVEL=DEBUG python examples/basic/02_embeddings_basic.py

# Specify HuggingFace token
HF_TOKEN=your_token python examples/basic/02_embeddings_basic.py

# Specify Brave API key (for web search examples)
BRAVE_API_KEY=your_key python examples/intermediate/11_web_archiving.py
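
Inside a script, these variables can be read with `os.environ`. The sketch below shows one way to do that. Whether the shipped examples read them in exactly this way is an assumption, and `read_settings` is a hypothetical helper.

```python
import logging
import os
from typing import Mapping

# Level names accepted for LOGLEVEL in this sketch.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def read_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the variables shown above; unset keys become None."""
    level_name = env.get("LOGLEVEL", "INFO").upper()
    return {
        # Fall back to INFO when LOGLEVEL is not a known level name.
        "log_level": _LEVELS.get(level_name, logging.INFO),
        "hf_token": env.get("HF_TOKEN"),
        "brave_api_key": env.get("BRAVE_API_KEY"),
    }


settings = read_settings()
logging.basicConfig(level=settings["log_level"])
# Report only whether secrets are set; never print their values.
print("HF_TOKEN set:", settings["hf_token"] is not None)
print("BRAVE_API_KEY set:", settings["brave_api_key"] is not None)
```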

🔧 Troubleshooting

Import Errors

If you get import errors:

# Make sure you're in the repository root
cd /path/to/ipfs_datasets_py

# Install in development mode
pip install -e .

# Or install with all dependencies
pip install -e ".[all]"

Missing Dependencies

# Check what's installed
python examples/basic/01_getting_started.py

# Install specific features
pip install transformers torch              # For embeddings
pip install faiss-cpu                       # For vector search
pip install yt-dlp ffmpeg-python           # For multimedia
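
A quick way to see which of these optional packages are importable, without actually importing them, is `importlib.util.find_spec`. This is a minimal sketch and not necessarily how `01_getting_started.py` performs its check. `check_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report which top-level modules can be found, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names for the pip packages listed above: faiss-cpu provides `faiss`,
# yt-dlp provides `yt_dlp`, and ffmpeg-python provides `ffmpeg`.
optional = ["transformers", "torch", "faiss", "yt_dlp", "ffmpeg"]
for name, available in check_modules(optional).items():
    print(f"{name}: {'installed' if available else 'missing'}")
```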

IPFS Daemon

For IPFS examples (06_ipfs_storage.py):

# Install IPFS
# See: https://docs.ipfs.tech/install/

# Initialize and start
ipfs init
ipfs daemon

# Then run the example
python examples/basic/06_ipfs_storage.py
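
Before running the example, it can help to confirm that the daemon's API port is accepting connections. The sketch below assumes the default Kubo RPC API address of `127.0.0.1:5001`, so a customized configuration may use a different address. `api_port_open` is a hypothetical helper that only checks that the TCP port is open, not that the service behind it is IPFS.

```python
import socket


def api_port_open(host: str = "127.0.0.1", port: int = 5001,
                  timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if api_port_open():
    print("A service is listening on 127.0.0.1:5001.")
else:
    print("Nothing is listening on 127.0.0.1:5001; try running `ipfs daemon`.")
```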

📂 Directory Organization

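The examples directory is being reorganized for clarity. The tree below shows the earlier flat layout and its status markers; see Directory Structure above for the layout after the move into basic/ and intermediate/: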

examples/
├── README.md                          # This file
├── 01_getting_started.py              # ✅ Installation verification
├── 02_embeddings_basic.py             # ✅ Text embeddings
├── 03_vector_search.py                # ✅ FAISS/Qdrant search
├── 04_file_conversion.py              # ✅ File format conversion
├── 05_knowledge_graphs_basic.py       # ✅ Entity extraction
├── 06_ipfs_storage.py                 # ✅ IPFS operations
├── 07_pdf_processing.py               # 🚧 Coming soon
├── 08_multimedia_download.py          # 🚧 Coming soon
├── 09_batch_processing.py             # 🚧 Coming soon
├── 10_legal_data_scraping.py          # 🚧 Coming soon
├── 11_web_archiving.py                # 🚧 Coming soon
├── 12_graphrag_basic.py               # 🚧 Coming soon
├── 13_logic_reasoning.py              # 🚧 Coming soon
├── 14_cross_document_reasoning.py     # 🚧 Coming soon
├── 15_graphrag_optimization.py        # 🚧 Coming soon
│
├── archived/                          # Old/deprecated examples
│   ├── mcp_dashboard_examples.py
│   ├── demo_mcp_server.py
│   └── ...
│
├── knowledge_graphs/                  # Specialized KG examples
│   └── simple_example.py
│
├── neurosymbolic/                     # Logic & reasoning examples
│   ├── example1_basic_reasoning.py
│   └── ...
│
└── processors/                        # Processor-specific examples
    ├── 04_ipfs_processing.py
    └── ...

🗂️ Existing Examples Reference

Many existing examples are still valuable but are being reorganized:

Legacy Examples Still Available

knowledge_graph_validation_example.py - SPARQL validation with Wikidata
pipeline_example.py - Monadic error handling and pipelines
advanced_features_example.py - Metadata extraction and batch processing
neurosymbolic/ - Logic reasoning examples (FOL, deontic, temporal)
external_provers/ - Z3 theorem prover integration

MCP Server Examples (Moving to Archived)

These focus on the MCP server rather than package integration:

demo_mcp_server.py, mcp_server_example.py
demo_mcp_dashboard.py, mcp_dashboard_examples.py
Various dashboard demos

🎓 Learning Path

Beginner (Essential Skills)

Start with 01_getting_started.py to verify setup
Learn embeddings with 02_embeddings_basic.py
Understand vector search in 03_vector_search.py
Process files with 04_file_conversion.py

Intermediate (Build Applications)

Extract knowledge with 05_knowledge_graphs_basic.py
Store data on the decentralized IPFS network with 06_ipfs_storage.py
Process PDFs with OCR (coming soon)
Handle multimedia files (coming soon)
Batch processing at scale (coming soon)

Advanced (Production Systems)

Build GraphRAG systems (coming soon)
Integrate formal logic (coming soon)
Cross-document reasoning (coming soon)
Ontology optimization (coming soon)

🔗 Related Documentation

Main README - Project overview and installation
CLAUDE.md - Development coordination (for contributors)
API Documentation - Detailed API references
Tests - Test suite for reference implementations

🤝 Contributing Examples

Want to contribute an example? Please:

Follow the existing pattern (docstring, demos, tips)
Use async/await where appropriate
Handle errors gracefully
Include clear comments
Add to this README with proper numbering
Test thoroughly before submitting

📝 Example Template

"""
Example Title - Brief Description

Detailed description of what this example demonstrates.
Include requirements and use cases.

Requirements:
    - List dependencies here
    - pip install commands

Usage:
    python examples/XX_example_name.py
"""

import asyncio

async def demo_feature_1():
    """Demonstrate feature 1."""
    print("\n" + "="*70)
    print("DEMO 1: Feature Name")
    print("="*70)

    try:
        # Implementation
        pass
    except Exception as e:
        print(f"❌ Error: {e}")

def show_tips():
    """Show tips for using this feature."""
    print("\n" + "="*70)
    print("TIPS")
    print("="*70)
    # Add useful tips

async def main():
    """Run all demonstrations."""
    await demo_feature_1()
    show_tips()

if __name__ == "__main__":
    asyncio.run(main())

Last Updated: 2024-02-17
Status: 🚧 Active Refactoring - 6 new examples added, more coming soon
