Support client-side schema validation using Pydantic #304

tylerhutcherson · 2025-03-26T15:09:35Z

This PR implements a layered architecture for managing and validating searchable data in Redis, with clear separation of concerns between schema definition, data validation, and storage operations.

Key Components

1. Schema Definition Layer

IndexSchema provides the blueprint for data structure and constraints
- Defines fields with specific types (TEXT, TAG, NUMERIC, GEO, VECTOR)
- Supports different storage types (HASH, JSON) with appropriate configuration

2. Validation Layer

SchemaModelGenerator dynamically creates Pydantic models from schema definitions
Implements a caching mechanism to avoid redundant model generation
Maps Redis field types to appropriate Python/Pydantic types
Provides type-specific validators:
- VECTOR: validates dimensions and value ranges (e.g., INT8 range checks)
- GEO: validates geographic coordinate format
- NUMERIC: prevents boolean values
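
As a rough illustration of the three type-specific checks listed above, here is a minimal plain-Python sketch. It deliberately avoids Pydantic so it runs standalone; in the PR these checks live inside the generated Pydantic models. The function names, the "lon,lat" string format, the coordinate bounds, and the `datatype` strings are all assumptions made for the example, not the PR's actual code.

```python
import re
from typing import Any, List, Sequence

# Assumed GEO format: a "lon,lat" string. The PR may accept a different pattern.
_GEO_PATTERN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*$")


def check_numeric(value: Any) -> Any:
    # bool is a subclass of int in Python, so it has to be rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"expected a number, got {value!r}")
    return value


def check_geo(value: Any) -> str:
    match = _GEO_PATTERN.match(value) if isinstance(value, str) else None
    if match is None:
        raise ValueError(f"expected 'lon,lat', got {value!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    # Assumed bounds: plain longitude/latitude ranges.
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinates out of range: {value!r}")
    return value


def check_vector(values: Sequence[Any], dims: int, datatype: str = "float32") -> List[Any]:
    # Dimension check: the vector must match the dims declared in the schema.
    if len(values) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(values)}")
    for item in values:
        check_numeric(item)
        # Range check for INT8 vectors: integers in [-128, 127] only.
        if datatype == "int8" and not (isinstance(item, int) and -128 <= item <= 127):
            raise ValueError(f"{item!r} is outside the INT8 range")
    return list(values)
```

Each function returns the value unchanged on success and raises `ValueError` otherwise, which is the same contract a Pydantic field validator follows.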

3. Storage Layer

BaseStorage is the abstract class that provides the foundation for Redis operations
Specialized implementations (HashStorage, JsonStorage) for different Redis data types
Enforces schema validation during write operations when validation is enabled (set to True)
Implements optimized batch operations using Redis pipelines
Supports both synchronous and asynchronous interfaces
Handles key generation, preprocessing, and error handling
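
One way to picture the split between the abstract base and its two specializations is the sketch below. The class names (`StorageSketch`, `HashSketch`, `JsonSketch`), the `validate` and `validator` parameters, and the `:` key separator are hypothetical stand-ins chosen for illustration; they are not the PR's `BaseStorage`, `HashStorage`, or `JsonStorage` signatures.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Optional, Tuple

Record = Dict[str, Any]


class StorageSketch(ABC):
    """Hypothetical stand-in for an abstract storage base class."""

    def __init__(
        self,
        prefix: str,
        validate: bool = False,
        validator: Optional[Callable[[Record], Record]] = None,
    ) -> None:
        self.prefix = prefix
        self.validate = validate
        self.validator = validator

    def key_for(self, record_id: str) -> str:
        # Key generation: "<prefix>:<id>" (the separator is an assumption).
        return f"{self.prefix}:{record_id}"

    def prepare(self, record_id: str, record: Record) -> Tuple[str, Any]:
        # Validation only runs when the flag is True and a validator is supplied.
        if self.validate and self.validator is not None:
            record = self.validator(record)
        return self.key_for(record_id), self.serialize(record)

    @abstractmethod
    def serialize(self, record: Record) -> Any:
        """Convert a record into the shape the Redis data type expects."""


class HashSketch(StorageSketch):
    def serialize(self, record: Record) -> Dict[str, str]:
        # Redis hashes hold flat field/value pairs, so values are stringified.
        return {name: str(value) for name, value in record.items()}


class JsonSketch(StorageSketch):
    def serialize(self, record: Record) -> str:
        # JSON documents may be nested, so the record is dumped as a whole.
        return json.dumps(record, sort_keys=True)
```

The shared `prepare` method keeps validation and key generation in one place, while each subclass only decides how a record is serialized for its Redis data type.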

4. Index Layer

The SearchIndex exposes the setting validate_on_load, which defaults to False.

Data Flow

Write Flow:

Objects are preprocessed and validated against the schema
Objects are prepared with appropriate keys
Batch writing occurs using Redis pipelines for efficiency
TTL (expiration) can be applied if specified
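
The four write steps can be sketched as a single function. This is a simplified illustration, not the PR's implementation: `write_batch`, `key_fn`, `validate`, and `batch_size` are hypothetical names, and `client` is assumed to be any object with a redis-py style `pipeline()` whose pipeline offers `hset`, `expire`, and `execute`.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Any]


def write_batch(
    client: Any,
    records: Iterable[Record],
    key_fn: Callable[[Record], str],
    validate: Optional[Callable[[Record], Record]] = None,
    ttl: Optional[int] = None,
    batch_size: int = 100,
) -> List[str]:
    # Steps 1-2: validate every object and pair it with its key before
    # touching Redis. In this sketch a bad record aborts the whole load.
    prepared: List[Tuple[str, Record]] = []
    for record in records:
        if validate is not None:
            record = validate(record)
        prepared.append((key_fn(record), record))

    # Steps 3-4: queue commands on a pipeline and flush once per batch,
    # adding an expiry to each key when a TTL is given.
    for start in range(0, len(prepared), batch_size):
        pipe = client.pipeline(transaction=False)
        for key, record in prepared[start:start + batch_size]:
            pipe.hset(key, mapping=record)
            if ttl is not None:
                pipe.expire(key, ttl)
        pipe.execute()
    return [key for key, _ in prepared]
```

Validating everything up front is a design choice of this sketch: it trades a second pass over the data for the guarantee that nothing is written when any record is invalid.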

Read Flow:

Keys are fetched in batches using pipelines
Data is converted from Redis format to Python objects
Bytes are automatically converted to appropriate types
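
The read path can be sketched in the same style. Again this is an assumption-laden illustration: `read_batch` and `convert_bytes` are hypothetical names, `client` is assumed to expose a redis-py style `pipeline()` with `hgetall` and `execute`, and the conversion shown here only decodes UTF-8 text rather than mapping values to schema-specific types.

```python
from typing import Any, Dict, Iterable, List


def convert_bytes(value: Any) -> Any:
    # Decode bytes to str where possible; leave undecodable payloads
    # (for example packed binary data) as raw bytes.
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            return value
    if isinstance(value, dict):
        return {convert_bytes(k): convert_bytes(v) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_bytes(item) for item in value]
    return value


def read_batch(client: Any, keys: Iterable[str], batch_size: int = 100) -> List[Dict[str, Any]]:
    key_list = list(keys)
    results: List[Dict[str, Any]] = []
    for start in range(0, len(key_list), batch_size):
        # One pipeline round trip per batch of keys.
        pipe = client.pipeline(transaction=False)
        for key in key_list[start:start + batch_size]:
            pipe.hgetall(key)
        results.extend(convert_bytes(item) for item in pipe.execute())
    return results
```

Falling back to raw bytes on a decode error is only a heuristic, since some binary payloads happen to be valid UTF-8; a fuller implementation would consult the schema to decide which fields to leave untouched.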

abrookins

This is EPIC! Looks great, too. Love the validation test coverage. I had a couple of comments, and probably the only thing I felt strongly about was using an actual JSONPath library for traversing objects.

redisvl/schema/validation.py

abrookins

Looks great! 👍

tylerhutcherson requested review from abrookins and rbs333 March 26, 2025 15:09

tylerhutcherson added the enhancement New feature or request label Mar 26, 2025

tylerhutcherson marked this pull request as ready for review March 26, 2025 15:14

abrookins reviewed Mar 27, 2025

redisvl/schema/validation.py (resolved)

redisvl/schema/validation.py (outdated, resolved)

redisvl/schema/validation.py (resolved)

tylerhutcherson force-pushed the feat/RAAE-422-client-side-schema-validation branch from 48883db to a1920f5 March 27, 2025 19:36

abrookins approved these changes Mar 27, 2025

tylerhutcherson added 11 commits March 31, 2025 10:44

very much wip

f984d62

dynamic pydantic model validation on load

fa8041a

update tests, docs, and formatting/linting

fdd70a0

Remove validation docs page

7f56857

skip cell in notebook testing

1d14f5f

update json path parser

f2f5010

hash the schema as the client side cache key

4827f61

use hf access tokens

51e6fc1

make extension classes accept vectorizer kwargs

0379db5

clean up tests a bit

b5f3780

start centralizing the use of fixtures for hugging face models

e406d76

abrookins force-pushed the feat/RAAE-422-client-side-schema-validation branch from d4f42bb to e406d76 March 31, 2025 17:59

abrookins merged commit af7871c into 0.5.0 Mar 31, 2025
31 checks passed

tylerhutcherson deleted the feat/RAAE-422-client-side-schema-validation branch April 7, 2025 01:29
