Support client-side schema validation using Pydantic #304

Merged: 11 commits, Mar 31, 2025

Conversation

tylerhutcherson
Collaborator

@tylerhutcherson tylerhutcherson commented Mar 26, 2025

This PR implements a layered architecture for managing and validating searchable data in Redis, with clear separation of concerns between schema definition, data validation, and storage operations.

Key Components

1. Schema Definition Layer

  • IndexSchema provides the blueprint for data structure and constraints
    • Defines fields with specific types (TEXT, TAG, NUMERIC, GEO, VECTOR)
    • Supports different storage types (HASH, JSON) with appropriate configuration
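A schema of this kind might be declared as a plain dictionary. The sketch below assumes the dictionary layout that RedisVL accepts in `IndexSchema.from_dict`; the index name, prefix, field names, and the `fields_by_type` helper are invented for illustration and are not part of this PR.

```python
from collections import defaultdict

# Hypothetical schema: every name and attribute value here is illustrative.
schema = {
    "index": {"name": "products", "prefix": "product", "storage_type": "json"},
    "fields": [
        {"name": "title", "type": "text"},
        {"name": "category", "type": "tag"},
        {"name": "price", "type": "numeric"},
        {"name": "location", "type": "geo"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {
                "dims": 4,
                "algorithm": "flat",
                "distance_metric": "cosine",
                "datatype": "float32",
            },
        },
    ],
}


def fields_by_type(schema: dict) -> dict:
    """Group field names by their declared type (illustrative helper)."""
    grouped = defaultdict(list)
    for field in schema["fields"]:
        grouped[field["type"]].append(field["name"])
    return dict(grouped)
```

Grouping the fields by type is roughly the information a validation layer needs in order to pick a validator for each field.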

2. Validation Layer

  • SchemaModelGenerator dynamically creates Pydantic models from schema definitions
  • Implements a caching mechanism to avoid redundant model generation
  • Maps Redis field types to appropriate Python/Pydantic types
  • Provides type-specific validators:
    • VECTOR: validates dimensions and value ranges (e.g., INT8 range checks)
    • GEO: validates geographic coordinate format
    • NUMERIC: prevents boolean values
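The three type-specific checks listed above can be sketched in plain Python. This is not the PR's implementation: the function names are hypothetical, the GEO check assumes a `"longitude,latitude"` string, and the exact rules the generated Pydantic models enforce may differ.

```python
from typing import Any, Sequence

INT8_MIN, INT8_MAX = -128, 127


def validate_numeric(value: Any) -> Any:
    """Accept ints and floats, but reject bool (a subclass of int in Python)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"expected a number, got {value!r}")
    return value


def validate_vector(value: Sequence[Any], dims: int, datatype: str = "float32") -> list:
    """Check the dimension count and, for int8 vectors, the value range."""
    if len(value) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(value)}")
    for x in value:
        validate_numeric(x)
        if datatype == "int8" and not (isinstance(x, int) and INT8_MIN <= x <= INT8_MAX):
            raise ValueError(f"{x!r} is outside the INT8 range")
    return list(value)


def validate_geo(value: Any) -> str:
    """Check a 'lon,lat' string (assumed order) for shape and coordinate range."""
    if not isinstance(value, str):
        raise ValueError("geo value must be a string")
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError("geo value must have exactly two components")
    try:
        lon, lat = float(parts[0]), float(parts[1])
    except ValueError:
        raise ValueError("geo components must be numeric") from None
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("geo coordinates out of range")
    return value
```

The explicit `bool` check matters because `isinstance(True, int)` is true in Python, so a naive numeric check would let booleans through.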

3. Storage Layer

  • BaseStorage is the abstract class that provides the foundation for Redis operations
  • Specialized implementations (HashStorage, JsonStorage) for different Redis data types
  • Enforces schema validation during write operations when the validation setting is True
  • Implements optimized batch operations using Redis pipelines
  • Supports both synchronous and asynchronous interfaces
  • Handles key generation, preprocessing, and error handling
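As a rough illustration of this layering, the sketch below shows an abstract base class that handles key generation and optional validation, with two subclasses that differ only in how they encode an object. All class and method names here are hypothetical stand-ins, not the PR's `BaseStorage`, `HashStorage`, or `JsonStorage`.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Optional, Tuple

Validator = Callable[[Dict[str, Any]], Dict[str, Any]]


class SketchStorage(ABC):
    """Shared logic: key generation plus validation that can be switched on."""

    def __init__(self, prefix: str, validator: Optional[Validator] = None,
                 validate: bool = False) -> None:
        self.prefix = prefix
        self.validator = validator
        self.validate = validate

    def make_key(self, obj_id: str) -> str:
        return f"{self.prefix}:{obj_id}"

    @abstractmethod
    def encode(self, obj: Dict[str, Any]) -> Any:
        """Turn an object into the payload for one Redis data type."""

    def prepare(self, objects: List[Dict[str, Any]], id_field: str) -> List[Tuple[str, Any]]:
        prepared = []
        for obj in objects:
            if self.validate and self.validator is not None:
                obj = self.validator(obj)  # expected to raise on invalid data
            prepared.append((self.make_key(str(obj[id_field])), self.encode(obj)))
        return prepared


class SketchHashStorage(SketchStorage):
    def encode(self, obj: Dict[str, Any]) -> Dict[str, Any]:
        # Hash fields are flat; keep bytes (e.g. vectors) and stringify the rest.
        return {k: v if isinstance(v, bytes) else str(v) for k, v in obj.items()}


class SketchJsonStorage(SketchStorage):
    def encode(self, obj: Dict[str, Any]) -> str:
        return json.dumps(obj)
```

With `validate=False` the validator is never called, which mirrors the opt-in behaviour described for this layer.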

4. Index Layer

The SearchIndex exposes the setting validate_on_load, which defaults to False.

Data Flow

Write Flow:

  • Objects are preprocessed and validated against the schema
  • Objects are prepared with appropriate keys
  • Batch writing occurs using Redis pipelines for efficiency
  • TTL (expiration) can be applied if specified
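The write flow can be sketched end to end with a stand-in pipeline. `FakePipeline` below only records commands; its `hset`, `expire`, and `execute` methods mirror the redis-py pipeline methods of the same names, while `write_batch`, its parameters, and the choice to validate everything before queuing any command are assumptions made for this example.

```python
from typing import Any, Callable, Dict, List, Optional


class FakePipeline:
    """Records queued commands instead of sending them to Redis."""

    def __init__(self) -> None:
        self.queued: List[tuple] = []
        self.executed: List[List[tuple]] = []

    def hset(self, name: str, mapping: Dict[str, Any]) -> "FakePipeline":
        self.queued.append(("HSET", name, mapping))
        return self

    def expire(self, name: str, time: int) -> "FakePipeline":
        self.queued.append(("EXPIRE", name, time))
        return self

    def execute(self) -> List[bool]:
        batch, self.queued = self.queued, []
        self.executed.append(batch)
        return [True] * len(batch)


def write_batch(pipe: Any, objects: List[Dict[str, Any]], prefix: str, id_field: str,
                validate: Optional[Callable[[Dict[str, Any]], Dict[str, Any]]] = None,
                ttl: Optional[int] = None, batch_size: int = 2) -> List[str]:
    # Steps 1 and 2: validate every object and build its key before writing,
    # so a validation error leaves nothing half-written.
    prepared = []
    for obj in objects:
        if validate is not None:
            obj = validate(obj)
        prepared.append((f"{prefix}:{obj[id_field]}", obj))

    # Steps 3 and 4: queue writes (and optional TTLs), flushing per batch.
    for start in range(0, len(prepared), batch_size):
        for key, obj in prepared[start:start + batch_size]:
            pipe.hset(key, mapping=obj)
            if ttl is not None:
                pipe.expire(key, ttl)
        pipe.execute()
    return [key for key, _ in prepared]
```

Each `execute()` call corresponds to one round trip, so three objects with `batch_size=2` cost two round trips instead of three or more.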

Read Flow:

  • Keys are fetched in batches using pipelines
  • Data is converted from Redis format to Python objects
  • Bytes are automatically converted to appropriate types
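The byte-conversion step might look like the following for a hash read without response decoding, where both field names and values arrive as `bytes`. The `decode_hash` helper is hypothetical, and falling back to raw bytes on a decode error is a simplification: a packed vector can happen to be valid UTF-8, so a real implementation would more likely consult the schema to decide which fields stay binary.

```python
from typing import Dict, Union


def decode_hash(raw: Dict[bytes, bytes]) -> Dict[str, Union[str, bytes]]:
    """Decode field names and text values; leave non-UTF-8 payloads as bytes."""
    decoded: Dict[str, Union[str, bytes]] = {}
    for field, value in raw.items():
        name = field.decode("utf-8")
        try:
            decoded[name] = value.decode("utf-8")
        except UnicodeDecodeError:
            # Likely a binary payload such as a packed vector; keep it untouched.
            decoded[name] = value
    return decoded
```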

@tylerhutcherson tylerhutcherson added the enhancement New feature or request label Mar 26, 2025
@tylerhutcherson tylerhutcherson marked this pull request as ready for review March 26, 2025 15:14
Collaborator

@abrookins abrookins left a comment

This is EPIC! Looks great, too. Love the validation test coverage. I had a couple of comments, and probably the only thing I felt strongly about was using an actual JSONPath library for traversing objects.

@tylerhutcherson tylerhutcherson force-pushed the feat/RAAE-422-client-side-schema-validation branch from 48883db to a1920f5 Compare March 27, 2025 19:36
Collaborator

@abrookins abrookins left a comment

Looks great! 👍

@abrookins abrookins force-pushed the feat/RAAE-422-client-side-schema-validation branch from d4f42bb to e406d76 Compare March 31, 2025 17:59
@abrookins abrookins merged commit af7871c into 0.5.0 Mar 31, 2025
31 checks passed
justin-cechmanek pushed a commit that referenced this pull request Apr 2, 2025
abrookins pushed a commit that referenced this pull request Apr 4, 2025
abrookins pushed a commit that referenced this pull request Apr 4, 2025
@tylerhutcherson tylerhutcherson deleted the feat/RAAE-422-client-side-schema-validation branch April 7, 2025 01:29
Labels
enhancement New feature or request
2 participants