0.5.0 final #310

abrookins · 2025-04-04T21:28:28Z

Opening a separate PR to check the rebase so I don't have to force push to our 0.5.0 branch.

- Add base, router, and cache threshold optimizers.
- Use ranx to support evaluation metrics.
- The class is designed to accept an optimizer class, so someone can supply their own without us having to make a release to support something custom.
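A minimal sketch of the pluggable design described above, assuming a labeled set of `(distance, is_match)` pairs and F1 as the metric. `ThresholdOptimizer`, `grid_search`, and `f1_at_threshold` are hypothetical names, not RedisVL's API, and the metric is computed by hand here rather than with ranx.

```python
from typing import Callable, Sequence, Tuple

# Each item is (distance returned by Redis, whether the pair is a true match).
Labeled = Sequence[Tuple[float, bool]]
Metric = Callable[[Labeled, float], float]
Strategy = Callable[[Labeled, Metric], float]


def f1_at_threshold(data: Labeled, threshold: float) -> float:
    """F1 score when every pair with distance <= threshold counts as a hit."""
    tp = sum(1 for d, ok in data if d <= threshold and ok)
    fp = sum(1 for d, ok in data if d <= threshold and not ok)
    fn = sum(1 for d, ok in data if d > threshold and ok)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(data: Labeled, metric: Metric,
                lo: float = 0.0, hi: float = 2.0, steps: int = 200) -> float:
    """Default strategy: try evenly spaced thresholds, return the best one."""
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda t: metric(data, t))


class ThresholdOptimizer:
    """Hypothetical optimizer that accepts a caller-supplied search strategy."""

    def __init__(self, data: Labeled, strategy: Strategy = grid_search,
                 metric: Metric = f1_at_threshold):
        self.data = data
        self.strategy = strategy
        self.metric = metric

    def optimize(self) -> float:
        # Delegate the search itself to whatever strategy was injected.
        return self.strategy(self.data, self.metric)
```

Passing `strategy=` lets a caller swap in a custom search routine without a library change, which is the motivation stated in the PR.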

Goal:

1. Create a new DatetimeFilter or TimestampFilter:
   - Allow querying for a specific date without time
   - Allow querying for a specific date with time
   - Allow querying for a date range
   - Allow querying for a time range
   - Allow querying with or without a timezone
   - Default to timezone-aware UTC datetimes
2. Alternatively, create a new Timestamp field type that allows specifying, via YAML or a dictionary, that a numeric field is actually a timestamp, with or without a timezone.

---------

Co-authored-by: Tyler Hutcherson <[email protected]>
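A sketch of how goal 1 could map onto Redis numeric range filters, assuming timestamps are stored as epoch seconds in a NUMERIC field. The `created_at` field and the `timestamp_filter` helper are hypothetical; naive datetimes are treated as UTC per the stated default, and a bare date expands to the whole UTC day using Redis's `(` prefix for an exclusive upper bound.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Optional, Tuple, Union

DateLike = Union[date, datetime]


def to_utc(dt: datetime) -> datetime:
    """Attach UTC to naive datetimes; convert aware ones to UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return dt.astimezone(timezone.utc)


def _bounds(value: DateLike) -> Tuple[float, float, bool]:
    """Return (lower, upper, upper_inclusive) in epoch seconds for one value."""
    if isinstance(value, datetime):  # datetime subclasses date, so check it first
        ts = to_utc(value).timestamp()
        return ts, ts, True  # an exact instant
    day = datetime.combine(value, time.min, tzinfo=timezone.utc)
    return day.timestamp(), (day + timedelta(days=1)).timestamp(), False


def timestamp_filter(field: str, start: DateLike,
                     end: Optional[DateLike] = None) -> str:
    """Build a numeric range filter for a date, datetime, or range of either."""
    lo, hi, inclusive = _bounds(start)
    if end is not None:
        _, hi, inclusive = _bounds(end)  # a date end covers that entire day
    upper = f"{hi}" if inclusive else f"({hi}"  # '(' = exclusive bound
    return f"@{field}:[{lo} {upper}]"
```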

# Add support for hybrid policy and epsilon parameters in vector searches

This PR adds support for configuring the hybrid search policy for vector searches and epsilon for vector range queries in RedisVL, matching the capabilities available in Redis Vector Search.

## Changes

### Added features:

1. **Hybrid Policy Control for VectorQuery**
   - Added `hybrid_policy` parameter with options `BATCHES` or `ADHOC_BF`
   - Added `batch_size` parameter for controlling batch size in `BATCHES` mode
   - Implemented methods to get/set these parameters
2. **EPSILON Support for VectorRangeQuery**
   - Added `epsilon` parameter to control range query boundaries
   - Implemented methods to get/set this parameter
   - Epsilon is added to the query attributes rather than passed as a direct parameter

### Tests:

- Added unit tests for new parameters and methods
- Added integration tests that verify query construction but avoid executing queries with unsupported parameters

## Description

Redis Vector Search allows fine-tuning of vector queries through hybrid policy selection and epsilon configuration. This PR exposes these parameters in RedisVL, giving users better control over performance and accuracy trade-offs:

- **Hybrid Policy**: Controls how filters are applied during vector search:
  - `BATCHES`: Paginates through small batches of nearest neighbors
  - `ADHOC_BF`: Computes scores for all vectors passing the filter
- **Epsilon**: For range queries, controls boundary expansion through `radius * (1 + epsilon)`, enabling deeper search at the expense of performance

---------

Co-authored-by: Tyler Hutcherson <[email protected]>
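A sketch of the query strings these parameters correspond to, assuming Redis query dialect 2 and a vector query parameter named `$vector`. `knn_query`, `range_query`, and `effective_radius` are hypothetical helpers, not RedisVL's `VectorQuery` or `VectorRangeQuery` implementation, and rejecting `batch_size` together with `ADHOC_BF` is a choice made only for this sketch.

```python
from typing import Optional

HYBRID_POLICIES = {"BATCHES", "ADHOC_BF"}


def knn_query(filter_expr: str, field: str, k: int,
              hybrid_policy: Optional[str] = None,
              batch_size: Optional[int] = None) -> str:
    """Build a filtered KNN query string with optional hybrid attributes."""
    if hybrid_policy is not None and hybrid_policy not in HYBRID_POLICIES:
        raise ValueError(f"hybrid_policy must be one of {sorted(HYBRID_POLICIES)}")
    if batch_size is not None:
        if hybrid_policy == "ADHOC_BF":
            raise ValueError("batch_size only applies to the BATCHES policy")
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
    clause = f"KNN {k} @{field} $vector"
    if hybrid_policy is not None:
        clause += f" HYBRID_POLICY {hybrid_policy}"
    if batch_size is not None:
        clause += f" BATCH_SIZE {batch_size}"
    # A bare '*' matches everything; other filters are wrapped in parentheses.
    prefix = filter_expr if filter_expr == "*" else f"({filter_expr})"
    return f"{prefix}=>[{clause}]"


def range_query(field: str, epsilon: Optional[float] = None) -> str:
    """Build a VECTOR_RANGE clause; epsilon goes into the attribute block."""
    query = f"@{field}:[VECTOR_RANGE $distance_threshold $vector]"
    if epsilon is not None:
        query += f"=>{{$EPSILON:{epsilon}}}"
    return query


def effective_radius(radius: float, epsilon: float) -> float:
    """Boundary explored by a range query: radius * (1 + epsilon)."""
    return radius * (1 + epsilon)
```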

This PR implements a layered architecture for managing and validating searchable data in Redis, with clear separation of concerns between schema definition, data validation, and storage operations.

## Key Components

### 1. Schema Definition Layer

- `IndexSchema` provides the blueprint for data structure and constraints
- Defines fields with specific types (TEXT, TAG, NUMERIC, GEO, VECTOR)
- Supports different storage types (HASH, JSON) with appropriate configuration

### 2. Validation Layer

- `SchemaModelGenerator` dynamically creates Pydantic models from schema definitions
- Implements a caching mechanism to avoid redundant model generation
- Maps Redis field types to appropriate Python/Pydantic types
- Provides type-specific validators:
  - VECTOR: validates dimensions and value ranges (e.g., INT8 range checks)
  - GEO: validates geographic coordinate format
  - NUMERIC: prevents boolean values

### 3. Storage Layer

- `BaseStorage` is the abstract class that provides the foundation for Redis operations
- Specialized implementations (HashStorage, JsonStorage) handle the different Redis data types
- Enforces schema validation during write operations when validation is enabled
- Implements optimized batch operations using Redis pipelines
- Supports both synchronous and asynchronous interfaces
- Handles key generation, preprocessing, and error handling

### 4. Index Layer

`SearchIndex` exposes the `validate_on_load` setting, which defaults to `False`.

## Data Flow

### Write Flow:

1. Objects are preprocessed and validated against the schema
2. Objects are prepared with appropriate keys
3. Batch writing occurs using Redis pipelines for efficiency
4. TTL (expiration) can be applied if specified

### Read Flow:

1. Keys are fetched in batches using pipelines
2. Data is converted from Redis format to Python objects
3. Bytes are automatically converted to appropriate types
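A pure-Python sketch of the type-specific rules in the validation layer, assuming each field is described by `(name, type, dims, datatype)`. It uses plain functions cached with `functools.lru_cache` instead of generated Pydantic models, so `make_validator` and `validate_object` are hypothetical stand-ins for `SchemaModelGenerator`, not its actual code.

```python
from functools import lru_cache
from typing import Any, Callable, Dict, Tuple

Validator = Callable[[Any], Any]


def validate_numeric(value: Any) -> Any:
    # bool is a subclass of int in Python, so it must be rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"expected a number, got {type(value).__name__}")
    return value


def validate_geo(value: Any) -> Any:
    # Expect a "longitude,latitude" string; malformed input raises ValueError
    # from the unpacking or float() conversion. This is only a basic sanity
    # check: Redis itself may apply a tighter latitude limit.
    lon, lat = (float(part) for part in str(value).split(","))
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError(f"coordinates out of range: {value!r}")
    return value


@lru_cache(maxsize=None)
def make_validator(field_type: str, dims: int = 0,
                   datatype: str = "FLOAT32") -> Validator:
    """Build a validator once per distinct field definition, then reuse it."""
    if field_type == "NUMERIC":
        return validate_numeric
    if field_type == "GEO":
        return validate_geo
    if field_type == "VECTOR":
        def validate_vector(value: Any) -> Any:
            if len(value) != dims:
                raise ValueError(f"expected {dims} dims, got {len(value)}")
            if datatype == "INT8" and any(not -128 <= v <= 127 for v in value):
                raise ValueError("INT8 vector values must be in [-128, 127]")
            return value
        return validate_vector
    return lambda value: value  # TEXT/TAG are accepted as-is in this sketch


def validate_object(obj: Dict[str, Any],
                    fields: Tuple[Tuple[str, str, int, str], ...]) -> Dict[str, Any]:
    """Validate each schema field present in obj; raise on the first failure."""
    for name, field_type, dims, datatype in fields:
        if name in obj:
            make_validator(field_type, dims, datatype)(obj[name])
    return obj
```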

Run API-dependent tests once per matrix run.

This PR accomplishes two goals:

1. Add an option for users to easily get back a similarity value between 0 and 1, comparable to what they might expect from other vector databases.
2. Fix the current bug where `distance_threshold` is validated to be between 0 and 1 when in reality it can take values between 0 and 2.

> Note: after much careful thought, I believe it is best that for `0.5.0` we do **not** start enforcing all distance thresholds between 0 and 1 and make this option the default behavior. Ideally this metric would be consistent throughout our code, and I don't love supporting this flag, but I think it provides the value that is scoped for this ticket while inflicting the least amount of pain and confusion.

Changes:

1. Adds the `normalize_vector_distance` flag to VectorQuery and VectorRangeQuery. Behavior:
   - If set to `True`, it normalizes values returned from Redis to a value between 0 and 1.
   - For the cosine metric, it applies `(2 - value)/2`.
   - For L2 distance, it applies the normalization `1/(1 + value)`.
   - For IP, it does nothing and emits a warning, since normalized IP is cosine by definition.
   - For VectorRangeQuery, if `normalize_vector_distance=True`, the distance threshold is now validated to be between 0 and 1 and denormalized before execution against the database so the two stay consistent.
2. Relaxes validation for semantic caching and routing to allow thresholds between 0 and 2, fixing the current bug and aligning with how the database actually functions.
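A minimal sketch of the normalization rules listed above. `normalize_distance` and `denormalize_cosine_threshold` are hypothetical names, not RedisVL's helpers, and the denormalization shown is simply the algebraic inverse of the cosine formula, which is an assumption about how the range-query threshold is converted.

```python
import warnings


def normalize_distance(value: float, metric: str) -> float:
    """Map a raw Redis vector distance to a 0-1 score (1 = closest)."""
    metric = metric.upper()
    if metric == "COSINE":
        return (2 - value) / 2  # cosine distance lies in [0, 2]
    if metric == "L2":
        return 1 / (1 + value)  # L2 distance lies in [0, inf)
    if metric == "IP":
        warnings.warn("normalizing IP is a no-op; normalized IP is cosine")
        return value
    raise ValueError(f"unknown distance metric: {metric}")


def denormalize_cosine_threshold(threshold: float) -> float:
    """Invert (2 - d) / 2 so a 0-1 threshold can be sent to Redis as a distance."""
    if not 0 <= threshold <= 1:
        raise ValueError("normalized threshold must be between 0 and 1")
    return 2 - 2 * threshold
```

For example, a normalized cosine threshold of 0.9 corresponds to a raw distance threshold of 2 - 2(0.9) = 0.2.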

We were experiencing flakiness with a similar test before, and (as far as I know) that test hasn't had an issue since we updated the `search_step`. This step fails when, after 10 tries, the random search still hasn't made the threshold large enough to show an improvement, which is why it only failed sometimes.
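A sketch of the failure mode described above, assuming each iteration perturbs the current best threshold by at most `search_step` and keeps only improvements. `random_search` and its parameters are hypothetical, not RedisVL's optimizer code; seeding the generator, as done here, is one common way to make such a test reproducible.

```python
import random
from typing import Callable


def random_search(metric: Callable[[float], float], start: float,
                  search_step: float = 0.1, max_iterations: int = 10,
                  seed: int = 0, lo: float = 0.0, hi: float = 2.0) -> float:
    """Hill-climb a threshold with random perturbations of size <= search_step."""
    rng = random.Random(seed)  # seeded so repeated runs give identical results
    best_t, best_score = start, metric(start)
    for _ in range(max_iterations):
        candidate = best_t + rng.uniform(-search_step, search_step)
        candidate = min(max(candidate, lo), hi)  # clamp to the valid distance range
        score = metric(candidate)
        if score > best_score:  # keep only strict improvements
            best_t, best_score = candidate, score
    return best_t
```

With a small `search_step` and only 10 iterations, the threshold may never move far enough to improve the score; a larger step or a fixed seed reduces that risk.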

Co-authored-by: Robert Shelton <[email protected]> Co-authored-by: Andrew Brookins <[email protected]> Co-authored-by: Tyler Hutcherson <[email protected]> Co-authored-by: Robert Shelton <[email protected]>

Co-authored-by: Tyler Hutcherson <[email protected]>

redisvl/index/index.py

Co-authored-by: Justin Cechmanek <[email protected]>

redisvl/utils/optimize/router.py

Co-authored-by: Justin Cechmanek <[email protected]>

redisvl/utils/utils.py

justin-cechmanek

It's a doozy, but it looks good

abrookins · 2025-04-04T22:20:29Z

Ok, pushed changes to the real release branch and closing this PR in favor of the other.

rbs333 and others added 11 commits April 4, 2025 14:26

Fix: Prevent RedisVL from overriding logging configurations (#293)

a729abc

Rename for test data class for pytest conflict (#302)

76bff93

Run API tests once (#306)

8a1e5f3

Run API-dependent tests once per matrix run.

Add support for full text queries and hybrid search queries (#303)

dcb34ad

Co-authored-by: Robert Shelton <[email protected]> Co-authored-by: Andrew Brookins <[email protected]> Co-authored-by: Tyler Hutcherson <[email protected]> Co-authored-by: Robert Shelton <[email protected]>

Update documentation and version for release (#308)

d7e78b7

Co-authored-by: Tyler Hutcherson <[email protected]>

justin-cechmanek reviewed Apr 4, 2025

redisvl/index/index.py (outdated)

Fix typo

7874913

Co-authored-by: Justin Cechmanek <[email protected]>

justin-cechmanek reviewed Apr 4, 2025

redisvl/utils/optimize/router.py

justin-cechmanek reviewed Apr 4, 2025

redisvl/utils/optimize/router.py (outdated)

Fix typo

3344775

Co-authored-by: Justin Cechmanek <[email protected]>

justin-cechmanek reviewed Apr 4, 2025

redisvl/utils/utils.py

justin-cechmanek approved these changes Apr 4, 2025

abrookins closed this Apr 4, 2025