Skip to content

[Search Improvements] Phase 1: Query Optimization #7010

@carlesarnal

Description

@carlesarnal

Parent Epic

Part of #7009 - SQL Search Functionality Improvements

Overview

Optimize existing SQL search queries without breaking changes. This phase focuses on improving performance of current search implementation.

Goals

  1. Add trigram indexes for PostgreSQL to optimize substring searches
  2. Implement unified count with window functions to reduce query count
  3. Optimize label search queries using JOINs instead of correlated subqueries
  4. Add database-specific query optimizations

Tasks

1. PostgreSQL Trigram Index Support

  • Add detection for pg_trgm extension availability
  • Create migration script to add GIN trigram indexes:
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE INDEX IDX_artifacts_name_trgm ON artifacts USING GIN (name gin_trgm_ops);
    CREATE INDEX IDX_artifacts_desc_trgm ON artifacts USING GIN (description gin_trgm_ops);
    CREATE INDEX IDX_versions_name_trgm ON versions USING GIN (name gin_trgm_ops);
    CREATE INDEX IDX_versions_desc_trgm ON versions USING GIN (description gin_trgm_ops);
  • Update SqlStatements interface with trigram support methods
  • Add configuration property: apicurio.search.trigram.enabled

2. Unified Count with Window Functions

  • Implement window function count for PostgreSQL and MSSQL:
    SELECT *, COUNT(*) OVER() as total_count
    FROM artifacts a
    WHERE ...
    ORDER BY ...
    LIMIT ? OFFSET ?
  • Keep current two-query approach for H2/MySQL fallback
  • Update result mapping to extract count from window function

3. Label Search Optimization

  • Refactor label search from EXISTS subquery to JOIN with aggregation:
    SELECT a.*, COUNT(DISTINCT l.labelKey) as matched_labels
    FROM artifacts a
    LEFT JOIN artifact_labels l ON a.groupId = l.groupId AND a.artifactId = l.artifactId
    WHERE (l.labelKey = ? AND l.labelValue = ?)
    GROUP BY a.groupId, a.artifactId
    HAVING COUNT(DISTINCT l.labelKey) >= ?
  • Apply same optimization for version_labels and group_labels
  • Benchmark performance improvement

4. Database-Specific Query Builder

  • Create SearchQueryBuilder interface for database-specific optimizations
  • Implement PostgreSQL-specific builder with trigram and window function support
  • Implement MySQL-specific builder
  • Implement MSSQL-specific builder
  • Implement H2 fallback builder

5. Testing

  • Add unit tests for new query builders
  • Add integration tests for trigram search on PostgreSQL
  • Add performance benchmarks comparing before/after
  • Test on all supported databases (PostgreSQL, MySQL, MSSQL, H2)

Files to Modify

  • app/src/main/java/io/apicurio/registry/storage/impl/sql/AbstractSqlRegistryStorage.java
  • app/src/main/java/io/apicurio/registry/storage/impl/sql/SqlStatements.java
  • app/src/main/java/io/apicurio/registry/storage/impl/sql/CommonSqlStatements.java
  • app/src/main/java/io/apicurio/registry/storage/impl/sql/PostgreSQLSqlStatements.java
  • app/src/main/resources/io/apicurio/registry/storage/impl/sql/upgrades/102/postgresql.upgrade.ddl (new)

Acceptance Criteria

  • Substring searches on PostgreSQL use trigram indexes when available
  • Single query returns both results and count (where supported)
  • Multi-label searches use optimized JOIN approach
  • No breaking changes to existing API
  • Performance improvement documented with benchmarks
  • All existing tests pass

Configuration

apicurio.search.trigram.enabled=true

Labels: enhancement, storage, search, performance

Metadata

Metadata

Assignees

No one assigned

    Projects

    Status

    Done

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions