
[Search Improvements] Phase 4: Performance & Scalability #7013

@carlesarnal

Description

Parent Epic

Part of #7009 - SQL Search Functionality Improvements

Overview

Improve search performance and scalability through caching, optimized pagination, and async search capabilities for large datasets.

Goals

  1. Implement search result caching
  2. Add cursor-based pagination for efficient deep pagination
  3. Implement async search for large result sets
  4. Performance testing and tuning
  5. Documentation and monitoring

Tasks

1. Search Result Caching

  • Add caching infrastructure for search queries:
    @ConfigProperty(name = "apicurio.search.cache.enabled", defaultValue = "false")
    boolean searchCacheEnabled;
    
    @ConfigProperty(name = "apicurio.search.cache.ttl-seconds", defaultValue = "60")
    int searchCacheTtlSeconds;
    
    @ConfigProperty(name = "apicurio.search.cache.max-entries", defaultValue = "1000")
    int searchCacheMaxEntries;
  • Implement cache key generation from search parameters
  • Add cache invalidation on data changes (artifact create/update/delete)
  • Use existing Quarkus caching infrastructure (Caffeine)
  • Add cache hit/miss metrics
  • Implement cache warm-up for common queries (optional)
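A minimal sketch of the cache-key and invalidation tasks above. It assumes a search is described by a map of filter names to values plus orderBy/offset/limit. The class and method names (SearchCacheSketch, cacheKey, invalidateAll) are hypothetical. An access-ordered LinkedHashMap stands in for Caffeine so the example needs only the JDK; in the real implementation Caffeine's maximum-size and expire-after-write settings would replace the hand-written eviction.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical sketch of SearchCache: a bounded, TTL-based cache of search results.
// An access-ordered LinkedHashMap stands in for Caffeine to keep this JDK-only.
class SearchCacheSketch<V> {

    private record Timed<T>(T value, Instant expiresAt) {}

    private final Duration ttl;
    private final Clock clock;
    private final Map<String, Timed<V>> entries;
    private long generation; // bumped on every invalidation
    private long hits;
    private long misses;

    SearchCacheSketch(int maxEntries, Duration ttl, Clock clock) {
        this.ttl = ttl;
        this.clock = clock;
        // accessOrder=true orders entries from least to most recently used, so
        // removeEldestEntry evicts the LRU entry once maxEntries is exceeded.
        this.entries = new LinkedHashMap<String, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Timed<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Deterministic key: filters are sorted by name and URL-encoded, so the same
    // search sent with parameters in a different order maps to the same entry.
    static String cacheKey(Map<String, String> filters, String orderBy, int offset, int limit) {
        StringBuilder key = new StringBuilder();
        new TreeMap<>(filters).forEach((name, value) -> key
                .append(URLEncoder.encode(name, StandardCharsets.UTF_8)).append('=')
                .append(URLEncoder.encode(value, StandardCharsets.UTF_8)).append('&'));
        return key.append("orderBy=").append(orderBy)
                .append("&offset=").append(offset)
                .append("&limit=").append(limit)
                .toString();
    }

    // Returns an unexpired cached result, or runs the search and caches its result.
    V get(String key, Supplier<V> search) {
        long seenGeneration;
        synchronized (this) {
            Timed<V> cached = entries.get(key);
            if (cached != null && clock.instant().isBefore(cached.expiresAt())) {
                hits++;
                return cached.value();
            }
            misses++;
            seenGeneration = generation;
        }
        // The search runs outside the lock so one slow query does not block all lookups.
        V value = search.get();
        synchronized (this) {
            // Do not cache a result computed while an invalidation happened.
            if (generation == seenGeneration) {
                entries.put(key, new Timed<>(value, clock.instant().plus(ttl)));
            }
        }
        return value;
    }

    // Called on artifact create/update/delete. Clearing everything is the simplest
    // correct policy; finer-grained invalidation would need to track dependencies.
    synchronized void invalidateAll() {
        generation++;
        entries.clear();
    }

    synchronized long hitCount() { return hits; }

    synchronized long missCount() { return misses; }
}
```

The generation counter covers one subtle case. If an artifact changes while a search is still running, that caller still receives the stale result, but the result is not cached.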

2. Cursor-Based Pagination

  • Create SearchCursor class:
    public class SearchCursor {
        private String lastGroupId;
        private String lastArtifactId;
        private Object lastSortValue;
        
        public String encode() { /* Base64 encode */ }
        public static SearchCursor decode(String cursor) { /* Decode */ }
    }
  • Implement keyset/seek pagination:
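    -- Instead of OFFSET/LIMIT: seek past the last row of the previous page
    SELECT * FROM artifacts a
    WHERE (a.name, a.groupId, a.artifactId) > (?, ?, ?)
    ORDER BY a.name, a.groupId, a.artifactId
    LIMIT ?
    -- Row-value comparison is not supported by every database (e.g. SQL Server); there it must be expanded to:
    --   a.name > ? OR (a.name = ? AND (a.groupId > ? OR (a.groupId = ? AND a.artifactId > ?)))
    -- Rows with a NULL name never satisfy the comparison, so NULLs need explicit handling (e.g. COALESCE).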
  • Add nextCursor and prevCursor to search results:
    public class ArtifactSearchResultsDto {
        // Existing
        private List<SearchedArtifactDto> artifacts;
        private Integer count;
        // New
        private String nextCursor;
        private String prevCursor;
    }
  • Support both offset-based and cursor-based pagination (backward compatible)
  • Add REST API parameter for cursor
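One way the SearchCursor encode/decode stubs could be filled in. It assumes the sort value is a String (for example the artifact name when ordering by name) rather than an arbitrary Object. SearchCursorSketch is a hypothetical name. The three keyset values are serialized with DataOutputStream and encoded as URL-safe Base64 without padding, so the cursor can go directly into a query string. The example cursor in the API section appears to be Base64-encoded JSON; a binary encoding is used here only to stay within the JDK, and either works as long as clients treat the token as opaque.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;

// Hypothetical sketch of SearchCursor, assuming the sort value is a String
// (for example the artifact name when ordering by name).
record SearchCursorSketch(String lastSortValue, String lastGroupId, String lastArtifactId) {

    // Serializes the three keyset values and Base64-encodes them (URL-safe,
    // no padding) so the cursor can be passed as a query parameter.
    String encode() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeNullable(out, lastSortValue);
            writeNullable(out, lastGroupId);
            writeNullable(out, lastArtifactId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes.toByteArray());
    }

    // Reverses encode(); malformed input surfaces as IllegalArgumentException,
    // which the REST layer could map to 400 Bad Request.
    static SearchCursorSketch decode(String cursor) {
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(Base64.getUrlDecoder().decode(cursor)))) {
            return new SearchCursorSketch(readNullable(in), readNullable(in), readNullable(in));
        } catch (IOException e) {
            throw new IllegalArgumentException("Invalid search cursor", e);
        }
    }

    // A presence flag precedes each value so that null (e.g. an artifact
    // without a name) round-trips distinctly from the empty string.
    private static void writeNullable(DataOutputStream out, String value) throws IOException {
        out.writeBoolean(value != null);
        if (value != null) {
            out.writeUTF(value);
        }
    }

    private static String readNullable(DataInputStream in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }
}
```

The cursor is encoded, not signed, so a client can read or alter it. Decoded values should therefore only ever be bound as query parameters, never concatenated into SQL.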

3. Async Search for Large Datasets

  • Create async search infrastructure:
    public class SearchJob {
        private String jobId;
        private SearchJobStatus status;
        private Instant createdOn;
        private Instant completedOn;
        private Integer totalResults;
    }
    
    public enum SearchJobStatus {
        PENDING, RUNNING, COMPLETED, FAILED, EXPIRED
    }
  • Implement async search endpoints:
    @POST
    @Path("/search/artifacts/async")
    public SearchJob startAsyncSearch(SearchRequest request);
    
    @GET
    @Path("/search/jobs/{jobId}")
    public SearchJob getSearchStatus(@PathParam("jobId") String jobId);
    
    @GET
    @Path("/search/jobs/{jobId}/results")
    public ArtifactSearchResults getSearchResults(
        @PathParam("jobId") String jobId,
        @QueryParam("offset") @DefaultValue("0") int offset,
        @QueryParam("limit") @DefaultValue("20") int limit
    );
  • Store async search results temporarily (configurable TTL)
  • Implement job cleanup for expired/completed jobs
  • Add progress tracking for long-running searches
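A minimal in-memory sketch of the job lifecycle described above: PENDING, then RUNNING, then COMPLETED or FAILED, and EXPIRED once the result TTL has passed. SearchJobManagerSketch, its methods, and the injectable time source are hypothetical. Results are held in a ConcurrentHashMap, which only works for a single registry replica.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

enum SearchJobStatus { PENDING, RUNNING, COMPLETED, FAILED, EXPIRED }

// Hypothetical sketch of SearchJobManager: runs searches on an executor and keeps
// results in memory until the result TTL (apicurio.search.async.result-ttl-minutes) elapses.
class SearchJobManagerSketch<R> {

    // Job state is written by the worker thread and read by pollers, hence volatile.
    static final class Job<T> {
        final String jobId = UUID.randomUUID().toString();
        final Instant createdOn;
        volatile SearchJobStatus status = SearchJobStatus.PENDING;
        volatile Instant completedOn;
        volatile List<T> results;

        Job(Instant createdOn) {
            this.createdOn = createdOn;
        }
    }

    private final Map<String, Job<R>> jobs = new ConcurrentHashMap<>();
    private final Executor executor;
    private final Duration resultTtl;
    private final Supplier<Instant> now; // injectable time source, for testing

    SearchJobManagerSketch(Executor executor, Duration resultTtl, Supplier<Instant> now) {
        this.executor = executor;
        this.resultTtl = resultTtl;
        this.now = now;
    }

    Job<R> start(Supplier<List<R>> search) {
        Job<R> job = new Job<>(now.get());
        jobs.put(job.jobId, job);
        executor.execute(() -> {
            job.status = SearchJobStatus.RUNNING;
            try {
                job.results = List.copyOf(search.get());
                job.completedOn = now.get();
                job.status = SearchJobStatus.COMPLETED; // written last so pollers see a complete job
            } catch (RuntimeException e) {
                job.completedOn = now.get();
                job.status = SearchJobStatus.FAILED;
            }
        });
        return job;
    }

    Optional<Job<R>> find(String jobId) {
        return Optional.ofNullable(jobs.get(jobId));
    }

    // Returns one page of a completed job's results (offset/limit as in the REST API).
    List<R> page(String jobId, int offset, int limit) {
        Job<R> job = jobs.get(jobId);
        List<R> results = job == null ? null : job.results;
        if (results == null || job.status != SearchJobStatus.COMPLETED) {
            throw new IllegalStateException("No results available for job " + jobId);
        }
        int from = Math.min(offset, results.size());
        int to = (int) Math.min((long) from + limit, results.size());
        return results.subList(from, to);
    }

    // Meant to run periodically: expires finished jobs older than the TTL and
    // releases their results. Returns the number of jobs expired by this pass.
    int expireOldJobs() {
        Instant cutoff = now.get().minus(resultTtl);
        int expired = 0;
        for (Job<R> job : jobs.values()) {
            Instant done = job.completedOn;
            if (done != null && done.isBefore(cutoff) && job.status != SearchJobStatus.EXPIRED) {
                job.status = SearchJobStatus.EXPIRED;
                job.results = null;
                expired++;
            }
        }
        return expired;
    }
}
```

A clustered deployment would need job state and results in shared storage (for example a database table) so that any replica can answer the status and results endpoints.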

4. Query Optimization

  • Implement query plan analysis and logging:
    @ConfigProperty(name = "apicurio.search.explain.enabled", defaultValue = "false")
    boolean explainEnabled;
  • Add slow query detection and logging
  • Implement query complexity limits to prevent resource exhaustion
  • Add database connection pool tuning recommendations
  • Optimize N+1 queries for label fetching
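A sketch of slow query detection, assuming each search query can be wrapped in a Supplier at the storage layer. SlowQueryDetector and its methods are hypothetical. The threshold would come from apicurio.search.slow-query-threshold-ms, and the counter could back the apicurio_search_slow_queries_total metric.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: times each search query and reports those slower than
// the configured threshold (apicurio.search.slow-query-threshold-ms).
class SlowQueryDetector {

    private final Duration threshold;
    private final Consumer<String> slowQueryLog; // e.g. a logger's warn method
    private final AtomicLong slowQueries = new AtomicLong(); // could feed a slow-query counter metric

    SlowQueryDetector(Duration threshold, Consumer<String> slowQueryLog) {
        this.threshold = threshold;
        this.slowQueryLog = slowQueryLog;
    }

    // Runs the query; if it took longer than the threshold (including when it
    // failed), logs the description together with the elapsed time.
    <T> T time(String description, Supplier<T> query) {
        long start = System.nanoTime();
        try {
            return query.get();
        } finally {
            Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
            if (elapsed.compareTo(threshold) > 0) {
                slowQueries.incrementAndGet();
                slowQueryLog.accept("Slow search query (" + elapsed.toMillis() + " ms): " + description);
            }
        }
    }

    long slowQueryCount() {
        return slowQueries.get();
    }
}
```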

5. Performance Testing

  • Create performance test suite:
    • 10,000 artifacts search benchmark
    • 100,000 artifacts search benchmark
    • Concurrent search load testing
    • Deep pagination performance tests
    • Full-text search performance comparison
  • Establish performance baselines
  • Document performance characteristics per database
  • Create performance regression tests for CI

6. Monitoring & Observability

  • Add search-specific metrics:
    apicurio_search_requests_total
    apicurio_search_duration_seconds
    apicurio_search_cache_hits_total
    apicurio_search_cache_misses_total
    apicurio_search_results_count
    apicurio_search_slow_queries_total
    
  • Add search query logging (configurable)
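  • Integrate with the existing Micrometer metrics (the Prometheus registry converts dotted meter names to snake case and appends _total to counters, so a counter named apicurio.search.requests is exported as apicurio_search_requests_total)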
  • Add Grafana dashboard template for search metrics

7. Documentation

  • Document search performance best practices
  • Add capacity planning guidelines
  • Document database-specific tuning recommendations
  • Update configuration reference
  • Add troubleshooting guide for slow searches

Files to Modify

  • app/src/main/java/io/apicurio/registry/storage/impl/sql/AbstractSqlRegistryStorage.java
  • app/src/main/java/io/apicurio/registry/storage/dto/ArtifactSearchResultsDto.java
  • app/src/main/java/io/apicurio/registry/rest/v3/impl/SearchResourceImpl.java
  • app/src/main/java/io/apicurio/registry/rest/v3/SearchResource.java
  • app/src/main/resources/application.properties
  • common/src/main/resources/META-INF/openapi.json

New Files

  • app/src/main/java/io/apicurio/registry/storage/search/SearchCache.java
  • app/src/main/java/io/apicurio/registry/storage/search/SearchCursor.java
  • app/src/main/java/io/apicurio/registry/storage/search/SearchJob.java
  • app/src/main/java/io/apicurio/registry/storage/search/SearchJobManager.java
  • app/src/main/java/io/apicurio/registry/metrics/SearchMetrics.java
  • integration-tests/src/test/java/io/apicurio/tests/performance/SearchPerformanceIT.java

Acceptance Criteria

  • Search caching reduces database load for repeated queries
  • Cursor-based pagination performs consistently regardless of offset
  • Async search available for queries that may take >30 seconds
  • Performance benchmarks documented
  • Slow query detection and logging operational
  • Search metrics available in Prometheus format
  • Documentation complete

Configuration

# Caching
apicurio.search.cache.enabled=false
apicurio.search.cache.ttl-seconds=60
apicurio.search.cache.max-entries=1000

# Pagination
apicurio.search.max-results=1000
apicurio.search.default-limit=20
apicurio.search.cursor.enabled=true

# Async search
apicurio.search.async.enabled=true
apicurio.search.async.threshold-ms=5000
apicurio.search.async.result-ttl-minutes=30

# Performance
apicurio.search.slow-query-threshold-ms=1000
apicurio.search.explain.enabled=false
apicurio.search.max-complexity=100

Performance Targets

Scenario                          | Target
--------------------------------- | ----------------------
Simple search (10k artifacts)     | < 100 ms
Full-text search (10k artifacts)  | < 200 ms
Faceted search (10k artifacts)    | < 300 ms
Deep pagination (page 1000)       | < 200 ms (with cursor)
Concurrent searches (100 req/s)   | < 500 ms (p99)
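The concurrency target is stated as a p99 latency. A performance test could check it with a nearest-rank percentile over the recorded request latencies. LatencyStats is a hypothetical helper; other percentile definitions (for example interpolating ones) give slightly different values on small samples.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile over recorded latencies (milliseconds).
final class LatencyStats {

    private LatencyStats() {}

    // Returns the smallest recorded value such that at least p% of the samples
    // are less than or equal to it. The input array is not modified.
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 100");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Multiply before dividing to keep the rank exact for whole-number p.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

A concurrency test would then assert something like `LatencyStats.percentile(samples, 99) < 500`.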

API Changes

Cursor Pagination

GET /search/artifacts?cursor=eyJsYXN0R3JvdXBJZCI6...&limit=20

Response:
{
  "artifacts": [...],
  "count": 5000,
  "nextCursor": "eyJsYXN0R3JvdXBJZCI6...",
  "prevCursor": "eyJsYXN0R3JvdXBJZCI6..."
}
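A sketch of how a client could walk every page with the cursor API. It assumes the last page is signalled by a null or absent nextCursor, which the issue does not specify. ArtifactPage and CursorPager are hypothetical, and fetchPage stands in for the HTTP call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical client-side model of one page from GET /search/artifacts.
record ArtifactPage(List<String> artifacts, String nextCursor) {}

final class CursorPager {

    private CursorPager() {}

    // fetchPage stands in for GET /search/artifacts?cursor=...&limit=...; a null
    // cursor requests the first page, and a null nextCursor marks the last page.
    static List<String> fetchAll(BiFunction<String, Integer, ArtifactPage> fetchPage, int limit) {
        List<String> all = new ArrayList<>();
        String cursor = null;
        do {
            ArtifactPage page = fetchPage.apply(cursor, limit);
            all.addAll(page.artifacts());
            cursor = page.nextCursor(); // opaque token, passed back unchanged
        } while (cursor != null);
        return all;
    }
}
```

Unlike offset paging, each request here costs the same regardless of how deep the client has read, which is what the deep-pagination target relies on.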

Async Search

POST /search/artifacts/async
{
  "filters": {...},
  "orderBy": "name",
  "limit": 10000
}

Response:
{
  "jobId": "abc123",
  "status": "PENDING",
  "createdOn": "2024-01-15T10:30:00Z"
}

GET /search/jobs/abc123

Response:
{
  "jobId": "abc123",
  "status": "COMPLETED",
  "totalResults": 8542,
  "completedOn": "2024-01-15T10:30:05Z"
}

GET /search/jobs/abc123/results?offset=0&limit=100
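A sketch of the client side of the async flow above: submit the search, poll the job until it leaves PENDING or RUNNING, then fetch results only if it reached COMPLETED. AsyncSearchClient and JobView are hypothetical, the two functions stand in for the HTTP calls, and the polling interval and timeout are the caller's choice rather than values from the issue.

```java
import java.time.Duration;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of a client for the async search endpoints.
final class AsyncSearchClient {

    // Client-side view of the job JSON returned by the API.
    record JobView(String jobId, String status, Integer totalResults) {}

    private AsyncSearchClient() {}

    // submit stands in for POST /search/artifacts/async and poll for
    // GET /search/jobs/{jobId}. Polls until the job is no longer PENDING or
    // RUNNING, or throws once the timeout has elapsed.
    static JobView awaitJob(Supplier<JobView> submit, Function<String, JobView> poll,
                            Duration interval, Duration timeout) throws InterruptedException {
        JobView job = submit.get();
        long deadline = System.nanoTime() + timeout.toNanos();
        while (job.status().equals("PENDING") || job.status().equals("RUNNING")) {
            if (System.nanoTime() - deadline > 0) {
                throw new IllegalStateException("Search job " + job.jobId() + " did not finish in time");
            }
            Thread.sleep(interval.toMillis());
            job = poll.apply(job.jobId());
        }
        // COMPLETED, FAILED or EXPIRED; only a COMPLETED job has results to page
        // through with GET /search/jobs/{jobId}/results?offset=...&limit=...
        return job;
    }
}
```

A fixed polling interval keeps the sketch short; exponential backoff would reduce load for searches that run for tens of seconds.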

Labels: enhancement, storage, search, performance, scalability
