Skip to content

Conversation

@mikemcdougall
Copy link
Collaborator

@mikemcdougall mikemcdougall commented Dec 25, 2025

Pull Request

Issue Link

Fixes #170

Summary

Replace bulk file reading with chunked streaming processing to maintain constant memory usage regardless of file size. This implementation introduces memory-efficient streaming parsers for geospatial file formats with batched database insertion and background job processing for large files.

Changes Made

  • StreamingGeoJsonReader using Utf8JsonReader for incremental parsing
  • Batched database insertion (default 1000 features) with configurable limits
  • Streaming parsers for KML, WKT, and GPX formats
  • ImportLimits configuration class for tunable parameters
  • Background processing service for files exceeding 100MB threshold
  • Progress/status monitoring endpoints for import jobs
  • IAsyncEnumerable for streaming feature iteration
  • ArrayPool for buffer reuse to minimize allocations
  • Comprehensive integration tests for all streaming formats

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Architecture tests pass
  • Local pre-PR validation passed (scripts/pre-pr-check.sh)

Coverage Impact

  • Line coverage: TBD (will update after CI results)
  • Branch coverage: TBD (will update after CI results)

Breaking Changes

None

Additional Context

This implementation addresses memory issues when importing large geospatial files by processing them in streaming chunks rather than loading entire files into memory. The solution includes configurable batch sizes, automatic background job scheduling for large files, and comprehensive progress monitoring.


Pre-PR Checklist (for contributor)

  • Ran scripts/pre-pr-check.sh and all checks passed
  • Commit message follows format: type: description (#issue-number)
  • PR title matches main commit message
  • Issue number linked above
  • Tests added for new functionality
  • Documentation updated if needed

Reviewer Checklist

  • Code follows project architecture (vertical slices, no controllers)
  • Tests cover happy path and edge cases
  • No reflection in hot paths (AOT compatible)
  • Dependency limits respected (max 5 per endpoint)
  • Error handling follows project patterns
  • Security considerations addressed

…l files (#170)

Replace bulk file reading with chunked streaming processing to maintain
constant memory usage regardless of file size:

- Add StreamingGeoJsonReader using Utf8JsonReader for incremental parsing
- Implement batched database insertion (configurable, default 1000 features)
- Add streaming parsers for KML, WKT, GPX formats using XmlReader
- Create ImportLimits configuration class for tunable parameters
- Add IImportJobService for background processing of files >100MB
- Add progress/status endpoints for monitoring import jobs
- Add integration tests for streaming import functionality

Technical approach:
- Uses IAsyncEnumerable<IFeature> for streaming feature iteration
- ArrayPool<byte> for buffer reuse to minimize allocations
- Batched commits with optional transactions per batch
- Task.Yield() between batches to prevent blocking
- Configurable limits via appsettings.json (Import:Limits section)

New endpoints:
- GET /api/import/jobs - list active background jobs
- GET /api/import/jobs/{jobId} - get job progress
- POST /api/import/jobs/{jobId}/cancel - cancel running job
- GET /api/import/limits - get current configuration
@github-actions
Copy link

github-actions bot commented Dec 25, 2025

🤖 PR Template Validation

All checks passed! Your PR follows the template correctly.

Next steps:

  • Wait for CI to complete
  • Address any feedback from LLM architecture review
  • Request review from team members

Automated validation powered by GitHub Actions

mikemcdougall and others added 5 commits December 27, 2025 17:11
- Replace GetValue calls with manual parsing to avoid Configuration.Binder dependency
- Fix constructor registration for StreamingFileImportService
- Restructure StreamingGeoJsonReader to avoid Utf8JsonReader across async boundaries
- Update method return types for better performance (CA1859)
- Remove unused WktReader field and unnecessary using directives
- Fix string.Contains vs IndexOf analyzer warnings (CA2249)
- Add static readonly array for KML coordinate separators (CA1861)

Fixes all compilation errors making PR #171 mergeable
- Rename FileExtensions to _fileExtensions
- Rename KmlCoordinateSeparators to _kmlCoordinateSeparators
- Update all references to use underscore prefix

Resolves IDE1006 naming rule violations
- Remove unused 'using Xunit;' directive causing IDE0005 error
- Build now passes with warnings-as-errors enabled
- Fixes CI build failure for streaming import functionality
@mikemcdougall mikemcdougall changed the title Resolve GitHub issue #170 feat: implement streaming file import (#170) Dec 28, 2025
@mikemcdougall mikemcdougall merged commit bcb516d into trunk Dec 28, 2025
7 of 8 checks passed
mikemcdougall added a commit that referenced this pull request Dec 28, 2025
…xes (#44) (#174)

* feat: implement memory-efficient streaming import for large geospatial files (#170)

Replace bulk file reading with chunked streaming processing to maintain
constant memory usage regardless of file size:

- Add StreamingGeoJsonReader using Utf8JsonReader for incremental parsing
- Implement batched database insertion (configurable, default 1000 features)
- Add streaming parsers for KML, WKT, GPX formats using XmlReader
- Create ImportLimits configuration class for tunable parameters
- Add IImportJobService for background processing of files >100MB
- Add progress/status endpoints for monitoring import jobs
- Add integration tests for streaming import functionality

Technical approach:
- Uses IAsyncEnumerable<IFeature> for streaming feature iteration
- ArrayPool<byte> for buffer reuse to minimize allocations
- Batched commits with optional transactions per batch
- Task.Yield() between batches to prevent blocking
- Configurable limits via appsettings.json (Import:Limits section)

New endpoints:
- GET /api/import/jobs - list active background jobs
- GET /api/import/jobs/{jobId} - get job progress
- POST /api/import/jobs/{jobId}/cancel - cancel running job
- GET /api/import/limits - get current configuration

* fix: resolve compilation errors in streaming file import

- Replace GetValue calls with manual parsing to avoid Configuration.Binder dependency
- Fix constructor registration for StreamingFileImportService
- Restructure StreamingGeoJsonReader to avoid Utf8JsonReader across async boundaries
- Update method return types for better performance (CA1859)
- Remove unused WktReader field and unnecessary using directives
- Fix string.Contains vs IndexOf analyzer warnings (CA2249)
- Add static readonly array for KML coordinate separators (CA1861)

Fixes all compilation errors making PR #171 mergeable

* fix: apply naming conventions to private static readonly fields

- Rename FileExtensions to _fileExtensions
- Rename KmlCoordinateSeparators to _kmlCoordinateSeparators
- Update all references to use underscore prefix

Resolves IDE1006 naming rule violations

* fix: remove unnecessary using directive in StreamingImportTests

- Remove unused 'using Xunit;' directive causing IDE0005 error
- Build now passes with warnings-as-errors enabled
- Fixes CI build failure for streaming import functionality

* feat: implement comprehensive error handling audit and consistency fixes (#44)

## Summary
Fixes GitHub issue #44 by implementing comprehensive error handling consistency across all protocols.

## Key Changes Made

### 🔧 Error Format Standardization
- **MVT Tiles**: Fixed endpoints to use GeoServicesErrorHelpers instead of Results.Problem()
- **OGC API Features**: Updated 8 endpoints to use ProtocolErrorWriter instead of TypedResults.Problem()
- **Protocol Detection**: Enhanced ProtocolErrorWriter to detect OGC API (/collections) and MVT (/tiles) paths

### 🛡️ Global Exception Handling
- **New Middleware**: Created GlobalExceptionMiddleware for centralized unhandled exception catching
- **Consistent Mapping**: Maps exception types to appropriate HTTP status codes and standardized error formats
- **Correlation IDs**: Logs all unhandled exceptions with correlation IDs for tracking
- **Security**: Prevents sensitive information leakage in error responses

### 📋 Comprehensive Test Coverage
- **GlobalExceptionMiddlewareTests**: 9 tests validating exception handling and protocol-specific responses
- **MvtTileErrorHandlingTests**: 8 tests ensuring MVT endpoints return standardized GeoServices error format
- **OgcFeaturesErrorHandlingTests**: 10 tests verifying OGC API Features consistency with other geospatial protocols
- **ErrorHandlingConsistencyTests**: 8 architecture tests validating error handling across ALL protocols

### 🚀 Infrastructure Improvements
- **Enhanced Logging**: Updated UnhandledException log to include correlation IDs
- **Pipeline Integration**: Added GlobalExceptionMiddleware to middleware pipeline after CorrelationIdMiddleware

## Protocol Error Format Summary

| Protocol | Error Format | Status |
|----------|-------------|--------|
| **FeatureServer** | GeoServices JSON | ✅ Consistent |
| **OData v4** | OData-compliant JSON | ✅ Consistent |
| **OGC API Features** | GeoServices JSON (NEW) | ✅ **Fixed** |
| **MVT Tiles** | GeoServices JSON (NEW) | ✅ **Fixed** |

## Files Modified

**Core Implementation:**
-  - Fixed MVT error handling
-  - Fixed 8 TypedResults.Problem() calls
-  - Added OGC/MVT path detection

**New Components:**
-  - Centralized exception handling
-  - Enhanced UnhandledException logging
-  - Middleware pipeline registration

**Test Coverage:**
-  (new)
-  (new)
-  (new)
-  (new)

## Issue Resolution

✅ **Standardized error response formats** across all protocols
✅ **Appropriate HTTP status codes** based on error types
✅ **User-friendly error messages** that avoid exposing internal system details
✅ **Field-level validation details** in validation error responses
✅ **Correlation IDs** for error logging and traceability
✅ **Comprehensive test coverage** for error scenarios

🎯 **Result**: All protocols now use consistent error handling with proper format detection, centralized exception handling, and comprehensive test validation ensuring the system meets enterprise-grade error handling standards.

* feat: implement comprehensive error handling audit and consistency fixes (#44)

## Summary
Fixes GitHub issue #44 by implementing comprehensive error handling consistency across all protocols.

## Key Changes Made

### 🔧 Error Format Standardization
- **MVT Tiles**: Fixed endpoints to use GeoServicesErrorHelpers instead of Results.Problem()
- **OGC API Features**: Updated 8 endpoints to use ProtocolErrorWriter instead of TypedResults.Problem()
- **Protocol Detection**: Enhanced ProtocolErrorWriter to detect OGC API (/collections) and MVT (/tiles) paths

### 🛡️ Global Exception Handling
- **New Middleware**: Created GlobalExceptionMiddleware for centralized unhandled exception catching
- **Consistent Mapping**: Maps exception types to appropriate HTTP status codes and standardized error formats
- **Correlation IDs**: Logs all unhandled exceptions with correlation IDs for tracking
- **Security**: Prevents sensitive information leakage in error responses

### 📋 Comprehensive Test Coverage
- **GlobalExceptionMiddlewareTests**: 9 tests validating exception handling and protocol-specific responses
- **MvtTileErrorHandlingTests**: 8 tests ensuring MVT endpoints return standardized GeoServices error format
- **OgcFeaturesErrorHandlingTests**: 10 tests verifying OGC API Features consistency with other geospatial protocols
- **ErrorHandlingConsistencyTests**: 8 architecture tests validating error handling across ALL protocols

### 🚀 Infrastructure Improvements
- **Enhanced Logging**: Updated UnhandledException log to include correlation IDs
- **Pipeline Integration**: Added GlobalExceptionMiddleware to middleware pipeline after CorrelationIdMiddleware

## Protocol Error Format Summary

| Protocol | Error Format | Status |
|----------|-------------|--------|
| **FeatureServer** | GeoServices JSON | ✅ Consistent |
| **OData v4** | OData-compliant JSON | ✅ Consistent |
| **OGC API Features** | GeoServices JSON (NEW) | ✅ **Fixed** |
| **MVT Tiles** | GeoServices JSON (NEW) | ✅ **Fixed** |

## Issue Resolution

✅ **Standardized error response formats** across all protocols
✅ **Appropriate HTTP status codes** based on error types
✅ **User-friendly error messages** that avoid exposing internal system details
✅ **Field-level validation details** in validation error responses
✅ **Correlation IDs** for error logging and traceability
✅ **Comprehensive test coverage** for error scenarios

🎯 **Result**: All protocols now use consistent error handling with proper format detection, centralized exception handling, and comprehensive test validation ensuring the system meets enterprise-grade error handling standards.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Mike McDougall <mike@honua.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement memory-efficient streaming import for large geospatial files

3 participants