# feat: implement streaming file import (#170) #171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
## Conversation
feat: implement memory-efficient streaming import for large geospatial files (#170)

Replace bulk file reading with chunked streaming processing to maintain constant memory usage regardless of file size:
- Add StreamingGeoJsonReader using Utf8JsonReader for incremental parsing
- Implement batched database insertion (configurable, default 1000 features)
- Add streaming parsers for KML, WKT, GPX formats using XmlReader
- Create ImportLimits configuration class for tunable parameters
- Add IImportJobService for background processing of files >100MB
- Add progress/status endpoints for monitoring import jobs
- Add integration tests for streaming import functionality

Technical approach:
- Uses IAsyncEnumerable<IFeature> for streaming feature iteration
- ArrayPool<byte> for buffer reuse to minimize allocations
- Batched commits with optional transactions per batch
- Task.Yield() between batches to prevent blocking
- Configurable limits via appsettings.json (Import:Limits section)

New endpoints:
- GET /api/import/jobs - list active background jobs
- GET /api/import/jobs/{jobId} - get job progress
- POST /api/import/jobs/{jobId}/cancel - cancel running job
- GET /api/import/limits - get current configuration
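For context, here is a minimal sketch of the batched-insertion pattern this commit describes, assuming a streaming parser already exposed as `IAsyncEnumerable<IFeature>`. The `IFeature` and `IFeatureSink` types and the method names are illustrative assumptions, not the repository's actual API.

```csharp
// Sketch only: consume a streaming feature source in fixed-size batches so
// memory usage stays bounded by the batch size rather than the file size.
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public interface IFeature { }

public interface IFeatureSink
{
    // Persists one batch of features (e.g., one transaction per batch).
    Task InsertBatchAsync(IReadOnlyList<IFeature> batch, CancellationToken ct);
}

public static class BatchedImporter
{
    public static async Task<long> ImportAsync(
        IAsyncEnumerable<IFeature> features,
        IFeatureSink sink,
        int batchSize = 1000,
        CancellationToken ct = default)
    {
        var batch = new List<IFeature>(batchSize);
        long total = 0;

        await foreach (var feature in features.WithCancellation(ct))
        {
            batch.Add(feature);
            if (batch.Count >= batchSize)
            {
                await sink.InsertBatchAsync(batch, ct);
                total += batch.Count;
                batch.Clear();

                // Yield between batches so long imports do not monopolize the
                // thread, mirroring the Task.Yield() approach noted above.
                await Task.Yield();
            }
        }

        if (batch.Count > 0)
        {
            await sink.InsertBatchAsync(batch, ct);
            total += batch.Count;
        }

        return total;
    }
}
```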
fix: resolve compilation errors in streaming file import

- Replace GetValue calls with manual parsing to avoid Configuration.Binder dependency
- Fix constructor registration for StreamingFileImportService
- Restructure StreamingGeoJsonReader to avoid Utf8JsonReader across async boundaries
- Update method return types for better performance (CA1859)
- Remove unused WktReader field and unnecessary using directives
- Fix string.Contains vs IndexOf analyzer warnings (CA2249)
- Add static readonly array for KML coordinate separators (CA1861)

Fixes all compilation errors, making PR #171 mergeable.
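The Utf8JsonReader restructuring called out above reflects a language constraint: `Utf8JsonReader` is a `ref struct`, so it cannot be held across an `await`. A common way around this, sketched below on the assumption that each feature object has already been fully buffered, is to confine the reader to a synchronous helper and keep all asynchronous I/O outside it. This illustrates the general technique, not the PR's actual `StreamingGeoJsonReader` code.

```csharp
// Sketch: keep the ref-struct Utf8JsonReader inside a synchronous method that
// parses one fully buffered JSON feature object; async code handles only I/O.
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class FeatureChunkParser
{
    public static Dictionary<string, object?> ParseScalarProperties(ReadOnlySpan<byte> utf8FeatureJson)
    {
        var reader = new Utf8JsonReader(utf8FeatureJson);
        var values = new Dictionary<string, object?>();

        while (reader.Read())
        {
            if (reader.TokenType != JsonTokenType.PropertyName)
                continue;

            string name = reader.GetString()!;
            reader.Read();

            if (reader.TokenType is JsonTokenType.StartObject or JsonTokenType.StartArray)
            {
                // Nested geometry/properties handling is elided in this sketch.
                reader.Skip();
                continue;
            }

            values[name] = reader.TokenType switch
            {
                JsonTokenType.String => reader.GetString(),
                JsonTokenType.Number => reader.GetDouble(),
                JsonTokenType.True => true,
                JsonTokenType.False => false,
                _ => null,
            };
        }

        return values;
    }
}
```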
fix: apply naming conventions to private static readonly fields

- Rename FileExtensions to _fileExtensions
- Rename KmlCoordinateSeparators to _kmlCoordinateSeparators
- Update all references to use underscore prefix

Resolves IDE1006 naming rule violations.
fix: remove unnecessary using directive in StreamingImportTests

- Remove unused 'using Xunit;' directive causing IDE0005 error
- Build now passes with warnings-as-errors enabled
- Fixes CI build failure for streaming import functionality
mikemcdougall added a commit that referenced this pull request on Dec 28, 2025:
…xes (#44) (#174)

* feat: implement memory-efficient streaming import for large geospatial files (#170) — see the full commit message above.
* fix: resolve compilation errors in streaming file import — see above.
* fix: apply naming conventions to private static readonly fields — see above.
* fix: remove unnecessary using directive in StreamingImportTests — see above.
* feat: implement comprehensive error handling audit and consistency fixes (#44)

## Summary
Fixes GitHub issue #44 by implementing comprehensive error handling consistency across all protocols.

## Key Changes Made

### 🔧 Error Format Standardization
- **MVT Tiles**: Fixed endpoints to use GeoServicesErrorHelpers instead of Results.Problem()
- **OGC API Features**: Updated 8 endpoints to use ProtocolErrorWriter instead of TypedResults.Problem()
- **Protocol Detection**: Enhanced ProtocolErrorWriter to detect OGC API (/collections) and MVT (/tiles) paths

### 🛡️ Global Exception Handling
- **New Middleware**: Created GlobalExceptionMiddleware for centralized unhandled exception catching
- **Consistent Mapping**: Maps exception types to appropriate HTTP status codes and standardized error formats
- **Correlation IDs**: Logs all unhandled exceptions with correlation IDs for tracking
- **Security**: Prevents sensitive information leakage in error responses

### 📋 Comprehensive Test Coverage
- **GlobalExceptionMiddlewareTests**: 9 tests validating exception handling and protocol-specific responses
- **MvtTileErrorHandlingTests**: 8 tests ensuring MVT endpoints return standardized GeoServices error format
- **OgcFeaturesErrorHandlingTests**: 10 tests verifying OGC API Features consistency with other geospatial protocols
- **ErrorHandlingConsistencyTests**: 8 architecture tests validating error handling across ALL protocols

### 🚀 Infrastructure Improvements
- **Enhanced Logging**: Updated UnhandledException log to include correlation IDs
- **Pipeline Integration**: Added GlobalExceptionMiddleware to middleware pipeline after CorrelationIdMiddleware

## Protocol Error Format Summary

| Protocol | Error Format | Status |
|----------|-------------|--------|
| **FeatureServer** | GeoServices JSON | ✅ Consistent |
| **OData v4** | OData-compliant JSON | ✅ Consistent |
| **OGC API Features** | GeoServices JSON (NEW) | ✅ **Fixed** |
| **MVT Tiles** | GeoServices JSON (NEW) | ✅ **Fixed** |

## Files Modified
**Core Implementation:**
- Fixed MVT error handling
- Fixed 8 TypedResults.Problem() calls
- Added OGC/MVT path detection

**New Components:**
- Centralized exception handling
- Enhanced UnhandledException logging
- Middleware pipeline registration

**Test Coverage:**
- Four new test suites (listed under Comprehensive Test Coverage above)

## Issue Resolution
- ✅ **Standardized error response formats** across all protocols
- ✅ **Appropriate HTTP status codes** based on error types
- ✅ **User-friendly error messages** that avoid exposing internal system details
- ✅ **Field-level validation details** in validation error responses
- ✅ **Correlation IDs** for error logging and traceability
- ✅ **Comprehensive test coverage** for error scenarios

🎯 **Result**: All protocols now use consistent error handling with proper format detection, centralized exception handling, and comprehensive test validation, ensuring the system meets enterprise-grade error handling standards.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Mike McDougall <mike@honua.io>
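As an illustration of the centralized exception handling described in the commit above, below is a minimal ASP.NET Core middleware sketch: map exception types to status codes, log with a correlation ID, and return a generic message so internal details are not leaked. The specific exception-to-status mapping, response shape, and use of `HttpContext.TraceIdentifier` as the correlation ID are assumptions for the example; the repository's `GlobalExceptionMiddleware` and its protocol-specific error formats will differ.

```csharp
// Sketch of a centralized exception-handling middleware (not the repository's code).
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;

public sealed class GlobalExceptionMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogger<GlobalExceptionMiddleware> _logger;

    public GlobalExceptionMiddleware(RequestDelegate next, ILogger<GlobalExceptionMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        try
        {
            await _next(context);
        }
        catch (Exception ex)
        {
            // Map exception types to HTTP status codes (illustrative mapping).
            int status = ex switch
            {
                ArgumentException => StatusCodes.Status400BadRequest,
                KeyNotFoundException => StatusCodes.Status404NotFound,
                UnauthorizedAccessException => StatusCodes.Status403Forbidden,
                _ => StatusCodes.Status500InternalServerError,
            };

            // Correlation ID assumed to be set by earlier middleware; TraceIdentifier
            // stands in for it here.
            string correlationId = context.TraceIdentifier;
            _logger.LogError(ex, "Unhandled exception. CorrelationId={CorrelationId}", correlationId);

            // Return a generic message; never echo exception details to the client.
            context.Response.StatusCode = status;
            context.Response.ContentType = "application/json";
            var payload = JsonSerializer.Serialize(new
            {
                error = new { code = status, message = "An unexpected error occurred.", correlationId }
            });
            await context.Response.WriteAsync(payload);
        }
    }
}
```

In the application's pipeline this would be registered with `app.UseMiddleware<GlobalExceptionMiddleware>()` after the correlation-ID middleware, as the commit notes.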
## Pull Request

### Issue Link
Fixes #170

### Summary
Replace bulk file reading with chunked streaming processing to maintain constant memory usage regardless of file size. This implementation introduces memory-efficient streaming parsers for geospatial file formats with batched database insertion and background job processing for large files.
### Changes Made

### Testing
- Ran `scripts/pre-pr-check.sh`; all checks passed

### Coverage Impact
### Breaking Changes
None
### Additional Context
This implementation addresses memory issues when importing large geospatial files by processing them in streaming chunks rather than loading entire files into memory. The solution includes configurable batch sizes, automatic background job scheduling for large files, and comprehensive progress monitoring.
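The configurable limits mentioned here live under an `Import:Limits` section of appsettings.json, per the commit message. A hypothetical sketch of how such a limits class might be populated with manual parsing (avoiding the Configuration.Binder dependency, as the follow-up fix describes) is shown below; the property names, keys, and defaults are assumptions, not the actual `ImportLimits` contract.

```csharp
// Hypothetical sketch of a tunable import-limits class read manually from
// IConfiguration (no Microsoft.Extensions.Configuration.Binder dependency).
using Microsoft.Extensions.Configuration;

public sealed class ImportLimits
{
    public int BatchSize { get; init; } = 1000;                          // features per database batch
    public long BackgroundJobThresholdBytes { get; init; } = 100L << 20; // ~100 MB: larger files go to background jobs

    public static ImportLimits FromConfiguration(IConfiguration configuration)
    {
        // Manual parsing of raw string values instead of GetValue<T>() / Bind().
        var section = configuration.GetSection("Import:Limits");
        return new ImportLimits
        {
            BatchSize = int.TryParse(section["BatchSize"], out var batch) ? batch : 1000,
            BackgroundJobThresholdBytes =
                long.TryParse(section["BackgroundJobThresholdBytes"], out var threshold) ? threshold : 100L << 20,
        };
    }
}
```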
### Pre-PR Checklist (for contributor)
- Ran `scripts/pre-pr-check.sh` and all checks passed
- PR title follows the `type: description (#issue-number)` format

### Reviewer Checklist