Releases: unicef/giga-spatial
v0.6.7
v0.6.6
Added
- `AdminBoundaries.from_global_country_boundaries(scale="medium")`
  - New class method to load global admin level 0 boundaries from Natural Earth.
  - Supports `"large"` (10m), `"medium"` (50m), and `"small"` (110m) scale options.
- WorldPop Handler Refactor (API Integration)
  - Introduced `WPPopulationHandler`, `WPPopulationConfig`, `WPPopulationDownloader`, and `WPPopulationReader`.
  - Uses new `WorldPopRestClient` to dynamically query the WorldPop REST API.
  - Replaces static metadata files and hardcoded logic with API-based discovery and download.
  - Country code lookup and dataset filtering now handled at runtime.
  - Improved validation, extensibility, logging, and error handling.
- POI-Based WorldPop Mapping
  - `PoiViewGenerator.map_wp_pop()` method:
    - Maps WorldPop population data around POIs using flexible spatial predicates: `"centroid_within"`, `"intersects"`, `"fractional"` (1000m only), `"within"`.
    - Supports configurable radius and resolution (100m or 1000m).
    - Aggregates population data and appends it to the view.
- Geometry-Based Zonal WorldPop Mapping
  - `GeometryBasedZonalViewGenerator.map_wp_pop()` method:
    - Maps WorldPop population data to polygons/zones using the `"intersects"` or `"fractional"` predicate.
    - Returns zonal population sums as a new view column.
    - Handles predicate-dependent data loading (raster vs. GeoDataFrame).
Changed
- Refactored `BaseHandler.ensure_data_available`
  - More efficient data check and download logic.
  - Downloads only missing units unless `force_download=True`.
  - Cleaner structure and better reuse of `get_relevant_data_units()`.
- Refactored WorldPop Module
  - Complete handler redesign using API-based architecture.
  - Dataset paths and URLs are now dynamically constructed from API metadata.
  - Resolution/year validation is more robust and descriptive.
  - Removed static constants, gender/school_age toggles, and local CSV dependency.
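The download rule noted for `BaseHandler.ensure_data_available` (fetch only missing units unless `force_download=True`) can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: `select_units_to_download` and the file names are hypothetical, and a plain set stands in for whatever existence check the configured data store performs.

```python
from typing import Iterable, List, Set


def select_units_to_download(
    relevant_units: Iterable[str],
    existing_units: Set[str],
    force_download: bool = False,
) -> List[str]:
    """Return the data units that still need to be downloaded (hypothetical helper)."""
    units = list(relevant_units)
    if force_download:
        # Re-download everything, including units that already exist.
        return units
    # Otherwise keep only the units that are not present yet.
    return [unit for unit in units if unit not in existing_units]


# Hypothetical data units: one is already present, two are missing.
relevant = ["tile_a.tif", "tile_b.tif", "tile_c.tif"]
existing = {"tile_a.tif"}

missing_only = select_units_to_download(relevant, existing)
everything = select_units_to_download(relevant, existing, force_download=True)
```

Keeping the selection step separate from the download step makes the "what is missing" decision easy to test without any network access.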
Fixed
- Several small fixes and improvements to zonal aggregation methods, especially around CRS consistency, missing values, and result alignment.
v0.6.5
Added
- `MercatorTiles.get_quadkeys_from_points()`
  - New static method for efficient 1:1 point-to-quadkey mapping using coordinate-based logic, improving performance over spatial joins.
- `AdminBoundariesViewGenerator`
  - New generator class for producing zonal views based on administrative boundaries (e.g., districts, provinces) with flexible source and admin level support.
- Zonal View Generator Enhancements
  - `_view`: Internal attribute for accumulating mapped statistics.
  - `view`: Exposes current state of zonal view.
  - `add_variable_to_view()`: Adds mapped data from `map_points`, `map_polygons`, or `map_rasters` with robust validation and zone alignment.
  - `to_dataframe()` and `to_geodataframe()` methods added for exporting current view in tabular or spatial formats.
- `PoiViewGenerator` Enhancements
  - Consistent `_view` DataFrame for storing mapped results.
  - `_update_view()`: Central method to update POI data.
  - `save_view()`: Improved format handling (CSV, Parquet, GeoJSON, etc.) with geometry recovery.
  - `to_dataframe()` and `to_geodataframe()` methods added for convenient export of enriched POI view.
  - Robust duplicate ID detection and CRS validation in `map_zonal_stats`.
- `TifProcessor` Enhancements
  - `sample_by_polygons_batched()`: Parallel polygon sampling.
  - Enhanced `sample_by_polygons()` with nodata masking and multiple stats.
  - `warn_on_error`: Flag to suppress sampling warnings.
- GeoTIFF Multi-Band Support
  - `multi` mode added for multi-band raster support.
  - Auto-detects band names via metadata.
  - Strict validation of band count based on mode (`single`, `rgb`, `rgba`, `multi`).
- Spatial Distance Graph Algorithm
  - `build_distance_graph()` added for fast KD-tree-based spatial matching.
  - Supports both `DataFrame` and `GeoDataFrame` inputs.
  - Outputs a `networkx.Graph` with optional DataFrame of matches.
  - Handles projections, self-match exclusion, and includes verbose stats/logs.
- Database Integration (Experimental)
  - Added `DBConnection` class in `core/io/database.py` for unified Trino and PostgreSQL access.
  - Supports schema/table introspection, query execution, and reading into `pandas` or `dask`.
  - Handles connection creation, credential management, and diagnostics.
  - Utility methods for schema/view/table/column listings and parameterized queries.
- GHSL Population Mapping
  - `map_ghsl_pop()` method added to `GeometryBasedZonalViewGenerator`.
  - Aggregates GHSL population rasters to user-defined zones.
  - Supports `intersects` and `fractional` predicates (latter for 1000m resolution only).
  - Returns population statistics (e.g., `sum`) with customizable column prefix.
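To illustrate what a coordinate-based, 1:1 point-to-quadkey mapping such as `MercatorTiles.get_quadkeys_from_points()` might look like, here is a minimal sketch. It follows the widely used Web Mercator tile and quadkey convention (as in the Bing Maps tile system). The function names are hypothetical, and nothing here is confirmed to match the library's own implementation.

```python
import math
from typing import Iterable, List, Tuple

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def tile_to_quadkey(x: int, y: int, zoom: int) -> str:
    """Interleave the bits of tile x/y into a base-4 quadkey string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def point_to_quadkey(lat: float, lon: float, zoom: int) -> str:
    """Map one WGS84 point to the quadkey of the tile that contains it."""
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    n = 2 ** zoom  # number of tiles along each axis
    x = int((lon + 180.0) / 360.0 * n)
    sin_lat = math.sin(math.radians(lat))
    y = int((0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n)
    # Clamp so points on the outer edges stay inside the tile grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return tile_to_quadkey(x, y, zoom)


def quadkeys_from_points(
    points: Iterable[Tuple[float, float]], zoom: int
) -> List[str]:
    """Return one quadkey per (lat, lon) point, in input order."""
    return [point_to_quadkey(lat, lon, zoom) for lat, lon in points]


quadkeys = quadkeys_from_points([(0.0, 0.0), (45.0, -90.0)], zoom=2)
```

Because each point is converted with plain arithmetic, no geometry objects or spatial index are needed, which is the usual reason this approach outperforms a spatial join.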
Changed
- `MercatorTiles.from_points()` now internally uses `get_quadkeys_from_points()` for better performance.
- `map_points()` and `map_rasters()` now return `Dict[zone_id, value]` to support direct usage with `add_variable_to_view()`.
- Refactored `aggregate_polygons_to_zones()`
  - `area_weighted` deprecated in favor of `predicate`.
  - Supports flexible predicates like `"within"`, `"fractional"` for spatial aggregation.
  - `map_polygons()` updated to reflect this change.
- Optional Admin Boundaries Configuration
  - `ADMIN_BOUNDARIES_DATA_DIR` is now optional.
  - `AdminBoundaries.create()` only attempts to load if explicitly configured or path is provided.
  - Improved documentation and fallback behavior for missing configs.
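A deprecation such as `area_weighted` giving way to `predicate` is commonly handled with a small compatibility shim. The sketch below shows one way this could be done. The helper name `resolve_predicate`, the mapping of `area_weighted=True` to `"fractional"`, and the `"intersects"` default are all assumptions made for illustration, not documented library behavior.

```python
import warnings
from typing import Optional


def resolve_predicate(
    predicate: Optional[str] = None,
    area_weighted: Optional[bool] = None,
) -> str:
    """Translate the legacy flag into a predicate name (hypothetical helper)."""
    if area_weighted is not None:
        warnings.warn(
            "'area_weighted' is deprecated; use 'predicate' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if predicate is None:
            # Assumed mapping: area weighting corresponds to "fractional".
            predicate = "fractional" if area_weighted else "intersects"
    # Assumed default when neither argument is given.
    return predicate or "intersects"
```

An explicit `predicate` always wins over the legacy flag in this sketch, so callers can migrate one call site at a time.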
Fixed
- GHSL Downloader
  - ZIP files are now downloaded into a temporary cache directory using `requests.get()`.
  - Avoids unnecessary writes and ensures cleanup.
- `TifProcessor`
  - Removed polygon sampling warnings unless explicitly enabled.
Deprecated
- `TifProcessor.tabular` → use `to_dataframe()` instead.
- `TifProcessor.get_zoned_geodataframe()` → use `to_geodataframe()` instead.
- `area_weighted` → use `predicate` in aggregation methods instead.
v0.6.4
Added
- `GigaSchoolProfileFetcher`
  - New class to fetch and process school profile data from the Giga School Profile API
  - Supports paginated fetching, filtering by country and school ID
  - Includes methods to generate connectivity summary statistics by region, connection type, and source
- `GigaSchoolMeasurementsFetcher`
  - New class to fetch and process daily real-time connectivity measurements from the Giga API
  - Supports filtering by date range and school
  - Includes performance summary generation (download/upload speeds, latency, quality flags)
- `AdminBoundaries.from_geoboundaries`
  - New class method to download and process geoBoundaries data by country and admin level
  - Automatically handles HDX dataset discovery, downloading, and fallback logic
- `HDXConfig.search_datasets`
  - Static method to search HDX datasets without full handler initialization
  - Supports query string, sort order, result count, HDX site selection, and custom user agent
Fixed
- Typo in `MaxarImageDownloader` causing runtime error
Documentation
- Improved Configuration Guide (`docs/user-guide/configuration.md`)
  - Added comprehensive table of environment variables with defaults and descriptions
  - Synced `.env_sample` and `config.py` with docs
  - Example `.env` file and guidance on path overrides using `config.set_path`
  - New section on `config.ensure_directories_exist` and troubleshooting tips
  - Clearer handling of credentials and security notes
  - Improved formatting and structure for clarity
v0.6.3
Added
- Major refactor of `HDX` module to align with unified `BaseHandler` architecture:
  - `HDXConfig`: fully aligned with `BaseHandlerConfig` structure.
    - Added flexible pattern matching for resource filtering.
    - Improved data unit resolution by country, geometry, and points.
    - Enhanced resource filtering with exact and regex options.
  - `HDXDownloader` fully aligned with `BaseHandlerDownloader`:
    - Simplified sequential download logic.
    - Improved error handling, validation, and logging.
  - `HDXReader` fully aligned with `BaseHandlerReader`:
    - Added `resolve_source_paths` and `load_all_resources` methods.
    - Simplified source handling for single and multiple files.
    - Cleaned up redundant and dataset-specific logic.
- Introduced `HDXHandler` as unified orchestration layer using factory methods.
- Refactor of `RelativeWealthIndex (RWI)` module:
  - Added new `RWIHandler` class aligned with `HDXHandler` and `BaseHandler`.
  - Simplified class names: `RWIDownloader` and `RWIReader`.
  - Enhanced configuration with `latest_only` flag to select newest resources automatically.
  - Simplified resource filtering and country resolution logic.
  - Improved code maintainability, type hints, and error handling.
- New raster multi-band support in TifProcessor:
  - Added new `multi` mode for handling multi-band raster datasets.
  - Automatic band name detection from raster metadata.
  - Added strict mode validation (`single`, `rgb`, `rgba`, `multi`).
  - Enhanced error handling for invalid modes and band counts.
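The strict mode validation described for multi-band rasters can be pictured with a small check like the one below. It is only a sketch. The expected counts for `single`, `rgb`, and `rgba` follow from the mode names, while the rule that `multi` accepts any count of at least one band is an assumption, and `validate_band_count` is a hypothetical helper rather than a `TifProcessor` method.

```python
# Fixed band counts implied by the mode names.
EXPECTED_BANDS = {"single": 1, "rgb": 3, "rgba": 4}
VALID_MODES = (*EXPECTED_BANDS, "multi")


def validate_band_count(mode: str, band_count: int) -> None:
    """Raise ValueError if the band count does not fit the requested mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"Invalid mode {mode!r}; expected one of {VALID_MODES}")
    if mode == "multi":
        # Assumption: any raster with at least one band is acceptable here.
        if band_count < 1:
            raise ValueError("Mode 'multi' requires at least one band")
        return
    expected = EXPECTED_BANDS[mode]
    if band_count != expected:
        raise ValueError(
            f"Mode {mode!r} requires {expected} band(s), got {band_count}"
        )


def is_valid(mode: str, band_count: int) -> bool:
    """Convenience wrapper that reports the outcome as a boolean."""
    try:
        validate_band_count(mode, band_count)
    except ValueError:
        return False
    return True
```

Raising early with a descriptive message keeps band-count mistakes from surfacing later as confusing array-shape errors.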
Fixed
- Fixed GHSL tiles loading behavior for correct coordinate system handling:
  - Moved `TILES_URL` formatting and tile loading to `validate_configuration`.
  - Ensures proper tile loading after CRS validation.
Documentation
- Updated and standardized API references across documentation.
- Standardized handler method names and usage examples.
- Added building enrichment examples for POI processing.
- Updated installation instructions.
Deprecated
- Deprecated direct imports from individual handler modules.
v0.6.2
Added
- New `ROOT_DATA_DIR` configuration option to set a base directory for all data tiers
  - Can be configured via environment variable `ROOT_DATA_DIR` or `.env` file
  - Defaults to current directory (`.`) if not specified
  - All tier data paths (bronze, silver, gold, views) are now constructed relative to this root directory
  - Example: Setting `ROOT_DATA_DIR=/data/gigaspatial` will store all data under `/data/gigaspatial/bronze`, `/data/gigaspatial/silver`, etc.
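How tier paths could be derived from `ROOT_DATA_DIR` is sketched below. The helper `build_tier_paths` is hypothetical, and the directory names for the gold and views tiers are assumed to mirror the `bronze` and `silver` examples given above. The actual logic lives in the package's configuration module and may differ.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

TIERS = ("bronze", "silver", "gold", "views")


def build_tier_paths(env: Optional[Mapping[str, str]] = None) -> Dict[str, Path]:
    """Build one path per data tier relative to ROOT_DATA_DIR (hypothetical helper)."""
    env = os.environ if env is None else env
    # Fall back to the current directory when the variable is not set.
    root = Path(env.get("ROOT_DATA_DIR", "."))
    return {tier: root / tier for tier in TIERS}


paths = build_tier_paths({"ROOT_DATA_DIR": "/data/gigaspatial"})
default_paths = build_tier_paths({})
```

Passing the environment as a mapping keeps the sketch testable without modifying the real process environment.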
Fixed
- Fixed URL formatting in GHSL tiles by using Enum value instead of Enum member
  - Ensures consistent URL formatting with numeric values (4326) instead of Enum names (WGS84)
  - Fixes URL formatting issue across different Python environments
- Refactored GHSL downloader to follow DataStore abstraction
  - Directory creation is now handled by DataStore implementation
  - Removed redundant directory creation logic from download_data_unit method
  - Improves separation of concerns and makes the code more maintainable
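The Enum fix above comes down to the difference between formatting an Enum member and formatting its `.value`. The sketch below uses a hypothetical `CoordSystem` enum and a placeholder URL template, neither of which is taken from the GHSL handler, to show why the member must not be interpolated directly.

```python
from enum import Enum


class CoordSystem(Enum):
    """Hypothetical enum mapping CRS names to numeric EPSG-style codes."""

    WGS84 = 4326
    MOLLWEIDE = 54009


# Placeholder template, not the real GHSL tiles URL.
URL_TEMPLATE = "https://example.org/tiles/{crs}/index.zip"

# Formatting the member inserts its name, not the numeric code.
member_url = URL_TEMPLATE.format(crs=CoordSystem.WGS84)

# Formatting the value inserts the number the URL actually needs.
value_url = URL_TEMPLATE.format(crs=CoordSystem.WGS84.value)
```

Using `.value` explicitly makes the result independent of how a given Enum type or Python version chooses to render its members.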
v0.6.1
Fixed
- Handle missing GeoRepo API key gracefully in `AdminBoundaries.create()`:
  - Added `try-except` block around `GeoRepoClient` initialization
  - Catch and log API key errors
  - Fallback to GADM source if GeoRepo fails
  - Enhanced logging for better traceability
This improves reliability and prevents unexpected crashes during boundary loading.
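The fallback pattern described here (try the primary source, log the failure, then use the secondary source) can be sketched generically as follows. All names in the block are hypothetical stand-ins: `load_from_primary` plays the role of the GeoRepo path and `load_from_fallback` the role of the GADM path, and neither reflects the real client APIs.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger("boundaries_example")


class MissingApiKeyError(Exception):
    """Raised by the stand-in primary source when no API key is configured."""


def load_from_primary(api_key: Optional[str]) -> Dict[str, str]:
    # Stand-in for a client that needs credentials at initialization.
    if not api_key:
        raise MissingApiKeyError("no API key configured")
    return {"source": "primary"}


def load_from_fallback() -> Dict[str, str]:
    # Stand-in for a source that needs no credentials.
    return {"source": "fallback"}


def load_boundaries(api_key: Optional[str] = None) -> Dict[str, str]:
    """Try the primary source first and fall back if the key is missing."""
    try:
        return load_from_primary(api_key)
    except MissingApiKeyError as exc:
        logger.warning("Primary source unavailable (%s); using fallback.", exc)
        return load_from_fallback()
```

Catching only the specific key-related error keeps unrelated failures visible instead of silently switching data sources.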
This version is published to PyPI, ready for installation via `pip install gigaspatial==0.6.1`.
v0.6.0
Added
POI View Generator
- `map_zonal_stats`: New method for enriched spatial mapping with support for:
  - Raster point sampling (value at POI location)
  - Raster zonal statistics (with buffer zone)
  - Polygon aggregation (with optional area-weighted averaging)
- Auto-generated POI IDs in `_init_points_gdf` for consistent point tracking.
- Support for area-weighted aggregation for polygon-based statistics.
BaseHandler Orchestration Layer
- New abstract `BaseHandler` class providing unified lifecycle orchestration for config, downloader, and reader.
- High-level interface methods:
  - `ensure_data_available()`
  - `load_data()`
  - `download_and_load()`
  - `get_available_data_info()`
- Integrated factory pattern for safe and standardized component creation.
- Built-in context manager support for resource cleanup.
- Fully backwards compatible with existing handler architecture.
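The orchestration idea behind `BaseHandler` (one object that checks availability, loads data, and cleans up through a context manager) is sketched below. Only the method names come from the notes above. The signatures, the `SketchHandler` and `InMemoryHandler` classes, and the in-memory store are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class SketchHandler(ABC):
    """Simplified stand-in for an orchestration base class."""

    def __init__(self) -> None:
        self.closed = False

    @abstractmethod
    def ensure_data_available(self, source: str) -> bool:
        """Make sure data for `source` exists, downloading it if needed."""

    @abstractmethod
    def load_data(self, source: str) -> Any:
        """Load the data for `source`."""

    def download_and_load(self, source: str) -> Any:
        # Orchestrate the two steps so callers need a single call.
        if not self.ensure_data_available(source):
            raise RuntimeError(f"Data for {source!r} could not be made available")
        return self.load_data(source)

    def __enter__(self) -> "SketchHandler":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Release resources; returning False lets exceptions propagate.
        self.closed = True
        return False


class InMemoryHandler(SketchHandler):
    """Toy subclass that 'downloads' into a dictionary."""

    def __init__(self) -> None:
        super().__init__()
        self.store: Dict[str, str] = {}

    def ensure_data_available(self, source: str) -> bool:
        self.store.setdefault(source, f"data for {source}")
        return True

    def load_data(self, source: str) -> str:
        return self.store[source]


with InMemoryHandler() as handler:
    result = handler.download_and_load("KEN")
```

Concrete handlers only fill in the two abstract steps, while the shared base class supplies the lifecycle and cleanup behavior.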
Handlers Updated to Use BaseHandler
- `GoogleOpenBuildingsHandler`
- `MicrosoftBuildingsHandler`
- `GHSLDataHandler`
- All now inherit from `BaseHandler`, supporting standardized behavior and cleaner APIs.
Changed
POI View Generator
- `map_built_s` and `map_smod` now internally use the new `map_zonal_stats` method.
- `tif_processors` renamed to `data` to support both raster and polygon inputs.
- Removed parameters:
  - `id_column` (now handled internally)
  - `area_column` (now automatically calculated)
Internals and Usability
- Improved error handling with clearer validation messages.
- Enhanced logging for better visibility during enrichment.
- More consistent use of coordinate column naming.
- Refined type hints and parameter documentation across key methods.
Notes
- Removed legacy POI generator classes and redundant `poi.py` file.
- Simplified imports and removed unused handler dependencies.
- All POI generator methods now include updated docstrings, parameter explanations, and usage examples.
- Added docs on the new `BaseHandler` interface and handler refactors.
v0.5.0
Changed
- Refactored data loading architecture:
  - Introduced dedicated reader classes for major datasets (Microsoft Global Buildings, Google Open Buildings, GHSL), each inheriting from a new `BaseHandlerReader`.
  - Centralized file existence checks and raster/tabular loading methods in `BaseHandlerReader`.
  - Improved maintainability by encapsulating dataset-specific logic inside each reader class.
- Modularized source resolution:
  - Each reader now supports resolving data by country, geometry, or individual points, improving code reuse and flexibility.
- Unified POI enrichment:
  - Merged all POI generators (Google Open Buildings, Microsoft Global Buildings, GHSL Built Surface, GHSL SMOD) into a single `PoiViewGenerator` class.
  - Supports flexible inputs: list of `(lat, lon)` tuples, list of dicts, DataFrame, or GeoDataFrame.
  - Maintains consistent internal state via `points_gdf`, updated after each mapping.
  - Enables chained enrichment of POI data using multiple datasets.
- Modernized internal data access:
  - All data loading now uses dedicated handler/reader classes, improving consistency and long-term maintainability.
Fixed
- Full DataStore integration:
  - Fixed `OpenCellID` and `HDX` handlers to fully support the `DataStore` abstraction.
  - All file reads, writes, and checks now use the configured `DataStore` (local or cloud).
  - Temporary files are only used during downloads; final data is always stored and accessed via the DataStore interface.
Removed
- Removed deprecated POI generator classes and the now-obsolete poi submodule. All enrichment is handled through the unified `PoiViewGenerator`.
Notes
- This release finalizes the architectural refactors started in `v0.5.0b1`.
- While marked stable, please report any issues or regressions from the new modular structure.
v0.5.0b1
Added
- New Handlers:
  - `hdx.py`: Handler for downloading and managing Humanitarian Data Exchange datasets.
  - `rwi.py`: Handler for the Relative Wealth Index dataset.
  - `opencellid.py`: Handler for OpenCellID tower locations.
  - `unicef_georepo.py`: Integration with UNICEF’s GeoRepo asset repository.
- Zonal Generators:
  - Introduced the `generators/zonal/` module to support spatial aggregations of various data types (points, polygons, rasters) to zonal geometries such as grid tiles or catchment areas.
- New Geo-Processing Methods:
  - Added methods to compute centroids of (Multi)Polygon geometries.
  - Added methods to calculate area of (Multi)Polygon geometries in square meters.
Changed
- Refactored:
  - `config.py`: Added support for new environment variables (OpenCellID and UNICEF GeoRepo keys).
  - `geo.py`: Enhanced spatial join functions for improved performance and clarity.
  - `handlers/`:
    - Minor robustness improvements in `google_open_buildings` and `microsoft_global_buildings`.
    - Added a new class method in `boundaries` for initializing admin boundaries from UNICEF GeoRepo.
  - `core/io/`:
    - Added `list_directories` method to both ADLS and local storage backends.
- Documentation & Project Structure:
  - Updated `.env_sample` and `.gitignore` to align with new environment variables and data handling practices.
Dependencies
- Updated `requirements.txt` and `setup.py` to reflect new dependencies and ensure compatibility.
Notes
- This is a pre-release (`v0.5.0b1`) and is intended for testing and feedback.
- Some new modules, especially in `handlers` and `generators`, are experimental and may be refined in upcoming releases.