This work presents the data processing, model training, testing, and analysis for school mapping and school connectivity prediction using Earth Observation data.
Obtaining complete and accurate information on school locations is a critical first step to accelerating digital connectivity and driving progress towards SDG4: Quality Education. However, precise GPS coordinates of schools are often inaccurate, incomplete, or entirely missing in many developing countries. In support of the Giga initiative, we leverage machine learning and a combination of remote sensing and auxiliary data to accelerate school mapping. We also investigate whether geospatial information can be used to predict the connectivity status of schools.
This work aims to support government agencies and connectivity providers in improving school location data to better estimate the costs of digitally connecting schools and plan the strategic allocation of their financial resources.
The multi-modal satellite and ground-based data were curated from open-access sources available through Google Earth Engine, Ookla, and The World Bank. The datasets used to generate the model feature space are listed below:

- World Bank Electrical Power Grid
This work also explores the use of location-encoder feature embeddings extracted from various CLIP-based models, including:
Prior to generating features, the coordinates of the school and non-school samples were extracted from the `AOI_train.geojson` file provided by UNICEF using the `get_lat_lon_list_from_gdp` function in the `processing_scripts.py` script.
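The coordinate-extraction step can be sketched as follows. This is a minimal illustration using only the standard library, assuming the file is an ordinary GeoJSON FeatureCollection of Point features; the repository's own `get_lat_lon_list_from_gdp` helper may differ in details, and the function names here are illustrative.

```python
import json

def extract_lat_lon(collection):
    """Pull (lat, lon) pairs from a parsed GeoJSON FeatureCollection of Points."""
    coords = []
    for feature in collection["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Point":
            lon, lat = geom["coordinates"]  # GeoJSON stores positions as [lon, lat]
            coords.append((lat, lon))
    return coords

def get_lat_lon_list(geojson_path):
    """Load a GeoJSON file (e.g. the UNICEF-provided AOI_train.geojson) and
    return its point coordinates as (lat, lon) tuples."""
    with open(geojson_path) as f:
        return extract_lat_lon(json.load(f))
```

Note that GeoJSON orders coordinates longitude-first, so the extraction swaps them into the (lat, lon) order used elsewhere in the pipeline.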
To generate the tabular features extracted from Google Earth Engine data, the airPy package was used with the following command:

```
python run_airpy.py --gee_data <QUERIED DATA> --band <QUERIED DATA BAND> --region <COORDINATES OF SCHOOLS/NON-SCHOOLS> --date <DATE> --analysis_type <COLLECTION> --buffer_size <BUFFER_SIZE> --configs_dir <DIRECTORY TO SAVE CONFIGS> --save_dir <DIRECTORY TO SAVE TABULAR FEATURES> --add_time no --save_type <CSV>
```
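Conceptually, the `--buffer_size` reduction summarizes pixel values within a neighborhood around each school coordinate. The sketch below illustrates that idea on an in-memory grid; it is not airPy's implementation (which runs server-side on Earth Engine), and all names here are hypothetical.

```python
def buffer_mean(grid, row, col, buffer_px):
    """Mean of grid values within a square window of +/- buffer_px cells
    around (row, col) -- a conceptual stand-in for the zonal reduction
    applied to each queried Earth Engine band around a school point."""
    n_rows, n_cols = len(grid), len(grid[0])
    vals = []
    for r in range(max(0, row - buffer_px), min(n_rows, row + buffer_px + 1)):
        for c in range(max(0, col - buffer_px), min(n_cols, col + buffer_px + 1)):
            vals.append(grid[r][c])
    return sum(vals) / len(vals)
```

Larger buffers trade spatial precision for robustness to GPS error in the school coordinates, which motivates treating the buffer extent as a tunable parameter throughout the pipeline.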
Distance to electrical transmission line and Ookla speedtest data features were calculated in the `get_elec` and `get_ookla` functions in the `generate_features.py` script.
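The distance-to-grid feature can be sketched as the great-circle distance from a school to the nearest point of the transmission-line network. The version below is a simplification of what `get_elec` computes, measuring distance to the nearest line *vertex* (true point-to-segment distance would be slightly smaller); the function names are illustrative.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def distance_to_grid_km(school, line_vertices):
    """Distance from a school (lat, lon) to the nearest vertex of the
    electrical-grid line network -- a simplified sketch of the get_elec feature."""
    return min(haversine_km(school[0], school[1], lat, lon)
               for lat, lon in line_vertices)
```

Proximity to existing grid infrastructure is a plausible proxy for both the presence of a settlement and the feasibility of connecting a school, which is why it enters the feature space directly.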
ML-ready features are generated with the `generate_features.py` script using the following command:

```
python generate_features.py --root_dir <ROOT_DIR> --save_dir <SAVE_DIR> --aoi <AOI> --buffer <BUFFER> --target <TARGET>
```
Where the configurable parameters refer to:

- `--root_dir`: Directory path where data is stored
- `--save_dir`: Directory path to save generated features
- `--aoi`: Country/region of interest
- `--buffer`: Buffer extent surrounding the target
- `--target`: ML model target type. Must be one of `school` or `connectivity`
To run the pipeline, the following command is used:

```
python run_pipeline.py --model <MODEL> --aoi <COUNTRY> --buffer <BUFFER_EXTENT> --root_dir <DIRECTORY OF DATA> --experiment_type <ONLINE/OFFLINE> --features <FEATURES_SPACE> --parameter_tuning <TRUE/FALSE> --target <SCHOOL/CONNECTIVITY> --data_split <PERCENTAGE OR SPATIAL CV>
```
The available configurable parameters are:

- `--model`: Model
  - `rf`: random forest
  - `gb`: gradient boosting
  - `mlp`: multi-layer perceptron
  - `svm`: support vector machine
  - `lr`: logistic regression
  - `xgb`: extreme gradient boosting
- `--aoi`: Country
- `--buffer`: Buffer extent surrounding the target
- `--root_dir`: Directory of data
- `--experiment_type`: Wandb experiment type. `online` or `offline` to save and push the run directly to the Wandb project.
- `--features`: Feature space used to train/test the model
- `--parameter_tuning`: Specify whether to hyperparameter-tune the model
- `--target`: Model target. `school` or `connectivity`.
- `--data_split`: Specify either a percentage split of the data (e.g. 70/30 train/test) or spatial cross-validation.
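The spatial cross-validation option deserves a note: with a plain random split, test schools can sit next to training schools, inflating scores through spatial autocorrelation. A minimal sketch of the idea is to bin samples into spatial blocks and hold out one block at a time; the pipeline's actual blocking scheme may differ, and the function names here are illustrative.

```python
def spatial_blocks(points, block_deg=1.0):
    """Assign each (lat, lon) point to a square grid cell of block_deg degrees."""
    return [(int(lat // block_deg), int(lon // block_deg)) for lat, lon in points]

def leave_one_block_out(points, block_deg=1.0):
    """Yield (train_idx, test_idx) splits that hold out one spatial block at a
    time, so test samples are never adjacent to training samples -- a minimal
    sketch of spatial CV as opposed to a random percentage split."""
    blocks = spatial_blocks(points, block_deg)
    for held_out in sorted(set(blocks)):
        train = [i for i, b in enumerate(blocks) if b != held_out]
        test = [i for i, b in enumerate(blocks) if b == held_out]
        yield train, test
```

Comparing the percentage split against this kind of blocked split gives a more honest estimate of how the model generalizes to unmapped regions.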
The below folders host the following code:

- `data_processing`: all pre-processing scripts to generate the tabular feature space.
- `classifiers`: each ML classifier used.
- `analysis`: scripts for post-processing results into figures and maps.