Create a tutorial on gap-free Indian Ocean gridded data with CNNs. This will build on work started during GeoHackWeek 2024. We will try to get a tutorial for U-Net gap-filling working and add to https://ocean-satellite-tools.github.io/mind-the-chl-gap/intro.html. We also hope to get other algorithms working (DINCAE and DINEOF) or at least describe them.
The basic approach is as follows:

```mermaid
graph LR
A[netcdf/Zarr w time, lat, lon] --> G{to xarray}
G --> C[standardized Zarr w masks and season]
C --> D{CNN or U-Net model}
D --> E[Predict: xarray with gaps filled]
```
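A minimal sketch of the standardization step is shown below. The file paths, the chlorophyll variable name `CHL`, and the daily datetime time axis are placeholder assumptions, not the project's actual settings.

```python
import numpy as np
import xarray as xr

# Open the source data lazily (netCDF or Zarr) as an xarray Dataset.
ds = xr.open_dataset("input.zarr", engine="zarr")  # dims: time, lat, lon

# Gap mask: 1 where chlorophyll is observed, 0 where clouds leave a gap.
# "CHL" is a placeholder variable name.
ds["gap_mask"] = xr.where(np.isfinite(ds["CHL"]), 1, 0).astype("int8")

# Season as an integer 0-3 (DJF=0, MAM=1, JJA=2, SON=3) derived from the month.
ds["season"] = (ds["time.month"] % 12) // 3

# Write the standardized dataset to a Zarr store for training.
ds.to_zarr("standardized.zarr", mode="w")
```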
Functions are in the `mindthegap` directory.

```python
import mindthegap as mtg
```
| Name | Role |
|---|---|
| Eli Holmes | Project Facilitator |
| Bruna Cândido | Fellow |
| Trina Xavier | Participant |
| Lilac Hong | Participant |
- Initial idea: Create a tutorial on gap-free Indian Ocean gridded data with U-Net method
- Pitch slide
- Slack channel: ohw25_proj_gap
- repo: https://github.com/oceanhackweek/ohw25_proj_gap
- Final presentation
The ocean covers nearly 70% of Earth's surface, and chlorophyll is a widely used indicator of plankton abundance, making it a key measure of marine productivity and ecosystem health. Estimating chlorophyll concentrations allows researchers to assess phytoplankton biomass, which supports oceanic food webs and contributes to global carbon cycling. Remote sensing with ocean-color instruments enables large-scale monitoring of chlorophyll-a by detecting the light reflected by phytoplankton. However, cloud cover remains a significant challenge: it obstructs surface observations and creates gaps in chlorophyll-a data. These gaps limit our ability to monitor marine productivity accurately and to quantify the contribution of plankton to the global carbon cycle.
Contribute to the "mind-the-chl-gap" project and create a tutorial on gap-free Indian Ocean gridded data with the U-Net method. For OceanHackWeek 2025, we aimed to extend the existing work by exploring different CNN architectures and experimenting with alternative gap-filling tools, such as segmentation_models_pytorch and DINCAE.
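One possible starting point for the segmentation_models_pytorch experiments is sketched below; the encoder choice, channel counts, and patch size are illustrative assumptions rather than settings from the project.

```python
import segmentation_models_pytorch as smp
import torch

# U-Net for gap-filling framed as pixel-wise regression: inputs could be,
# e.g., gappy chlorophyll, a gap mask, and SST (3 channels); the single
# output channel is the reconstructed chlorophyll field.
model = smp.Unet(
    encoder_name="resnet18",   # illustrative encoder choice
    encoder_weights=None,      # train from scratch on ocean data
    in_channels=3,             # placeholder: depends on the predictors used
    classes=1,                 # one output channel (chlorophyll)
)

# A forward pass expects (batch, channels, height, width) tensors.
x = torch.randn(4, 3, 128, 128)
y_hat = model(x)               # shape: (4, 1, 128, 128)
```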
```python
import xarray as xr

# Open the Indian Ocean Zarr store on Google Cloud Storage with
# anonymous (public) read access.
dataset = xr.open_dataset(
    "gcs://nmfs_odp_nwfsc/CB/mind_the_chl_gap/IO.zarr",
    engine="zarr",
    backend_kwargs={"storage_options": {"token": "anon"}},
    consolidated=True,
)
dataset
```
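Continuing from the block above, the dataset can be inspected and subset lazily before anything is downloaded. The variable name `CHL` below is a placeholder; check `dataset.data_vars` for the actual names, and match the latitude slice order to the store's coordinate ordering.

```python
# List the variables available in the Zarr store.
print(dataset.data_vars)

# Lazily select one year and an Indian Ocean box; data are only read
# when values are needed. "CHL" is a placeholder variable name.
subset = dataset["CHL"].sel(
    time=slice("2020-01-01", "2020-12-31"),
    lat=slice(-10, 10),   # flip to slice(10, -10) if lat is stored descending
    lon=slice(50, 70),
)
subset.isel(time=0).plot()  # quick look at a single day (requires matplotlib)
```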
```mermaid
flowchart TD
A[Zarr data] --> B[Data Preprocessing]
B --> C[Model Fit]
C --> D[Result Visualization]
```
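A minimal sketch of the model-fit step, assuming a data loader that already yields (inputs, target, observation-mask) patch tensors; the loss is computed only on observed pixels so cloud gaps do not penalize the model. The loader, model, and hyperparameters are illustrative.

```python
import torch
from torch import nn

def fit(model, loader, epochs=10, lr=1e-3):
    """Train any CNN/U-Net that maps input patches to reconstructed fields."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss(reduction="none")
    for epoch in range(epochs):
        for inputs, target, obs_mask in loader:
            pred = model(inputs)
            # Score the reconstruction only where observations exist,
            # so cloud gaps do not contribute to the loss.
            per_pixel = loss_fn(pred, target)
            loss = (per_pixel * obs_mask).sum() / obs_mask.sum()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.4f}")
    return model
```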
oceanhackweek.org/ohw25_proj_gap/
- Working with outdated packages can be quite challenging.
- Existing frameworks (e.g., DINCAE) can serve as inspiration but need to be adapted to the specific context.
- Pay attention to memory efficiency — document how much memory is required to run your code and data.
- Collaboration and thorough documentation help improve workflow efficiency.
- Avoid using `to_numpy()` on the full dataset (time, lat, lon, var). Instead, stream patches directly from the Zarr files in batches or use dask (see the sketch after this list).
- Xarray is powerful, with advanced options available in icechunk and cubed.
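A minimal sketch of that streaming idea, assuming a chunked Zarr store and a placeholder variable name `CHL`:

```python
import xarray as xr

# Open lazily with dask-backed chunks; one chunk per time step here.
ds = xr.open_zarr("standardized.zarr", chunks={"time": 1})

def iter_patches(da, patch=64, batch=32):
    """Yield small (lat, lon) patches one batch at a time instead of
    calling to_numpy() on the whole (time, lat, lon) array."""
    batch_buf = []
    for t in range(da.sizes["time"]):
        for i in range(0, da.sizes["lat"] - patch + 1, patch):
            for j in range(0, da.sizes["lon"] - patch + 1, patch):
                # .values here loads only this small patch into memory.
                batch_buf.append(
                    da.isel(time=t, lat=slice(i, i + patch), lon=slice(j, j + patch)).values
                )
                if len(batch_buf) == batch:
                    yield batch_buf
                    batch_buf = []
    if batch_buf:
        yield batch_buf

# "CHL" is a placeholder; use the variable names in your store.
for patches in iter_patches(ds["CHL"]):
    ...  # feed `patches` to the model as one training batch
```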
Create the template in the `book` directory:

```bash
pip install -U jupyter-book
jupyter-book create book
```

Build and push to GitHub. Make sure you are in the `book` directory:

```bash
jupyter-book build .
ghp-import -n -p -f _build/html
```