
oceanhackweek/ohw25_proj_gap


Mind the CHL Gap

Create a tutorial on producing gap-free Indian Ocean gridded data with CNNs. This builds on work started during GeoHackWeek 2024. We aim to get a working U-Net gap-filling tutorial and add it to https://ocean-satellite-tools.github.io/mind-the-chl-gap/intro.html. We also hope to get other algorithms (DINCAE and DINEOF) working, or at least describe them.
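DINEOF (Data Interpolating Empirical Orthogonal Functions) fills gaps by repeatedly reconstructing a space-time matrix from its leading EOFs (singular vectors) and replacing only the missing values. The sketch below is a minimal NumPy illustration under simplifying assumptions. It is not the reference DINEOF implementation or code from this project: it subtracts a single overall mean, uses a fixed number of modes (`n_modes`), and omits the cross-validation that DINEOF uses to choose how many modes to keep. The function name `dineof_sketch` is hypothetical.

```python
import numpy as np

def dineof_sketch(data, n_modes=3, n_iter=100, tol=1e-6):
    """Minimal DINEOF-style gap filling (illustrative sketch, hypothetical helper).

    data: 2-D array (space, time) with NaN marking missing observations.
    Returns a copy in which only the NaN entries are replaced.
    """
    x = np.array(data, dtype=float)            # work on a float copy
    missing = np.isnan(x)
    if not missing.any():
        return x
    mean = np.nanmean(x)
    x -= mean                                  # center on the mean of observed values
    x[missing] = 0.0                           # first guess for gaps: the mean
    for _ in range(n_iter):
        # Truncated SVD = reconstruction from the leading n_modes EOFs
        u, s, vt = np.linalg.svd(x, full_matrices=False)
        recon = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes, :]
        # RMS change of the gap values since the previous iteration
        change = np.sqrt(np.mean((recon[missing] - x[missing]) ** 2))
        x[missing] = recon[missing]            # observed values stay fixed
        if change < tol:
            break
    return x + mean
```

To apply this to gridded data, one would reshape a (time, lat, lon) array to (lat*lon, time), drop land pixels that are never observed, and reshape the result back.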

The basic approach is the following:

```mermaid
graph LR
  A[netcdf/Zarr w time, lat, lon] --> G{to xarray}
  G --> C[standardized Zarr w masks and season]
  C --> D{CNN or UNet model}
  D --> E[Predict: xarray with gaps filled]
```
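The standardization step in the diagram above could look roughly like the following xarray sketch. Everything here is an assumption for illustration rather than the `mindthegap` implementation: the hypothetical helper `standardize_for_model`, the variable name `CHL`, the log10 transform before standardizing (chlorophyll is roughly log-normally distributed), filling gaps with 0 (the standardized mean), and the use of meteorological seasons (DJF, MAM, JJA, SON). A project focused on the Indian Ocean might define monsoon seasons instead.

```python
import numpy as np
import xarray as xr

def standardize_for_model(ds, var="CHL"):
    """Sketch: gap mask, log-standardized values and a season code (hypothetical helper)."""
    chl = ds[var]
    out = xr.Dataset(coords=ds.coords)
    out["mask"] = chl.notnull()                  # True where a value was observed
    log_chl = np.log10(chl)                      # assumes CHL > 0; NaN gaps stay NaN
    mean = log_chl.mean(skipna=True)
    std = log_chl.std(skipna=True)
    # Standardize, then set gaps to 0 (the mean) so the CNN input has no NaNs
    out[f"{var}_std"] = ((log_chl - mean) / std).fillna(0.0)
    month = ds["time"].dt.month
    # Meteorological seasons: DJF=0, MAM=1, JJA=2, SON=3
    out["season"] = (month % 12) // 3
    return out
```

Keeping the mask as its own variable lets the model (or the loss function) distinguish real observations from filled zeros.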

Functions are in the `mindthegap` directory and are imported with:

```python
import mindthegap as mtg
```

Collaborators

| Name | Role |
|---|---|
| Eli Holmes | Project Facilitator |
| Bruna Cândido | Fellow |
| Trina Xavier | Participant |
| Lilac Hong | Participant |

Planning

Background

Chlorophyll is a widely used indicator of phytoplankton abundance and thus a key measure of marine productivity and ecosystem health. Because the ocean covers nearly 70% of Earth's surface, this productivity matters at a global scale: estimating chlorophyll concentrations allows researchers to assess phytoplankton biomass, which supports oceanic food webs and contributes to global carbon cycling. Remote sensing with ocean-color instruments enables large-scale monitoring of chlorophyll-a by measuring the light reflected from the ocean surface, which changes with the concentration of phytoplankton pigments. However, cloud cover remains a significant challenge: clouds obstruct surface observations and create gaps in chlorophyll-a data. These gaps limit our ability to monitor marine productivity accurately and to quantify the contribution of plankton to the global carbon cycle.

Goals

Contribute to the "mind-the-chl-gap" project and create a tutorial on producing gap-free Indian Ocean gridded data with the U-Net method. For OceanHackWeek 2025, we aimed to extend the existing work by exploring different CNN architectures and experimenting with alternative gap-filling tools, such as segmentation_models_pytorch and DINCAE.

Datasets

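```python
import xarray as xr

# Open the Indian Ocean Zarr store anonymously from Google Cloud Storage.
# Reading gcs:// paths requires the gcsfs package; data are loaded lazily.
dataset = xr.open_dataset(
    "gcs://nmfs_odp_nwfsc/CB/mind_the_chl_gap/IO.zarr",
    engine="zarr",
    backend_kwargs={"storage_options": {"token": "anon"}},
    consolidated=True,
)
dataset  # in a notebook, this displays the dataset summary
```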
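A quick way to see how large the gaps are is to compute the fraction of missing pixels per time step. This is a sketch: `gap_fraction` is a hypothetical helper, the dimension names `time`, `lat` and `lon` are assumptions (check `dataset.dims` and `dataset.data_vars` for the actual names), and pixels that are never observed at any time step are treated as land and excluded.

```python
import numpy as np
import xarray as xr

def gap_fraction(da, time_dim="time", spatial_dims=("lat", "lon")):
    """Fraction of missing ocean pixels per time step (hypothetical helper)."""
    # Pixels never observed at any time step are assumed to be land
    ocean = da.notnull().any(dim=time_dim)
    # Cast to float before masking so land becomes NaN instead of an object array
    missing = da.isnull().astype(float).where(ocean)
    return missing.mean(dim=spatial_dims)    # NaN (land) pixels are skipped
```

For example, assuming the dataset contains a `CHL` variable, `gap_fraction(dataset["CHL"])` returns one value per time step; on a lazily loaded dataset, only the data needed for the reduction are read.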

Workflow/Roadmap

```mermaid
flowchart TD
    A[Zarr data] --> B[Data Preprocessing]
    B --> C[Model Fit]
    C --> D[Result Visualization]
```
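For the model-fit step, one common way to train and evaluate a gap-filling CNN is to hide some pixels that were actually observed (artificial gaps), feed the remaining pixels to the model, and compute the loss only on the hidden pixels, because truly missing pixels have no ground truth. The NumPy sketch below shows only this masking logic; the function names are hypothetical, and this is not necessarily how the project's U-Net is trained.

```python
import numpy as np

def add_artificial_gaps(x, obs_mask, frac=0.2, rng=None):
    """Hide a random fraction of observed pixels (hypothetical helper).

    x: array with NaN where data are missing; obs_mask: True where observed.
    Returns the model input (all gaps set to 0) and the mask of hidden pixels.
    """
    rng = np.random.default_rng() if rng is None else rng
    hide = obs_mask & (rng.random(x.shape) < frac)   # only observed pixels can be hidden
    x_in = np.where(hide | ~obs_mask, 0.0, x)        # real and artificial gaps -> 0
    return x_in, hide

def masked_mse(pred, target, mask):
    """Mean squared error over pixels where mask is True (hypothetical helper)."""
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2)) if diff.size else float("nan")
```

Evaluating `masked_mse(prediction, x, hide)` measures reconstruction skill on pixels the model never saw, which is a closer proxy for cloud gaps than the error on visible pixels.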

Results/Findings

Results are published at https://oceanhackweek.org/ohw25_proj_gap/.

Lessons Learned

  • Working with outdated packages can be quite challenging.
  • Existing frameworks (e.g., DINCAE) can serve as inspiration but need to be adapted to the specific context.
  • Pay attention to memory efficiency: document how much memory is required to run your code on your data.
  • Collaboration and thorough documentation help improve workflow efficiency.
  • Avoid calling to_numpy() on the full dataset (time, lat, lon, var), which loads everything into memory. Instead, stream patches directly from the Zarr store in batches, or use dask.
  • Xarray is powerful, and tools such as Icechunk and Cubed offer more advanced options.
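The advice about streaming patches instead of calling `to_numpy()` can be sketched as a generator that materializes one batch of spatial tiles at a time. `iter_patches` is a hypothetical helper, not part of `mindthegap`. It assumes a 3-D array ordered (time, lat, lon) and works with a NumPy array or a lazily loaded xarray DataArray, where positional slicing followed by `np.asarray` loads only that slice. Edge pixels that do not fill a whole patch are skipped.

```python
import numpy as np

def iter_patches(arr, patch=64, stride=64, batch_size=8):
    """Yield batches of (patch x patch) tiles from a (time, lat, lon) array (hypothetical helper)."""
    n_t, n_y, n_x = arr.shape
    batch = []
    for t in range(n_t):
        for y in range(0, n_y - patch + 1, stride):
            for x in range(0, n_x - patch + 1, stride):
                # np.asarray loads only this slice if arr is lazy (xarray/dask)
                batch.append(np.asarray(arr[t, y:y + patch, x:x + patch]))
                if len(batch) == batch_size:
                    yield np.stack(batch)        # shape: (batch_size, patch, patch)
                    batch = []
    if batch:                                    # last, possibly smaller, batch
        yield np.stack(batch)
```

Choosing `patch` and `stride` to line up with the Zarr chunk sizes avoids reading the same chunk repeatedly.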

References

Creating the JupyterBook

Create a template in the `book` directory:

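```shell
# `jupyter-book create` is part of the Jupyter Book 1.x CLI; ghp-import is used to publish
pip install -U "jupyter-book<2" ghp-import
jupyter-book create book
```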

Build the book and push it to GitHub Pages. Run these commands from inside the `book` directory:

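```shell
# Build the HTML site into _build/html
jupyter-book build .
# Publish _build/html to the gh-pages branch: -n adds .nojekyll, -p pushes, -f forces the push
ghp-import -n -p -f _build/html
```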

