This repository provides the code and setup instructions for training and evaluating a deep learning model for solar wind classification, combining remote-sensing imagery from NASA’s SDO with in-situ plasma measurements from PSP.
The project pairs foundation-model embeddings (SDO-FM) with a neural-field classification head, treating solar wind classification as a downstream heliophysics task.
- Overview
- Features
- Prerequisites
- Quickstart — Install & Env
- Data & Storage
- Mounting SDOML (GCP / NVIDIA VMs)
- Install local packages
- Running experiments
- Configurations (Hydra)
- Long-running jobs & experiment tracking
- Dataset summary
- Caveats & Notes
- Citation
- License & Acknowledgements
## Overview

This project demonstrates a proof-of-concept pipeline that bridges remote-sensing SDO imagery with in-situ PSP plasma measurements to classify solar wind structures. It uses pretrained MAE embeddings from the SDO Foundation Model (SDO-FM) and a neural-field-based head that incorporates spacecraft positional encodings and magnetic connectivity.
Goal: explore transferability of foundation-model image embeddings to heliospheric classification tasks, and provide a reproducible codebase for follow-up work.
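For intuition, here is a minimal sketch of a head of this kind: it concatenates a frozen SDO-FM embedding with a Fourier-feature encoding of the spacecraft position and maps the result to class logits. The layer sizes, encoding scheme, and names are illustrative assumptions, not the repository's actual module (which also encodes magnetic connectivity).

```python
import torch
import torch.nn as nn


class SolarWindHead(nn.Module):
    """Illustrative neural-field-style head: SDO-FM embedding + spacecraft position -> 4 classes."""

    def __init__(self, embed_dim: int = 768, pos_dim: int = 3, n_freqs: int = 6, n_classes: int = 4):
        super().__init__()
        self.n_freqs = n_freqs
        pos_enc_dim = pos_dim * 2 * n_freqs  # sin/cos Fourier features per coordinate
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim + pos_enc_dim, 256),
            nn.GELU(),
            nn.Linear(256, n_classes),
        )

    def positional_encoding(self, pos: torch.Tensor) -> torch.Tensor:
        # NeRF-style Fourier features of the (normalized) spacecraft position.
        freqs = 2.0 ** torch.arange(self.n_freqs, device=pos.device) * torch.pi
        angles = pos.unsqueeze(-1) * freqs            # (B, pos_dim, n_freqs)
        enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
        return enc.flatten(start_dim=1)               # (B, pos_dim * 2 * n_freqs)

    def forward(self, embedding: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        x = torch.cat([embedding, self.positional_encoding(pos)], dim=-1)
        return self.mlp(x)                            # logits over the solar wind classes
```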
## Features

- Uses pretrained SDO-FM MAE embeddings (AIA-based) as a foundation-model backbone.
- Neural-field based classification head that encodes spacecraft position and magnetic connectivity.
- Temporal, leakage-aware train/validation/test splits.
- Focal loss and other tools to handle class imbalance (a reference sketch follows this list).
- Scripts for dataset preparation, fine-tuning, and hyperparameter multiruns (Hydra + PyTorch Lightning).
- W&B integration for experiment tracking.
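For reference, a compact multi-class focal loss of the kind used for class imbalance; the value of `gamma` and the optional per-class weights are illustrative, and the actual implementation lives in the training code.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, alpha: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Multi-class focal loss: down-weights easy, well-classified examples."""
    log_probs = F.log_softmax(logits, dim=-1)                      # (B, C)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of the true class, (B,)
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt                         # focal modulation
    if alpha is not None:                                          # optional per-class weights
        loss = loss * alpha.to(logits.device)[targets]
    return loss.mean()
```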
## Prerequisites

- Linux-based workstation or cloud VM (GCP / NVIDIA VM recommended for GPUs/TPUs).
- Python 3.10+
- `mamba` or `conda` package manager (mamba recommended).
- Access to the SDOML dataset and to the SDO-FM embeddings (see Data & Storage).
## Quickstart — Install & Env

- Clone this repo:

```bash
git clone [email protected]:spaceml-org/CORONA-FIELDS.git
cd CORONA-FIELDS
```

- Create the environment (Mamba):

```bash
mamba env create -f sw-classification.yaml
mamba activate sw-classification
```

## Data & Storage

Important: this project uses large multi-channel AIA imagery aligned with PSP plasma data (almost 1M samples, 10 AIA channels each). You must have access to the image archive (SDOML) or to the SDO-FM embeddings to run experiments.
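If the SDOML archive you were granted is hosted on GCS as a zarr store (as SDOML v2 typically is), it can also be read directly without a local mount. This is a rough sketch only; the bucket and store path below are placeholders, not the actual dataset location.

```python
# Rough sketch: read an SDOML-style zarr store straight from GCS (no local mount).
# Replace the placeholder bucket/store path with the one you have access to.
import gcsfs
import zarr

fs = gcsfs.GCSFileSystem(token="google_default")           # uses your gcloud credentials
store = fs.get_mapper("gs://<your-sdoml-bucket>/sdoml.zarr")
root = zarr.open(store, mode="r")
print(list(root.group_keys()))                             # e.g. AIA channel groups
```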
## Install local packages

From the repo root:

```bash
cd sdofm
pip install -e .
cd ../src/spp
pip install -e .
cd ../..
```
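A quick way to confirm both editable installs are visible inside the environment; the module names below are assumed to match the package directories.

```python
# Quick sanity check that both editable installs are importable from the env.
import sdofm
import spp

print(sdofm.__name__, spp.__name__)
```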
## Running experiments

Activate the environment and run the datamodule (dataset preparation):

```bash
mamba activate sw-classification
python datamodule.py --config-name=finetune_solarwind_config
```

Run fine-tuning in the background (use `screen` or similar for long jobs):

```bash
python finetuning.py --config-name=finetune_solarwind_config
```

Or run with an explicit Python path:

```bash
/opt/miniforge3/envs/sw-classification/bin/python /path/to/repo/classification/scripts/finetuning/finetuning.py --config-name=finetune_solarwind_config
```

For a hyperparameter multirun (Hydra `-m`):

```bash
python finetuning.py -m --config-name=mae_random_search
```
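For orientation, this is a minimal sketch of how a Hydra-driven PyTorch Lightning entry point is typically wired; the config path, config keys, and instantiation pattern are illustrative assumptions, not the actual contents of `finetuning.py`.

```python
# Illustrative only: a generic Hydra + PyTorch Lightning entry point.
import hydra
import pytorch_lightning as pl
from omegaconf import DictConfig


@hydra.main(config_path="../../configs", config_name="finetune_solarwind_config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Hydra resolves the YAML into cfg; objects are built from it and handed to the Trainer.
    datamodule = hydra.utils.instantiate(cfg.datamodule)   # assumes a _target_ entry in the YAML
    model = hydra.utils.instantiate(cfg.model)
    trainer = pl.Trainer(**cfg.trainer)
    trainer.fit(model, datamodule=datamodule)


if __name__ == "__main__":
    main()
```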
## Configurations (Hydra)

- All experiment parameters are driven by Hydra YAMLs in `configs/`.
- Copy a template from `configs/` and set VM-specific paths.
- If using VSCode debugging, copy `.vscode-sample/launch.json` → `.vscode/launch.json` and configure `"args"` to point to your YAML.
Tip: use unique config filenames per user/branch to avoid merge conflicts.
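If you want to inspect a config without launching training (for example while setting up debugging), Hydra's compose API can load it from a short Python snippet; the relative `config_path` below is an assumption about where you run it from.

```python
from hydra import compose, initialize
from omegaconf import OmegaConf

# Load the same YAML the training scripts use, without starting a run.
with initialize(version_base=None, config_path="configs"):
    cfg = compose(config_name="finetune_solarwind_config")
    print(OmegaConf.to_yaml(cfg))
```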
## Long-running jobs & experiment tracking

- Use `screen` (or `tmux`) for robustness when running long experiments.
- We use Weights & Biases (W&B) for tracking. On first run, W&B will prompt for a login or API key. Follow the prompts or set `WANDB_API_KEY` as an environment variable.
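If the interactive prompt is inconvenient (for example inside `screen` or a batch job), logging in programmatically also works; this assumes the key has already been exported as `WANDB_API_KEY`.

```python
import os
import wandb

# Uses the key from the environment if present; otherwise falls back to W&B's prompt/netrc flow.
wandb.login(key=os.environ.get("WANDB_API_KEY"))
```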
## Dataset summary

We work with almost 1 million samples. Each sample includes a 10-channel AIA image and aligned PSP plasma features.
| Split | Total | Streamer Belt | Sector Reversal | Coronal Hole | Ejecta |
|---|---|---|---|---|---|
| Train | 953,821 | 415,870 | 423,960 | 89,206 | 24,785 |
| Validation | 66,245 | 38,799 | 20,444 | 6,319 | 683 |
| Test | 13,148 | 6,235 | 3,675 | 3,102 | 136 |
Note: the test split is ≈2% of the data (≈13k instances) and was chosen as a single contiguous month to ensure strict temporal independence and avoid leakage across solar rotations.
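The imbalance is severe (Ejecta is under 3% of the training split), which is why focal loss and class weighting matter. As a quick illustration using the train counts from the table above, the snippet below prints per-class fractions and inverse-frequency weights; this weighting scheme is one common choice, not necessarily the one used in the repository.

```python
# Per-class fractions and illustrative inverse-frequency weights for the training split.
train_counts = {
    "Streamer Belt": 415_870,
    "Sector Reversal": 423_960,
    "Coronal Hole": 89_206,
    "Ejecta": 24_785,
}
total = sum(train_counts.values())  # 953,821

for name, n in train_counts.items():
    frac = n / total
    weight = total / (len(train_counts) * n)  # normalized inverse frequency
    print(f"{name:16s} {frac:6.1%}  weight ~ {weight:.2f}")
```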
## Caveats & Notes

- Backbone limitations: SDO-FM was pretrained on AIA images only (coronal intensity), so it lacks HMI magnetogram pretraining. Incorporating HMI magnetograms in pretraining is a natural next step.
- Labels: the study uses the Xu & Borovsky (2015) scheme; it is widely used but coarse and threshold-based, which introduces ambiguity between classes (especially streamer belt vs. sector reversal) and limits achievable accuracy.
- Not production-ready: This is a proof-of-concept research codebase, not an operational forecasting system. It provides a reproducible starting point for future improvements.
## License & Acknowledgements

License: MIT © 2025.
Acknowledgements: This project builds on the SDO-FM foundation model, SDOML datasets, PSP data products, and compute resources provisioned on cloud / NVIDIA VMs.