This repository provides the code and setup instructions for training and evaluating a deep learning model for solar wind classification, combining remote-sensing imagery from NASA’s SDO with in-situ plasma measurements from PSP.
The project pairs foundation-model embeddings (SDO-FM) with a neural-field classification head, treating solar wind classification as a downstream heliophysics task.
- Overview
- Features
- Prerequisites
- Quickstart — Install & Env
- Data & Storage
- Mounting SDOML (GCP / NVIDIA VMs)
- Install local packages
- Running experiments
- Configurations (Hydra)
- Long-running jobs & experiment tracking
- Dataset summary
- Caveats & Notes
- Citation
- License & Acknowledgements
## Overview

This project demonstrates a proof-of-concept pipeline that bridges remote-sensing SDO imagery with in-situ PSP plasma measurements to classify solar wind structures. It uses pretrained MAE embeddings from the SDO Foundation Model (SDO-FM) and a neural-field-based head that incorporates spacecraft positional encodings and magnetic connectivity.
Goal: explore transferability of foundation-model image embeddings to heliospheric classification tasks, and provide a reproducible codebase for follow-up work.
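For intuition, here is a minimal sketch of a head of this kind: it concatenates a frozen SDO-FM embedding with a Fourier-feature encoding of the spacecraft position and maps the result to class logits. The layer sizes, encoding scheme, and names are illustrative assumptions, not the repository's actual module (which also encodes magnetic connectivity).

```python
import torch
import torch.nn as nn


class SolarWindHead(nn.Module):
    """Illustrative neural-field-style head: SDO-FM embedding + spacecraft position -> 4 classes."""

    def __init__(self, embed_dim: int = 768, pos_dim: int = 3, n_freqs: int = 6, n_classes: int = 4):
        super().__init__()
        self.n_freqs = n_freqs
        pos_enc_dim = pos_dim * 2 * n_freqs  # sin/cos Fourier features per coordinate
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim + pos_enc_dim, 256),
            nn.GELU(),
            nn.Linear(256, n_classes),
        )

    def positional_encoding(self, pos: torch.Tensor) -> torch.Tensor:
        # NeRF-style Fourier features of the (normalized) spacecraft position.
        freqs = 2.0 ** torch.arange(self.n_freqs, device=pos.device) * torch.pi
        angles = pos.unsqueeze(-1) * freqs            # (B, pos_dim, n_freqs)
        enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
        return enc.flatten(start_dim=1)               # (B, pos_dim * 2 * n_freqs)

    def forward(self, embedding: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        x = torch.cat([embedding, self.positional_encoding(pos)], dim=-1)
        return self.mlp(x)                            # logits over the solar wind classes
```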
## Features

- Uses pretrained SDO-FM MAE embeddings (AIA-based) as a foundation-model backbone.
- Neural-field based classification head that encodes spacecraft position and magnetic connectivity.
- Temporal, leakage-aware train/validation/test splits.
- Focal loss and other tools to handle class imbalance (a reference sketch follows this list).
- Scripts for dataset preparation, fine-tuning, and hyperparameter multiruns (Hydra + PyTorch Lightning).
- W&B integration for experiment tracking.
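For reference, a compact multi-class focal loss of the kind used for class imbalance; the value of `gamma` and the optional per-class weights are illustrative, and the actual implementation lives in the training code.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, alpha: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Multi-class focal loss: down-weights easy, well-classified examples."""
    log_probs = F.log_softmax(logits, dim=-1)                      # (B, C)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of the true class, (B,)
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt                         # focal modulation
    if alpha is not None:                                          # optional per-class weights
        loss = loss * alpha.to(logits.device)[targets]
    return loss.mean()
```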
## Prerequisites

- Linux-based workstation or cloud VM (GCP / NVIDIA VM recommended for GPUs/TPUs).
- Python 3.10+
- `mamba` or `conda` package manager (mamba recommended).
- Access to the SDOML dataset and to the SDO-FM embeddings (see Data & Storage).
## Quickstart — Install & Env

- Clone this repo:

```bash
git clone [email protected]:spaceml-org/CORONA-FIELDS.git
cd CORONA-FIELDS
```

- Create the environment (Mamba):

```bash
mamba env create -f sw-classification.yaml
mamba activate sw-classification
```

## Data & Storage

Important: this project uses large multi-channel AIA imagery aligned with PSP plasma data (almost 1M samples, 10 AIA channels each). You must have access to the image archive (SDOML) or to the SDO-FM embeddings to run experiments.
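If the SDOML archive you were granted is hosted on GCS as a zarr store (as SDOML v2 typically is), it can also be read directly without a local mount. This is a rough sketch only; the bucket and store path below are placeholders, not the actual dataset location.

```python
# Rough sketch: read an SDOML-style zarr store straight from GCS (no local mount).
# Replace the placeholder bucket/store path with the one you have access to.
import gcsfs
import zarr

fs = gcsfs.GCSFileSystem(token="google_default")           # uses your gcloud credentials
store = fs.get_mapper("gs://<your-sdoml-bucket>/sdoml.zarr")
root = zarr.open(store, mode="r")
print(list(root.group_keys()))                             # e.g. AIA channel groups
```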
## Install local packages

From the repo root:

```bash
cd sdofm
pip install -e .
cd ../src/spp
pip install -e .
cd ../..
```
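A quick way to confirm both editable installs are visible inside the environment; the module names below are assumed to match the package directories.

```python
# Quick sanity check that both editable installs are importable from the env.
import sdofm
import spp

print(sdofm.__name__, spp.__name__)
```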
## Running experiments

Activate the environment and run the datamodule (dataset preparation):

```bash
mamba activate sw-classification
python datamodule.py --config-name=finetune_solarwind_config
```

Run fine-tuning in the background (use `screen` or similar for long jobs):

```bash
python finetuning.py --config-name=finetune_solarwind_config
```

Or run with an explicit Python path:

```bash
/opt/miniforge3/envs/sw-classification/bin/python /path/to/repo/classification/scripts/finetuning/finetuning.py --config-name=finetune_solarwind_config
```

For a hyperparameter multirun (Hydra `-m`):

```bash
python finetuning.py -m --config-name=mae_random_search
```
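For orientation, this is a minimal sketch of how a Hydra-driven PyTorch Lightning entry point is typically wired; the config path, config keys, and instantiation pattern are illustrative assumptions, not the actual contents of `finetuning.py`.

```python
# Illustrative only: a generic Hydra + PyTorch Lightning entry point.
import hydra
import pytorch_lightning as pl
from omegaconf import DictConfig


@hydra.main(config_path="../../configs", config_name="finetune_solarwind_config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Hydra resolves the YAML into cfg; objects are built from it and handed to the Trainer.
    datamodule = hydra.utils.instantiate(cfg.datamodule)   # assumes a _target_ entry in the YAML
    model = hydra.utils.instantiate(cfg.model)
    trainer = pl.Trainer(**cfg.trainer)
    trainer.fit(model, datamodule=datamodule)


if __name__ == "__main__":
    main()
```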
## Configurations (Hydra)

- All experiment parameters are driven by Hydra YAMLs in `configs/`.
- Copy a template from `configs/` and set VM-specific paths.
- If using VSCode debugging, copy `.vscode-sample/launch.json` → `.vscode/launch.json` and configure `"args"` to point to your YAML.
Tip: use unique config filenames per user/branch to avoid merge conflicts.
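If you want to inspect a config without launching training (for example while setting up debugging), Hydra's compose API can load it from a short Python snippet; the relative `config_path` below is an assumption about where you run it from.

```python
from hydra import compose, initialize
from omegaconf import OmegaConf

# Load the same YAML the training scripts use, without starting a run.
with initialize(version_base=None, config_path="configs"):
    cfg = compose(config_name="finetune_solarwind_config")
    print(OmegaConf.to_yaml(cfg))
```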
## Long-running jobs & experiment tracking

- Use `screen` (or `tmux`) for robustness when running long experiments.
- We use Weights & Biases (W&B) for tracking. On first run, W&B will prompt for a login or API key. Follow the prompts or set `WANDB_API_KEY` as an environment variable.
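If the interactive prompt is inconvenient (for example inside `screen` or a batch job), logging in programmatically also works; this assumes the key has already been exported as `WANDB_API_KEY`.

```python
import os
import wandb

# Uses the key from the environment if present; otherwise falls back to W&B's prompt/netrc flow.
wandb.login(key=os.environ.get("WANDB_API_KEY"))
```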
## Dataset summary

We work with almost 1 million samples. Each sample includes a 10-channel AIA image and aligned PSP plasma features.
| Split | Total | Streamer Belt | Sector Reversal | Coronal Hole | Ejecta |
|---|---|---|---|---|---|
| Train | 953,821 | 415,870 | 423,960 | 89,206 | 24,785 |
| Validation | 66,245 | 38,799 | 20,444 | 6,319 | 683 |
| Test | 13,148 | 6,235 | 3,675 | 3,102 | 136 |
Note: the test split is ≈2% of the data (≈13k instances) and was chosen as a single contiguous month to ensure strict temporal independence and avoid leakage across solar rotations.
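The imbalance is severe (Ejecta is under 3% of the training split), which is why focal loss and class weighting matter. As a quick illustration using the train counts from the table above, the snippet below prints per-class fractions and inverse-frequency weights; this weighting scheme is one common choice, not necessarily the one used in the repository.

```python
# Per-class fractions and illustrative inverse-frequency weights for the training split.
train_counts = {
    "Streamer Belt": 415_870,
    "Sector Reversal": 423_960,
    "Coronal Hole": 89_206,
    "Ejecta": 24_785,
}
total = sum(train_counts.values())  # 953,821

for name, n in train_counts.items():
    frac = n / total
    weight = total / (len(train_counts) * n)  # normalized inverse frequency
    print(f"{name:16s} {frac:6.1%}  weight ~ {weight:.2f}")
```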
## Caveats & Notes

- Backbone limitations: SDO-FM was pretrained on AIA images only (coronal intensity), so it lacks HMI magnetogram pretraining. Incorporating HMI magnetograms in pretraining is a natural next step.
- Labels: the study uses the Xu & Borovsky (2015) scheme; it is widely used but coarse and threshold-based, which introduces ambiguity between classes (especially streamer belt vs. sector reversal) and limits achievable accuracy.
- Not production-ready: This is a proof-of-concept research codebase, not an operational forecasting system. It provides a reproducible starting point for future improvements.
## License & Acknowledgements

License: MIT © 2025.
Acknowledgements: This project builds on the SDO-FM foundation model, SDOML datasets, PSP data products, and compute resources provisioned on cloud / NVIDIA VMs.