
🌞 CORONA-Fields: Leveraging Foundation Models for Classification of Solar Wind Phenomena


This repository provides the code and setup instructions for training and evaluating a deep learning model for solar wind classification, combining remote-sensing imagery from NASA's Solar Dynamics Observatory (SDO) with in-situ plasma measurements from the Parker Solar Probe (PSP).

The project leverages foundation model embeddings (SDO-FM) and a neural fields head for classification, as a downstream heliophysics task.


Overview

This project demonstrates a proof-of-concept pipeline that bridges remote-sensing SDO imagery with in-situ PSP plasma measurements to classify solar wind structures. It uses pretrained MAE embeddings from the SDO Foundation Model (SDO-FM) and a neural-field-based head that incorporates spacecraft positional encodings and magnetic connectivity.

Goal: explore transferability of foundation-model image embeddings to heliospheric classification tasks, and provide a reproducible codebase for follow-up work.
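As a sketch of the positional-encoding idea behind the neural-field head (the function name and frequency choices here are illustrative, not the repo's actual implementation): the spacecraft position is lifted to sinusoidal Fourier features before being combined with the image embeddings.

```python
import numpy as np

def fourier_encode(pos, n_freqs=4):
    """Map a 3-D spacecraft position to sinusoidal Fourier features,
    the kind of positional encoding commonly used by neural fields."""
    pos = np.asarray(pos, dtype=float)
    freqs = 2.0 ** np.arange(n_freqs)        # geometric frequency ladder
    angles = pos[:, None] * freqs[None, :]   # shape (3, n_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
    return feats.ravel()                     # length 3 * 2 * n_freqs

# Encoded position vector, ready to concatenate with an image embedding:
enc = fourier_encode([0.1, -0.5, 0.9])       # shape (24,)
```

The multi-frequency encoding lets a small MLP head resolve both coarse and fine positional structure, which is why neural fields favor it over raw coordinates.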


Features

  • Uses pretrained SDO-FM MAE embeddings (AIA-based) as a foundation-model backbone.
  • Neural-field based classification head that encodes spacecraft position and magnetic connectivity.
  • Temporal, leakage-aware train/validation/test splits.
  • Focal loss and other tools to handle class imbalance.
  • Scripts for dataset preparation, fine-tuning, and hyperparameter multiruns (Hydra + PyTorch Lightning).
  • W&B integration for experiment tracking.
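To illustrate how focal loss tempers class imbalance (a generic NumPy version, not the repo's exact implementation): the modulating factor (1 − p_t)^γ shrinks the contribution of already-confident predictions, so rare classes such as Ejecta keep influencing the gradient.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """Mean focal loss over a batch of softmax outputs.
    gamma=0 recovers ordinary cross-entropy."""
    p_t = probs[np.arange(len(labels)), labels]  # probability of the true class
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

probs = np.array([[0.90, 0.05, 0.03, 0.02],   # confident, correct prediction
                  [0.30, 0.40, 0.20, 0.10]])  # uncertain prediction
labels = np.array([0, 0])
loss = focal_loss(probs, labels)
```

Confident correct predictions are down-weighted by roughly (1 − 0.9)² = 0.01, while uncertain ones retain most of their cross-entropy weight.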

Prerequisites

  • Linux-based workstation or cloud VM (GCP / NVIDIA VM recommended for GPUs/TPUs).
  • Python 3.10+
  • mamba or conda package manager (mamba recommended).
  • Access to the SDOML dataset and to the SDO-FM embeddings (see Data section).

Quickstart — Install & Env

  1. Clone this repo:
git clone git@github.com:spaceml-org/CORONA-FIELDS.git
cd CORONA-FIELDS
  2. Create the environment (mamba):
mamba env create -f sw-classification.yaml
mamba activate sw-classification

Data & Storage

Important: this project uses large multi-channel AIA imagery aligned with PSP plasma data — almost 1M samples (10 AIA channels each). You must have access to the image archive (SDOML) or embeddings to run experiments.


Install local packages

From the repo root:

cd sdofm
pip install -e .
cd ../src/spp
pip install -e .
cd ../../

Running experiments

Activate env:

mamba activate sw-classification

1) Build data module

python datamodule.py --config-name=finetune_solarwind_config

2) Fine-tune model

Run in background (use screen or similar for long jobs):

python finetuning.py --config-name=finetune_solarwind_config

Or run with explicit Python path:

/opt/miniforge3/envs/sw-classification/bin/python /path/to/repo/classification/scripts/finetuning/finetuning.py --config-name=finetune_solarwind_config

3) Hyperparameter multirun

python finetuning.py -m --config-name=mae_random_search

Configurations (Hydra)

  • All experiment parameters are driven by Hydra YAMLs in configs/.
  • Copy a template from configs/ and set VM-specific paths.
  • If using VSCode debugging, copy .vscode-sample/launch.json to .vscode/launch.json and configure "args" to point to your YAML.

Tip: use unique config filenames per user/branch to avoid merge conflicts.
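A hypothetical config sketch showing the kinds of fields such a Hydra YAML might carry (all keys and values below are illustrative; check the templates in configs/ for the real schema):

```yaml
# configs/finetune_solarwind_myuser.yaml  (hypothetical per-user filename, per the tip above)
defaults:
  - _self_

data:
  embeddings_dir: /path/to/sdofm_embeddings   # VM-specific path
  batch_size: 256

model:
  lr: 3.0e-4
  focal_gamma: 2.0

trainer:
  max_epochs: 20
  accelerator: gpu

wandb:
  project: corona-fields
```

With Hydra, any of these keys can also be overridden on the command line (e.g. appending `model.lr=1e-4`), which is what the `-m` multirun mode sweeps over.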


Long-running jobs & experiment tracking

  • Use screen (or tmux) for robustness when running long experiments.
  • We use Weights & Biases (W&B) for tracking. On first run, W&B will prompt for login or API key. Follow the prompts or set WANDB_API_KEY as env var.

Dataset summary

We work with almost 1 million samples. Each sample includes a 10-channel AIA image and aligned PSP plasma features.

| Split      | Total   | Streamer Belt | Sector Reversal | Coronal Hole | Ejecta |
|------------|---------|---------------|-----------------|--------------|--------|
| Train      | 953,821 | 415,870       | 423,960         | 89,206       | 24,785 |
| Validation | 66,245  | 38,799        | 20,444          | 6,319        | 683    |
| Test       | 13,148  | 6,235         | 3,675           | 3,102        | 136    |

Note: the test set (≈13k instances, ≈1.3% of all samples) is a single contiguous month, chosen to avoid temporal leakage across solar rotations and to ensure strict temporal independence.
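The contiguous-month hold-out can be sketched as follows (a generic NumPy illustration, not the repo's actual split code):

```python
import numpy as np

def temporal_split(timestamps, test_start, test_end):
    """Leakage-aware split: hold out one contiguous time window as the
    test set instead of sampling rows at random, so no solar rotation
    is shared between train and test."""
    t = np.asarray(timestamps)
    in_test = (t >= test_start) & (t < test_end)
    return np.flatnonzero(~in_test), np.flatnonzero(in_test)

# Example: hold out March from a Jan-Mar daily index (dates are illustrative).
days = np.arange("2021-01-01", "2021-04-01", dtype="datetime64[D]")
train_idx, test_idx = temporal_split(
    days, np.datetime64("2021-03-01"), np.datetime64("2021-04-01"))
```

Because the window is contiguous, adjacent (highly correlated) samples never straddle the train/test boundary except at the window's two edges.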


Caveats & Notes

  • Backbone limitations: SDO-FM was pretrained on AIA images only (coronal intensity), so it lacks HMI magnetogram pretraining. Incorporating HMI magnetograms in pretraining is a natural next step.
  • Labels: The study uses the Xu & Borovsky (2015) scheme; it's widely used but coarse and threshold-based, which introduces ambiguity between classes (esp. streamer belt vs sector reversal) and affects achievable accuracy.
  • Not production-ready: This is a proof-of-concept research codebase, not an operational forecasting system. It provides a reproducible starting point for future improvements.
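To make the threshold ambiguity concrete, here is a toy threshold-based labeler in the spirit of such schemes (the cut values and parameter choices are invented for illustration, NOT the published Xu & Borovsky boundaries): samples sitting near a cut flip category under small measurement noise.

```python
def threshold_label(entropy, alfven_speed, temp_ratio):
    """Toy four-class scheme with hypothetical cuts. Near any boundary,
    tiny perturbations change the label, which caps achievable accuracy."""
    if alfven_speed > 1.0:      # hypothetical cut
        return "coronal-hole"
    if entropy < 0.5:           # hypothetical cut
        return "ejecta"
    return "streamer-belt" if temp_ratio > 1.0 else "sector-reversal"

# A sample just below the temp_ratio cut flips class with a tiny nudge:
a = threshold_label(0.8, 0.5, 0.99)   # "sector-reversal"
b = threshold_label(0.8, 0.5, 1.01)   # "streamer-belt"
```

This boundary sensitivity is exactly the streamer belt vs. sector reversal ambiguity noted above: label noise near the cuts bounds any classifier's score.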

License & Acknowledgements

License: MIT © 2025.

Acknowledgements: This project builds on the SDO-FM foundation model, SDOML datasets, PSP data products, and compute resources provisioned on cloud / NVIDIA VMs.
