# 🛰️ CIPHER

This repository contains the code and configuration used in the paper "CIPHER: Scalable Time Series Analysis for Physical Sciences with Application to Solar Wind Phenomena". The goal of this project is to apply indexable Symbolic Aggregate approXimation (iSAX) techniques to identify and cluster patterns in solar wind time series data.


## 📘 Table of Contents

- Environment Setup
- Project Structure
- Cache
- Data
- Creating Catalogs (First Step)
- Running Experiments
- Parallel Computing Setup
- Run Dask in a Distributed Cluster
- References and Acknowledgments
- License


## 🧩 Environment Setup

This project runs on Python 3.10+ and uses mamba (or conda) as the environment manager.

```bash
# Create the environment
mamba env create -f environment.yml

# Activate the environment
mamba activate solarwind-isax
```

It is recommended to use Visual Studio Code (VS Code) with the following extensions:

- Python
- Command Variable
- Remote Development (optional if working on remote servers)
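As a quick sanity check that the environment is active, you can verify the interpreter version from Python:

```python
# Run inside the activated solarwind-isax environment.
import sys

# The project targets Python 3.10+, so fail fast on older interpreters.
assert sys.version_info >= (3, 10), "CIPHER expects Python 3.10+"
print(f"OK: Python {sys.version.split()[0]}")
```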

## 🗂️ Project Structure

Ensure your project directory looks like this:

```text
.
├── .vscode/
│   └── launch.json
├── cache/
├── data/
│   └── catalog/
├── sw-data/
│   └── nasaomnireader/
├── environment.yml
└── clustering/
```

You can create these folders manually or using:

```bash
mkdir -p .vscode data/catalog sw-data cache
```

## 🧮 Cache

Create a folder named `cache` in the repository root. This is where the temporary files generated by the iSAX experiments are stored. You can change the cache path in `.vscode/launch.json` via the `-cache_folder` argument.
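If you point `-cache_folder` somewhere other than the default, create the directory before launching. A minimal sketch (the experiment subfolder name is illustrative):

```python
from pathlib import Path

# Hypothetical cache location; match whatever you pass via -cache_folder.
cache_dir = Path("cache/isax_cache_experiment")
cache_dir.mkdir(parents=True, exist_ok=True)
print(f"Cache directory ready: {cache_dir.resolve()}")
```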


## 🌞 Data

This project works with both PSP (Parker Solar Probe) and OMNI data.

To set up your data, create a new folder inside sw-data/ for each dataset and place the corresponding files there. Example:

```text
sw-data/
├── nasaomnireader/
└── psp/
```
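Once the files are in place, you can quickly confirm the layout from Python (a minimal sketch; the folder names follow the example above):

```python
from pathlib import Path

# Dataset folders as laid out above; adjust if you add more instruments.
for dataset in ("nasaomnireader", "psp"):
    folder = Path("sw-data") / dataset
    files = sorted(p.name for p in folder.iterdir()) if folder.exists() else []
    print(f"{dataset}: {len(files)} files")
```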

## 🌍 Creating Catalogs (First Step)

💡 This process only needs to be done once, before running any experiments.

1. Download the input data (in-situ solar wind data). You can use the NASA OMNI dataset or equivalent data compatible with the `nasaomnireader`.

2. Place the data inside:

   ```text
   sw-data/nasaomnireader/
   ```

3. In VS Code:

   - Open the project.
   - Activate your virtual environment.
   - Install local dependencies:

     ```bash
     pip install -e .
     ```

4. Configure `.vscode/launch.json` with your local data path. Example configuration to generate a catalog:

   ```json
   {
     "name": "iSAX generate catalog",
     "type": "python",
     "request": "launch",
     "console": "integratedTerminal",
     "module": "${command:extension.commandvariable.file.relativeFileDotsNoExtension}",
     "cwd": "${workspaceFolder}",
     "justMyCode": false,
     "args": [
       "-start_year", "1994",
       "-stop_year", "2023",
       "-instrument", "omni",
       "-data_path", "/absolute/path/to/your/nasaomnireader",
       "-histogram"
     ]
   }
   ```

5. In the VS Code Debug panel, select "iSAX generate catalog" and run it. Two CSV files will be generated inside `data/catalog/`.
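As a quick sanity check, you can inspect the generated catalogs with pandas (a minimal sketch; the exact CSV filenames are not assumed, so the folder is globbed instead):

```python
from pathlib import Path
import pandas as pd

# The catalog step writes two CSV files into data/catalog/.
for csv_path in sorted(Path("data/catalog").glob("*.csv")):
    df = pd.read_csv(csv_path)
    print(f"{csv_path.name}: {len(df)} rows, columns: {list(df.columns)}")
```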

## 🧪 Running Experiments

1. Ensure the generated catalog (`.csv`) is referenced correctly inside your experiment script (e.g., `run_isax_experiments_sf_cluster.py`).

2. Update the cache folder path in your `.vscode/launch.json` configuration:

   ```json
   "-cache_folder", "/path/to/cache/isax_cache_experiment/"
   ```

3. Run the experiment:

   ```bash
   python run_isax_experiments_sf_cluster.py
   ```

You'll see a progress bar (tqdm) in the terminal. Depending on the parameters and time range, the process may take several hours. Warnings during execution are expected and do not indicate errors.
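If the expected warnings make the progress bar hard to read, they can be silenced for the duration of a run. A minimal sketch, assuming they are ordinary Python warnings rather than log output:

```python
import warnings

# Suppress the expected (harmless) warnings so the tqdm bar stays readable.
warnings.filterwarnings("ignore")
```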


## ⚙️ Parallel Computing Setup

You can execute your experiments in parallel using Dask.

```bash
# Install the package locally
pip install -e .

# Start a scheduler
nohup dask-scheduler > scheduler.log 2>&1 &

# Start a worker
nohup dask-worker tcp://127.0.0.1:8786 --nworkers 8 --nthreads 1 > worker.log 2>&1 &
```

To access the Dask dashboard from your local machine:

```bash
ssh -L 8787:localhost:8787 [your_server_name]
```

Then open http://localhost:8787/graph in your browser.
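Before running a full experiment, you can confirm the scheduler and workers are reachable by submitting a trivial task (a minimal sketch, assuming the default scheduler port 8786):

```python
from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")
print(client)                                # shows worker/thread counts

future = client.submit(lambda x: x + 1, 41)  # trivial round-trip task
print(future.result())                       # expect 42
```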


## 🚀 Run Dask in a Distributed Cluster

You can further improve performance by running your code in a distributed Dask cluster. This allows multiple machines or GPUs to share the workload efficiently.

**[Remote Server]** Install the package as a dependency:

```bash
pip install -e .
```

### ✅ Step 1: Install Dask and Distributed

```bash
pip install "dask[distributed]"
```

### ✅ Step 2: Start the Dask Scheduler as a Daemon

```bash
nohup dask-scheduler > scheduler.log 2>&1 &
```

This runs the scheduler in the background on port 8786, with a web dashboard available at port 8787.

### ✅ Step 3: Start Dask Workers

```bash
nohup dask-worker tcp://127.0.0.1:8786 > worker1.log 2>&1 &
```

**[Local Computer]** Connect to the Remote Dashboard

To view Dask's dashboard from your local machine:

```bash
ssh -L 8787:localhost:8787 [server_name]
# Example:
# ssh -L 8787:localhost:8787 fdl-daniela
```

**[Remote Server]** Configure Worker Threads and Processes

```bash
nohup dask-worker tcp://127.0.0.1:8786 --nworkers 8 --nthreads 1 > worker1.log 2>&1 &
```

### GPU-Enabled Workers (CUDA Support)

```bash
# GPU 0
CUDA_VISIBLE_DEVICES=0 dask-cuda-worker tcp://127.0.0.1:8786 --nthreads 8 > worker1.log 2>&1 &

# GPU 1
CUDA_VISIBLE_DEVICES=1 dask-cuda-worker tcp://127.0.0.1:8786 > worker2.log 2>&1 &

# All workers in one command:
dask-cuda-worker tcp://127.0.0.1:8786 --device-memory-limit=16GB
```

You can launch multiple workers on the same or different machines.
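As an alternative to launching GPU workers from the shell, you can start a local GPU cluster programmatically. A minimal sketch, assuming the `dask-cuda` package is installed and at least one CUDA device is visible:

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Starts one worker per visible GPU on this machine.
cluster = LocalCUDACluster()
client = Client(cluster)
print(client)
```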

### ✅ Step 4: Connect Your Python Script to the Cluster

```python
from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")  # IP/hostname of your scheduler
print(client)
```

All Dask operations will now use the external distributed cluster.
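For example, once the client is connected, any lazy Dask computation runs on the cluster rather than in the local process. A minimal sketch using a synthetic array rather than project data:

```python
import dask.array as da

# A synthetic 10,000 x 10,000 array split into 1,000 x 1,000 chunks;
# the mean is computed across the cluster's workers.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
print(x.mean().compute())
```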

### ✅ Optional: Clean Up Background Processes

To stop all Dask processes:

```bash
pkill -f dask-scheduler
pkill -f dask-worker
```

🧠 Note: Whenever you switch branches, change configuration parameters, or restart your environment, restart the Dask processes as well.


## 📚 References and Acknowledgments

This work is part of Heliolab 2026 | FDL Decoding Solar Wind Challenge.

Base code inspired by:


## 🪪 License

This project is distributed under the MIT License. Please cite the associated paper if you use this code in your research.
