This repository contains the code and configuration used in the paper "CIPHER: Scalable Time Series Analysis for Physical Sciences with Application to Solar Wind Phenomena". The goal of this project is to apply indexable Symbolic Aggregate approXimation (iSAX) techniques to identify and cluster patterns in solar wind time series data.
- Environment Setup
- Project Structure
- Cache
- Data
- Creating Catalogs (First Step)
- Running Experiments
- Parallel Computing Setup
- Run Dask in a Distributed Cluster
- References and Acknowledgments
This project runs on Python 3.10+ and uses mamba (or conda) as the environment manager.
# Create the environment
mamba env create -f environment.yml
# Activate the environment
mamba activate solarwind-isax

It is recommended to use Visual Studio Code (VSCode) with the following extensions:
- Python
- Command Variable
- Remote Development (optional if working on remote servers)
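Once the environment is active, a quick import check can confirm the core dependencies resolved correctly. This is a minimal sketch, assuming environment.yml provides dask, pandas, and tqdm; adjust the package list to your actual dependencies:

# Sanity check for the activated environment.
import sys

print(f"Python {sys.version.split()[0]}")  # expect 3.10 or newer

for pkg in ("dask", "pandas", "tqdm"):     # assumed dependencies; edit as needed
    try:
        mod = __import__(pkg)
        print(f"{pkg} {getattr(mod, '__version__', 'unknown')}")
    except ImportError:
        print(f"{pkg} missing -- check environment.yml")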
Ensure your project directory looks like this:
.
├── .vscode/
│   └── launch.json
├── cache/
├── data/
│   └── catalog/
├── sw-data/
│   └── nasaomnireader/
├── environment.yml
└── clustering/
You can create these folders manually or using:
mkdir -p .vscode data/catalog sw-data cache

Add a cache folder in the root directory named cache.
This is where the temporary files generated by iSAX experiments will be stored.
You can change the path of the cache folder in .vscode/launch.json under the argument -cache_folder.
This project works with both PSP (Parker Solar Probe) and OMNI data.
To set up your data, create a new folder inside sw-data/ for each dataset and place the corresponding files there.
Example:
sw-data/
├── nasaomnireader/
└── psp/
💡 This process only needs to be done once, before running any experiments.
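Before generating catalogs, a small check like the one below can confirm the data landed where the project expects it. This is only a convenience sketch based on the folder layout above:

# List the expected data folders and report how many files each contains.
from pathlib import Path

for folder in ("sw-data/nasaomnireader", "sw-data/psp"):
    path = Path(folder)
    files = list(path.iterdir()) if path.is_dir() else []
    status = f"{len(files)} file(s)" if files else "missing or empty"
    print(f"{folder}: {status}")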
- Download the input data (in-situ solar wind data). You can use the NASA OMNI dataset or equivalent data compatible with the nasaomnireader.
- Place the data inside:
  sw-data/nasaomnireader/
- In VSCode:
  - Open the project.
  - Activate your virtual environment.
  - Install local dependencies:
    pip install -e .
- Configure .vscode/launch.json with your local data path. Example configuration to generate a catalog:
{
"name": "iSAX generate catalog",
"type": "python",
"request": "launch",
"console": "integratedTerminal",
"module": "${command:extension.commandvariable.file.relativeFileDotsNoExtension}",
"cwd": "${workspaceFolder}",
"justMyCode": false,
"args": [
"-start_year", "1994",
"-stop_year", "2023",
"-instrument", "omni",
"-data_path", "/absolute/path/to/your/nasaomnireader",
"-histogram"
]
}

- In the VSCode Debug panel, select "iSAX generate catalog" and run it.
Two CSV files will be generated inside data/catalog/.
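To sanity-check the result, you can load the generated CSVs with pandas. The glob pattern below is an assumption; the exact file names depend on the instrument and year range you configured:

# Inspect the catalog files produced in data/catalog/.
from pathlib import Path
import pandas as pd

for csv_path in sorted(Path("data/catalog").glob("*.csv")):
    df = pd.read_csv(csv_path)
    print(csv_path.name, df.shape)   # file name and (rows, columns)
    print(df.head())                 # first few catalog entries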
- Ensure the generated catalog (.csv) is referenced correctly inside your experiment script (e.g., run_isax_experiments_sf_cluster.py).
- Update the cache folder path in your .vscode/launch.json configuration:
  "-cache_folder", "/path/to/cache/isax_cache_experiment/"
- Run the experiment:
  python run_isax_experiments_sf_cluster.py
You'll see a progress bar (tqdm) in the terminal.
Depending on parameters and time range, the process may take several hours.
Warnings during execution are expected and do not indicate errors.
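If the warning output makes the progress bar hard to read, you can optionally silence non-critical warnings near the top of the experiment script. This is purely cosmetic and an assumption about your preference, not a requirement of the code:

# Optional: suppress non-critical warnings during long experiment runs.
import warnings

warnings.filterwarnings("ignore")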
You can execute your experiments in parallel using Dask.
# Install the package locally
pip install -e .
# Start a scheduler
nohup dask-scheduler > scheduler.log 2>&1 &
# Start a worker
nohup dask-worker tcp://127.0.0.1:8786 --nworkers 8 --nthreads 1 > worker.log 2>&1 &

To access the Dask dashboard locally:
ssh -L 8787:localhost:8787 [your_server_name]

Then open http://localhost:8787/graph in your browser.
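Once the scheduler and workers are running, a short check from Python confirms they registered correctly. This is a minimal sketch, assuming the scheduler address used above:

# Confirm the local Dask scheduler and its workers are reachable.
from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")               # scheduler started above
n_workers = len(client.scheduler_info()["workers"])
print(f"Connected to scheduler with {n_workers} worker(s)")
print(client.dashboard_link)                          # dashboard URL (port 8787)
client.close()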
You can further improve performance by running your code in a distributed Dask cluster. This allows multiple machines or GPUs to share the workload efficiently.
pip install -e .
pip install "dask[distributed]"
nohup dask-scheduler > scheduler.log 2>&1 &

This runs the scheduler in the background on port 8786, with a web dashboard available at port 8787.

nohup dask-worker tcp://127.0.0.1:8786 > worker1.log 2>&1 &

To view Dask's dashboard from your local machine:
ssh -L 8787:localhost:8787 [server_name]
# Example:
# ssh -L 8787:localhost:8787 fdl-daniela

nohup dask-worker tcp://127.0.0.1:8786 --nworkers 8 --nthreads 1 > worker1.log 2>&1 &

# GPU 0
CUDA_VISIBLE_DEVICES=0 dask-cuda-worker tcp://127.0.0.1:8786 --nthreads 8 > worker1.log 2>&1 &
# GPU 1
CUDA_VISIBLE_DEVICES=1 dask-cuda-worker tcp://127.0.0.1:8786 > worker2.log 2>&1 &
# All workers in one command:
dask-cuda-worker tcp://127.0.0.1:8786 --device-memory-limit=16GB

You can launch multiple workers on the same or different machines.
from dask.distributed import Client
client = Client("tcp://127.0.0.1:8786") # IP/hostname of your scheduler
print(client)

All Dask operations will now use the external distributed cluster.
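As a quick smoke test before launching a full experiment, you can push a trivial computation through the cluster. The square function below is purely illustrative:

# Smoke test: run a trivial computation on the distributed cluster.
from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")

def square(x):
    return x * x

futures = client.map(square, range(8))   # schedule a few small tasks
print(client.gather(futures))            # -> [0, 1, 4, 9, 16, 25, 36, 49]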
To stop all Dask processes:
pkill -f dask-scheduler
pkill -f dask-worker

🔧 Note: Whenever you switch branches, change configuration parameters, or restart your environment, restart the Dask processes as well.
This work is part of Heliolab 2026 | FDL Decoding Solar Wind Challenge.
Base code inspired by:
This project is distributed under the MIT License. Please cite the associated paper if you use this code in your research.