AI4Pain Rust Implementation

High-Performance Entropy Feature Extraction for Pain Assessment

Production-ready Rust implementation delivering 201× speedup over Python with long-format output aligned with Jupyter notebook analysis pipeline. Maintains 100% numerical validation with Python/ordpy implementation.

Overview

This is a direct algorithmic translation of the Python implementation to Rust, preserving all mathematical operations while leveraging Rust's zero-cost abstractions and fearless concurrency:

201× faster: Processes signals at 4,000+ rows/second (7.3s vs. 24.5 minutes for train/Bvp)
Long-format output: 15 rows per signal, 16 columns per row (notebook-aligned)
100% validated: Numerical agreement with Python/ordpy within CSV precision (<1e-6)
16× less memory: 180 MB vs. 3.2 GB peak usage
Multi-core parallelization: Automatic via Rayon (12-core CPU utilized)
Granular file organization: Separate CSV per dataset × signal_type combination
Bug-fixed: Corrected Renyi/Tsallis ordering (validated against Python)

Performance

Hardware: MacBook Pro M1, 8 cores, 16 GB RAM

Metric	Python	Rust	Speedup
Total runtime (train/Bvp)	1,467s (24.5 min)	7.33s	201×
Rows/second	20.1	4,027	200×
Peak memory	2-3 GB	180 MB	16× reduction
Numerical accuracy	Reference	100% match	N/A

Test dataset: 29,520 rows (1,968 signals × 15 rows each), 16 columns per row, 8 entropy measures per row

Quick Start

Installation

Prerequisites: Install Rust toolchain (rustup)

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

Build optimized binary:

cd ai4pain-rust
cargo build --release

# Binary created at: target/release/ai4pain

Verify installation:

./target/release/ai4pain --version
# ai4pain 2.0.0

Data Organization

Same structure as Python version:

data/
├── train/
│   ├── Bvp/*.csv
│   ├── Eda/*.csv
│   ├── Resp/*.csv
│   └── SpO2/*.csv
├── validation/
│   └── [same structure]
└── test/
    └── [same structure]

CSV format: Each column = one participant trial, rows = time samples

Note: The data/test/, data/train/, and data/validation/ directories are preserved as placeholders in the repository via .gitkeep files. All data files within these directories are gitignored for data integrity.

Auto-Generated Folders

The following folders are created automatically during execution and are gitignored:

results/ - Contains output CSV files (gitignored, structure preserved via .gitkeep)

Basic Usage

# Process all datasets, all signal types (default)
./target/release/ai4pain extract

# Process specific dataset(s) - space-separated
./target/release/ai4pain extract --dataset train
./target/release/ai4pain extract --dataset train validation

# Process specific signal type(s) - space-separated
./target/release/ai4pain extract --signal-type bvp eda
./target/release/ai4pain extract --dataset train --signal-type bvp

# Custom dimensions and time delays
./target/release/ai4pain extract --dimensions 4,5,6 --taus 1,2

# Adjust NaN threshold (default: 85%)
./target/release/ai4pain extract --nan-threshold 90.0

# Control parallelism (default: all cores)
./target/release/ai4pain -j 4 extract --dataset train

# Verbose logging
./target/release/ai4pain -vv extract --dataset train

Output

Format: Long-format CSV (15 rows per signal, 16 columns per row)

File Pattern: results/results_{dataset}_{signal_type}.csv

Example: results_train_bvp.csv, results_validation_eda.csv

Columns (16 total):

file_name, signal, signallength, pe, comp, fisher_shannon, fisher_info,
renyipe, renyicomp, tsallispe, tsalliscomp, dimension, tau,
state, binaryclass, nan_percentage

Example rows:

file_name,signal,signallength,pe,comp,fisher_shannon,fisher_info,renyipe,renyicomp,tsallispe,tsalliscomp,dimension,tau,state,binaryclass,nan_percentage
data/train/Bvp/15.csv,15_HIGH_10,1022,0.978141,0.020520,0.978141,0.013560,0.020520,0.978141,0.020520,0.978141,3,1,high,2,82.73
data/train/Bvp/15.csv,15_HIGH_10,1022,0.933456,0.060372,0.933456,0.042334,0.060372,0.933456,0.060372,0.933456,3,2,high,2,82.73
data/train/Bvp/15.csv,15_HIGH_10,1022,0.917066,0.072145,0.917066,0.052227,0.072145,0.917066,0.072145,0.917066,3,3,high,2,82.73
...

Architecture

src/
├── main.rs              # CLI entry point (clap), orchestration
├── entropy.rs           # 5 entropy implementations (custom, no external lib)
├── signal_processing.rs # Z-score normalization, NaN handling
├── data_loader.rs       # Parallel CSV loading
├── feature_extractor.rs # Rayon-based batch processing
└── types.rs             # Data structures (SignalType, Dataset, etc.)

Key differences from Python:

No ordpy dependency: Custom entropy implementation
Parallel processing: Rayon parallel iterators (automatic multi-core)
Static typing: Compile-time guarantees, zero runtime overhead

Entropy Implementation

All five entropy measures use identical algorithms to Python/ordpy:

Permutation Entropy: Ordinal pattern extraction, Shannon entropy calculation
Statistical Complexity: Jensen-Shannon divergence from uniform distribution
Fisher Information: Gradient-based sensitivity (full distribution with missing patterns)
Renyi Entropy: Generalized entropy (q=1, Shannon limit)
Tsallis Entropy: Non-extensive entropy (q=1, Shannon limit)

Validation: 100% numerical agreement with Python across all parameters (d=3-7, τ=1-3)

Validation

Please this implementation for more details.

Comparison against Python:

# Generate features with both implementations
cd ../AI4Pain-Feature-Extraction-V2
python run_python_extraction.py  # → results/python_features_train.csv

cd ../ai4pain-rust
./target/release/ai4pain extract --dataset train  # → results/rust_features_train.csv

# Compare outputs (should be identical)
diff <(sort ../AI4Pain-Feature-Extraction-V2/results/python_features_train.csv) \
     <(sort results/rust_features_train.csv)

Expected result: No differences (all 120 features match within floating-point precision)

Configuration

Command-line arguments (see ./target/release/ai4pain --help):

Global Options

-v, -vv, -vvv: Verbosity level (warn/info/debug/trace)
-j, --workers <N>: Number of parallel workers (default: all CPUs)
-o, --output <DIR>: Output directory (default: results)

Extract Command

-d, --dataset <NAMES>: Space-separated datasets (train validation test), default: all three
-s, --signal-type <TYPES>: Space-separated signal types (bvp eda resp spo2), default: all four
--dimensions <LIST>: Comma-separated embedding dimensions (default: 3,4,5,6,7)
--taus <LIST>: Comma-separated time delays (default: 1,2,3)
--nan-threshold <PCT>: Skip signals with >PCT% NaN (default: 85.0)

Example

./target/release/ai4pain -vv -j 8 extract \
    --dataset train validation \
    --signal-type bvp eda \
    --dimensions 3,4,5 \
    --taus 1,2 \
    --nan-threshold 90.0

Dependencies

Cargo.toml:

[dependencies]
clap = { version = "4.0", features = ["derive"] }  # CLI parsing
ndarray = "0.15"                                   # NumPy equivalent
rayon = "1.7"                                      # Parallelization
csv = "1.2"                                        # CSV I/O
anyhow = "1.0"                                     # Error handling
log = "0.4"                                        # Logging
env_logger = "0.10"                                # Log configuration
indicatif = "0.17"                                 # Progress bars

Troubleshooting

Issue: Compilation fails with "could not compile ndarray"

Solution: Update Rust: rustup update

Issue: Linking errors on macOS

Solution: Install Xcode tools: xcode-select --install

Issue: Slow performance (not 200× faster)

Check: Did you use --release flag? Debug builds are 30-100× slower
Verify: file target/release/ai4pain should show "not stripped" (optimized)

Issue: Different results from Python

Check: Same dimension/tau parameters?
Validate: Run ./target/release/ai4pain validate --file results/rust_features_train.csv
Compare: Use diff as shown in Validation section above

Issue: "Too many open files" error

Solution: Increase limit: ulimit -n 4096

Cross-Compilation

Build for different platforms:

# macOS (Apple Silicon)
cargo build --release --target aarch64-apple-darwin

# macOS (Intel)
cargo build --release --target x86_64-apple-darwin

# Linux
cargo build --release --target x86_64-unknown-linux-gnu

# Windows
cargo build --release --target x86_64-pc-windows-msvc

Install target (if not present):

rustup target add x86_64-unknown-linux-gnu

Project Structure

ai4pain-rust/
├── Cargo.toml                  # Dependencies and build configuration
├── Cargo.lock                  # Locked dependency versions
├── src/                        # Source code (see Architecture above)
├── target/                     # Cargo build artifacts (gitignored)
│   ├── debug/                  # Debug builds
│   └── release/                # Optimized builds
│       └── ai4pain             # Final binary
├── data/                       # Input directory (gitignored - private)
│   ├── test/.gitkeep           # Placeholder for test data
│   ├── train/.gitkeep          # Placeholder for train data
│   └── validation/.gitkeep     # Placeholder for validation data
├── results/                    # Output CSVs (gitignored)
│   └── .gitkeep                # Placeholder to preserve directory
├── .gitignore                  # Excludes data/, results/, target/
└── README.md                   # This file

Citation

@software{ai4pain_rust,
  author = {Kamarthi, Vignan},
  title = {AI4Pain Rust Implementation: High-Performance Entropy Feature Extraction},
  year = {2025},
  institution = {Northeastern University},
  note = {200× speedup over Python, 100\% numerical validation}
}

Related Implementations

Python version (reference implementation): Python Implementation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AI4Pain Rust Implementation

Overview

Performance

Quick Start

Installation

Data Organization

Auto-Generated Folders

Basic Usage

Output

Architecture

Entropy Implementation

Validation

Configuration

Global Options

Extract Command

Example

Dependencies

Troubleshooting

Cross-Compilation

Project Structure

Citation

Related Implementations

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
results		results
src		src
.gitignore		.gitignore
Cargo.toml		Cargo.toml
README.md		README.md

vignankamarthi/Feature-Extraction-Rust

Folders and files

Latest commit

History

Repository files navigation

AI4Pain Rust Implementation

Overview

Performance

Quick Start

Installation

Data Organization

Auto-Generated Folders

Basic Usage

Output

Architecture

Entropy Implementation

Validation

Configuration

Global Options

Extract Command

Example

Dependencies

Troubleshooting

Cross-Compilation

Project Structure

Citation

Related Implementations

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages