Skip to content

vignankamarthi/Feature-Extraction-Rust

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI4Pain Rust Implementation

High-Performance Entropy Feature Extraction for Pain Assessment

Production-ready Rust implementation delivering 201× speedup over Python with long-format output aligned with Jupyter notebook analysis pipeline. Maintains 100% numerical validation with Python/ordpy implementation.


Overview

This is a direct algorithmic translation of the Python implementation to Rust, preserving all mathematical operations while leveraging Rust's zero-cost abstractions and fearless concurrency:

  • 201× faster: Processes signals at 4,000+ rows/second (7.3s vs. 24.5 minutes for train/Bvp)
  • Long-format output: 15 rows per signal, 16 columns per row (notebook-aligned)
  • 100% validated: Numerical agreement with Python/ordpy within CSV precision (<1e-6)
  • 16× less memory: 180 MB vs. 3.2 GB peak usage
  • Multi-core parallelization: Automatic via Rayon (12-core CPU utilized)
  • Granular file organization: Separate CSV per dataset × signal_type combination
  • Bug-fixed: Corrected Renyi/Tsallis ordering (validated against Python)

Performance

Hardware: MacBook Pro M1, 8 cores, 16 GB RAM

Metric Python Rust Speedup
Total runtime (train/Bvp) 1,467s (24.5 min) 7.33s 201×
Rows/second 20.1 4,027 200×
Peak memory 2-3 GB 180 MB 16× reduction
Numerical accuracy Reference 100% match N/A

Test dataset: 29,520 rows (1,968 signals × 15 rows each), 16 columns per row, 8 entropy measures per row


Quick Start

Installation

Prerequisites: Install Rust toolchain (rustup)

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

Build optimized binary:

cd ai4pain-rust
cargo build --release

# Binary created at: target/release/ai4pain

Verify installation:

./target/release/ai4pain --version
# ai4pain 2.0.0

Data Organization

Same structure as Python version:

data/
├── train/
│   ├── Bvp/*.csv
│   ├── Eda/*.csv
│   ├── Resp/*.csv
│   └── SpO2/*.csv
├── validation/
│   └── [same structure]
└── test/
    └── [same structure]

CSV format: Each column = one participant trial, rows = time samples

Note: The data/test/, data/train/, and data/validation/ directories are preserved as placeholders in the repository via .gitkeep files. All data files within these directories are gitignored for data integrity.

Auto-Generated Folders

The following folders are created automatically during execution and are gitignored:

results/ - Contains output CSV files (gitignored, structure preserved via .gitkeep)

Basic Usage

# Process all datasets, all signal types (default)
./target/release/ai4pain extract

# Process specific dataset(s) - space-separated
./target/release/ai4pain extract --dataset train
./target/release/ai4pain extract --dataset train validation

# Process specific signal type(s) - space-separated
./target/release/ai4pain extract --signal-type bvp eda
./target/release/ai4pain extract --dataset train --signal-type bvp

# Custom dimensions and time delays
./target/release/ai4pain extract --dimensions 4,5,6 --taus 1,2

# Adjust NaN threshold (default: 85%)
./target/release/ai4pain extract --nan-threshold 90.0

# Control parallelism (default: all cores)
./target/release/ai4pain -j 4 extract --dataset train

# Verbose logging
./target/release/ai4pain -vv extract --dataset train

Output

Format: Long-format CSV (15 rows per signal, 16 columns per row)

File Pattern: results/results_{dataset}_{signal_type}.csv

  • Example: results_train_bvp.csv, results_validation_eda.csv

Columns (16 total):

file_name, signal, signallength, pe, comp, fisher_shannon, fisher_info,
renyipe, renyicomp, tsallispe, tsalliscomp, dimension, tau,
state, binaryclass, nan_percentage

Example rows:

file_name,signal,signallength,pe,comp,fisher_shannon,fisher_info,renyipe,renyicomp,tsallispe,tsalliscomp,dimension,tau,state,binaryclass,nan_percentage
data/train/Bvp/15.csv,15_HIGH_10,1022,0.978141,0.020520,0.978141,0.013560,0.020520,0.978141,0.020520,0.978141,3,1,high,2,82.73
data/train/Bvp/15.csv,15_HIGH_10,1022,0.933456,0.060372,0.933456,0.042334,0.060372,0.933456,0.060372,0.933456,3,2,high,2,82.73
data/train/Bvp/15.csv,15_HIGH_10,1022,0.917066,0.072145,0.917066,0.052227,0.072145,0.917066,0.072145,0.917066,3,3,high,2,82.73
...

Architecture

src/
├── main.rs              # CLI entry point (clap), orchestration
├── entropy.rs           # 5 entropy implementations (custom, no external lib)
├── signal_processing.rs # Z-score normalization, NaN handling
├── data_loader.rs       # Parallel CSV loading
├── feature_extractor.rs # Rayon-based batch processing
└── types.rs             # Data structures (SignalType, Dataset, etc.)

Key differences from Python:

  • No ordpy dependency: Custom entropy implementation
  • Parallel processing: Rayon parallel iterators (automatic multi-core)
  • Static typing: Compile-time guarantees, zero runtime overhead

Entropy Implementation

All five entropy measures use identical algorithms to Python/ordpy:

  1. Permutation Entropy: Ordinal pattern extraction, Shannon entropy calculation
  2. Statistical Complexity: Jensen-Shannon divergence from uniform distribution
  3. Fisher Information: Gradient-based sensitivity (full distribution with missing patterns)
  4. Renyi Entropy: Generalized entropy (q=1, Shannon limit)
  5. Tsallis Entropy: Non-extensive entropy (q=1, Shannon limit)

Validation: 100% numerical agreement with Python across all parameters (d=3-7, τ=1-3)


Validation

Please this implementation for more details.

Comparison against Python:

# Generate features with both implementations
cd ../AI4Pain-Feature-Extraction-V2
python run_python_extraction.py  # → results/python_features_train.csv

cd ../ai4pain-rust
./target/release/ai4pain extract --dataset train  # → results/rust_features_train.csv

# Compare outputs (should be identical)
diff <(sort ../AI4Pain-Feature-Extraction-V2/results/python_features_train.csv) \
     <(sort results/rust_features_train.csv)

Expected result: No differences (all 120 features match within floating-point precision)

Configuration

Command-line arguments (see ./target/release/ai4pain --help):

Global Options

  • -v, -vv, -vvv: Verbosity level (warn/info/debug/trace)
  • -j, --workers <N>: Number of parallel workers (default: all CPUs)
  • -o, --output <DIR>: Output directory (default: results)

Extract Command

  • -d, --dataset <NAMES>: Space-separated datasets (train validation test), default: all three
  • -s, --signal-type <TYPES>: Space-separated signal types (bvp eda resp spo2), default: all four
  • --dimensions <LIST>: Comma-separated embedding dimensions (default: 3,4,5,6,7)
  • --taus <LIST>: Comma-separated time delays (default: 1,2,3)
  • --nan-threshold <PCT>: Skip signals with >PCT% NaN (default: 85.0)

Example

./target/release/ai4pain -vv -j 8 extract \
    --dataset train validation \
    --signal-type bvp eda \
    --dimensions 3,4,5 \
    --taus 1,2 \
    --nan-threshold 90.0

Dependencies

Cargo.toml:

[dependencies]
clap = { version = "4.0", features = ["derive"] }  # CLI parsing
ndarray = "0.15"                                   # NumPy equivalent
rayon = "1.7"                                      # Parallelization
csv = "1.2"                                        # CSV I/O
anyhow = "1.0"                                     # Error handling
log = "0.4"                                        # Logging
env_logger = "0.10"                                # Log configuration
indicatif = "0.17"                                 # Progress bars

Troubleshooting

Issue: Compilation fails with "could not compile ndarray"

  • Solution: Update Rust: rustup update

Issue: Linking errors on macOS

  • Solution: Install Xcode tools: xcode-select --install

Issue: Slow performance (not 200× faster)

  • Check: Did you use --release flag? Debug builds are 30-100× slower
  • Verify: file target/release/ai4pain should show "not stripped" (optimized)

Issue: Different results from Python

  • Check: Same dimension/tau parameters?
  • Validate: Run ./target/release/ai4pain validate --file results/rust_features_train.csv
  • Compare: Use diff as shown in Validation section above

Issue: "Too many open files" error

  • Solution: Increase limit: ulimit -n 4096

Cross-Compilation

Build for different platforms:

# macOS (Apple Silicon)
cargo build --release --target aarch64-apple-darwin

# macOS (Intel)
cargo build --release --target x86_64-apple-darwin

# Linux
cargo build --release --target x86_64-unknown-linux-gnu

# Windows
cargo build --release --target x86_64-pc-windows-msvc

Install target (if not present):

rustup target add x86_64-unknown-linux-gnu

Project Structure

ai4pain-rust/
├── Cargo.toml                  # Dependencies and build configuration
├── Cargo.lock                  # Locked dependency versions
├── src/                        # Source code (see Architecture above)
├── target/                     # Cargo build artifacts (gitignored)
│   ├── debug/                  # Debug builds
│   └── release/                # Optimized builds
│       └── ai4pain             # Final binary
├── data/                       # Input directory (gitignored - private)
│   ├── test/.gitkeep           # Placeholder for test data
│   ├── train/.gitkeep          # Placeholder for train data
│   └── validation/.gitkeep     # Placeholder for validation data
├── results/                    # Output CSVs (gitignored)
│   └── .gitkeep                # Placeholder to preserve directory
├── .gitignore                  # Excludes data/, results/, target/
└── README.md                   # This file

Citation

@software{ai4pain_rust,
  author = {Kamarthi, Vignan},
  title = {AI4Pain Rust Implementation: High-Performance Entropy Feature Extraction},
  year = {2025},
  institution = {Northeastern University},
  note = {200× speedup over Python, 100\% numerical validation}
}

Related Implementations

Python version (reference implementation): Python Implementation

About

201x (7 seconds vs. 24.5 minutes) rapid feature extraction for the AI4Pain 2025 challenge.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages