High-Performance Entropy Feature Extraction for Pain Assessment
A production-ready Rust implementation delivering a 201× speedup over Python, with long-format output aligned with the Jupyter notebook analysis pipeline. Maintains 100% numerical agreement with the Python/ordpy implementation.
This is a direct algorithmic translation of the Python implementation to Rust, preserving all mathematical operations while leveraging Rust's zero-cost abstractions and fearless concurrency:
- 201× faster: Processes signals at 4,000+ rows/second (7.3s vs. 24.5 minutes for train/Bvp)
- Long-format output: 15 rows per signal, 16 columns per row (notebook-aligned)
- 100% validated: Numerical agreement with Python/ordpy within CSV precision (<1e-6)
- 16× less memory: 180 MB vs. 3.2 GB peak usage
- Multi-core parallelization: Automatic via Rayon (all available cores utilized)
- Granular file organization: Separate CSV per dataset × signal_type combination
- Bug-fixed: Corrected Renyi/Tsallis ordering (validated against Python)
Hardware: MacBook Pro M1, 8 cores, 16 GB RAM
| Metric | Python | Rust | Speedup |
|---|---|---|---|
| Total runtime (train/Bvp) | 1,467s (24.5 min) | 7.33s | 201× |
| Rows/second | 20.1 | 4,027 | 200× |
| Peak memory | 2-3 GB | 180 MB | 16× reduction |
| Numerical accuracy | Reference | 100% match | N/A |
Test dataset: 29,520 rows (1,968 signals × 15 rows each), 16 columns per row, of which 8 are entropy features
Prerequisites: Install Rust toolchain (rustup)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
Build optimized binary:
cd ai4pain-rust
cargo build --release
# Binary created at: target/release/ai4pain
Verify installation:
./target/release/ai4pain --version
# ai4pain 2.0.0
The data directory uses the same structure as the Python version:
data/
├── train/
│ ├── Bvp/*.csv
│ ├── Eda/*.csv
│ ├── Resp/*.csv
│ └── SpO2/*.csv
├── validation/
│ └── [same structure]
└── test/
└── [same structure]
CSV format: Each column = one participant trial, rows = time samples
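As a rough illustration of the loading step, here is a minimal sketch of reading one wide-format CSV into per-trial signal vectors with the `csv` crate (from Cargo.toml below); this is a simplified assumption, not the repository's data_loader.rs:

```rust
use std::error::Error;

// Sketch only: each header names one participant trial, each row one time sample.
fn load_columns(path: &str) -> Result<Vec<(String, Vec<f64>)>, Box<dyn Error>> {
    let mut rdr = csv::Reader::from_path(path)?;
    let headers: Vec<String> = rdr.headers()?.iter().map(str::to_string).collect();
    let mut cols: Vec<Vec<f64>> = vec![Vec::new(); headers.len()];
    for record in rdr.records() {
        for (j, field) in record?.iter().enumerate() {
            // Empty or non-numeric cells become NaN, later checked
            // against --nan-threshold.
            cols[j].push(field.parse().unwrap_or(f64::NAN));
        }
    }
    Ok(headers.into_iter().zip(cols).collect())
}
```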
Note: The data/test/, data/train/, and data/validation/ directories are preserved as placeholders in the repository via .gitkeep files. All data files within these directories are gitignored because the dataset is private.
The following folders are created automatically during execution and are gitignored:
results/ - Contains output CSV files (gitignored, structure preserved via .gitkeep)
# Process all datasets, all signal types (default)
./target/release/ai4pain extract
# Process specific dataset(s) - space-separated
./target/release/ai4pain extract --dataset train
./target/release/ai4pain extract --dataset train validation
# Process specific signal type(s) - space-separated
./target/release/ai4pain extract --signal-type bvp eda
./target/release/ai4pain extract --dataset train --signal-type bvp
# Custom dimensions and time delays
./target/release/ai4pain extract --dimensions 4,5,6 --taus 1,2
# Adjust NaN threshold (default: 85%)
./target/release/ai4pain extract --nan-threshold 90.0
# Control parallelism (default: all cores)
./target/release/ai4pain -j 4 extract --dataset train
# Verbose logging
./target/release/ai4pain -vv extract --dataset train
Format: Long-format CSV (15 rows per signal, one per dimension/tau combination; 16 columns per row)
File Pattern: results/results_{dataset}_{signal_type}.csv
- Example:
results_train_bvp.csv, results_validation_eda.csv
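For illustration, the file pattern amounts to a single `format!` call (a hypothetical helper, not the repository's code):

```rust
// e.g. ("train", "bvp") -> "results/results_train_bvp.csv"
fn output_path(dataset: &str, signal_type: &str) -> String {
    format!("results/results_{}_{}.csv", dataset, signal_type)
}
```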
Columns (16 total):
file_name, signal, signallength, pe, comp, fisher_shannon, fisher_info,
renyipe, renyicomp, tsallispe, tsalliscomp, dimension, tau,
state, binaryclass, nan_percentage
Example rows:
file_name,signal,signallength,pe,comp,fisher_shannon,fisher_info,renyipe,renyicomp,tsallispe,tsalliscomp,dimension,tau,state,binaryclass,nan_percentage
data/train/Bvp/15.csv,15_HIGH_10,1022,0.978141,0.020520,0.978141,0.013560,0.020520,0.978141,0.020520,0.978141,3,1,high,2,82.73
data/train/Bvp/15.csv,15_HIGH_10,1022,0.933456,0.060372,0.933456,0.042334,0.060372,0.933456,0.060372,0.933456,3,2,high,2,82.73
data/train/Bvp/15.csv,15_HIGH_10,1022,0.917066,0.072145,0.917066,0.052227,0.072145,0.917066,0.072145,0.917066,3,3,high,2,82.73
...
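A minimal sketch of emitting this 16-column header with the `csv` crate (hypothetical file name and helper; the repository's actual writer may differ):

```rust
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Write only the header row shown above; data rows follow the same order.
    let mut w = csv::Writer::from_path("results/results_train_bvp.csv")?;
    w.write_record([
        "file_name", "signal", "signallength", "pe", "comp", "fisher_shannon",
        "fisher_info", "renyipe", "renyicomp", "tsallispe", "tsalliscomp",
        "dimension", "tau", "state", "binaryclass", "nan_percentage",
    ])?;
    w.flush()?;
    Ok(())
}
```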
src/
├── main.rs # CLI entry point (clap), orchestration
├── entropy.rs # 5 entropy implementations (custom, no external lib)
├── signal_processing.rs # Z-score normalization, NaN handling
├── data_loader.rs # Parallel CSV loading
├── feature_extractor.rs # Rayon-based batch processing
└── types.rs # Data structures (SignalType, Dataset, etc.)
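To make the signal_processing step concrete, here is a minimal sketch of the NaN handling and z-score normalization it performs (assumed free functions; the module's actual API may differ):

```rust
// Drop NaNs before ordinal-pattern extraction (sketch, not the repo's code).
fn drop_nans(samples: &[f64]) -> Vec<f64> {
    samples.iter().copied().filter(|v| !v.is_nan()).collect()
}

// Z-score normalization; assumes a non-constant signal (sd > 0).
fn zscore(samples: &[f64]) -> Vec<f64> {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let sd = var.sqrt();
    samples.iter().map(|v| (v - mean) / sd).collect()
}
```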
Key differences from Python:
- No ordpy dependency: Custom entropy implementation
- Parallel processing: Rayon parallel iterators with automatic multi-core scaling (see the sketch after this list)
- Static typing: Compile-time guarantees, zero runtime overhead
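A minimal sketch of the Rayon batch step under assumed types (`Signal`, `FeatureRow`, and `extract_features` are illustrative names, not the repository's):

```rust
use rayon::prelude::*;

// Hypothetical minimal types; the real ones live in types.rs.
struct Signal { name: String, samples: Vec<f64> }
struct FeatureRow { signal: String, dimension: usize, tau: usize, pe: f64 }

// Entropy computation elided here; see the sketch in the next section.
fn permutation_entropy(_x: &[f64], _d: usize, _tau: usize) -> f64 { 0.0 }

fn extract_features(s: &Signal) -> Vec<FeatureRow> {
    let mut rows = Vec::with_capacity(15);
    for d in 3..=7 {            // default --dimensions 3,4,5,6,7
        for tau in 1..=3 {      // default --taus 1,2,3
            rows.push(FeatureRow {
                signal: s.name.clone(),
                dimension: d,
                tau,
                pe: permutation_entropy(&s.samples, d, tau),
            });
        }
    }
    rows                        // 5 dimensions x 3 taus = 15 rows per signal
}

fn batch(signals: &[Signal]) -> Vec<FeatureRow> {
    signals
        .par_iter()                       // one task per signal, all cores
        .flat_map_iter(extract_features)  // flatten 15 rows per signal
        .collect()
}
```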
All five entropy measures use the same algorithms as Python/ordpy (a minimal sketch of the ordinal-pattern step follows this list):
- Permutation Entropy: Ordinal pattern extraction, Shannon entropy calculation
- Statistical Complexity: Jensen-Shannon divergence from uniform distribution
- Fisher Information: Gradient-based sensitivity (full distribution with missing patterns)
- Renyi Entropy: Generalized entropy, evaluated in the q→1 (Shannon) limit
- Tsallis Entropy: Non-extensive entropy, evaluated in the q→1 (Shannon) limit
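A minimal sketch of normalized permutation entropy as described above, assuming a NaN-free input (illustrative only; entropy.rs may differ in detail):

```rust
use std::collections::HashMap;

// Normalized permutation entropy (sketch): ordinal patterns of length d at
// delay tau, Shannon entropy of their distribution, normalized by ln(d!).
fn permutation_entropy(x: &[f64], d: usize, tau: usize) -> f64 {
    let n = x.len();
    if n < (d - 1) * tau + 1 {
        return f64::NAN; // signal too short to embed
    }
    let mut counts: HashMap<Vec<usize>, usize> = HashMap::new();
    for i in 0..=(n - 1 - (d - 1) * tau) {
        // Ordinal pattern = argsort of the d embedded samples.
        let window: Vec<f64> = (0..d).map(|k| x[i + k * tau]).collect();
        let mut pattern: Vec<usize> = (0..d).collect();
        pattern.sort_by(|&a, &b| window[a].partial_cmp(&window[b]).unwrap());
        *counts.entry(pattern).or_insert(0) += 1;
    }
    let total = (n - (d - 1) * tau) as f64;
    let h: f64 = counts
        .values()
        .map(|&c| {
            let p = c as f64 / total;
            -p * p.ln()
        })
        .sum();
    // ln(d!) as a sum of logs, valid for d >= 2.
    let log_d_factorial: f64 = (2..=d).map(|k| (k as f64).ln()).sum();
    h / log_d_factorial
}
```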
Validation: 100% numerical agreement with Python across all parameters (d=3-7, τ=1-3)
See the entropy implementation (src/entropy.rs) for more details.
Comparison against Python:
# Generate features with both implementations
cd ../AI4Pain-Feature-Extraction-V2
python run_python_extraction.py # → results/python_features_train.csv
cd ../ai4pain-rust
./target/release/ai4pain extract --dataset train # → results/rust_features_train.csv
# Compare outputs (should be identical)
diff <(sort ../AI4Pain-Feature-Extraction-V2/results/python_features_train.csv) \
     <(sort results/rust_features_train.csv)
Expected result: No differences (all 120 features per signal match within floating-point precision)
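Since CSV output rounds values, numeric comparisons should allow a small tolerance; here is a minimal sketch (it is an assumption, not confirmed by the source, that the `validate` subcommand works this way):

```rust
// Tolerant float comparison; NaN == NaN so skipped signals also match.
fn approx_eq(a: f64, b: f64, tol: f64) -> bool {
    (a - b).abs() <= tol || (a.is_nan() && b.is_nan())
}

fn main() {
    // Values within the documented CSV precision (<1e-6) compare equal.
    assert!(approx_eq(0.978141, 0.9781412, 1e-6));
}
```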
Command-line arguments (see ./target/release/ai4pain --help):
- -v, -vv, -vvv: Verbosity level (warn/info/debug/trace)
- -j, --workers <N>: Number of parallel workers (default: all CPUs)
- -o, --output <DIR>: Output directory (default: results)
- -d, --dataset <NAMES>: Space-separated datasets (train validation test); default: all three
- -s, --signal-type <TYPES>: Space-separated signal types (bvp eda resp spo2); default: all four
- --dimensions <LIST>: Comma-separated embedding dimensions (default: 3,4,5,6,7)
- --taus <LIST>: Comma-separated time delays (default: 1,2,3)
- --nan-threshold <PCT>: Skip signals with >PCT% NaN (default: 85.0)
Full example:
./target/release/ai4pain -vv -j 8 extract \
--dataset train validation \
--signal-type bvp eda \
--dimensions 3,4,5 \
--taus 1,2 \
  --nan-threshold 90.0
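For orientation, the CLI shape above maps naturally onto clap's derive API; here is a minimal sketch with assumed field names (the `extract` subcommand and remaining flags are omitted for brevity; this is not the repository's main.rs):

```rust
use clap::Parser;

#[derive(Parser)]
#[command(name = "ai4pain", version)]
struct Cli {
    /// Verbosity: -v, -vv, -vvv
    #[arg(short = 'v', action = clap::ArgAction::Count)]
    verbose: u8,

    /// Number of parallel workers (default: all CPUs)
    #[arg(short = 'j', long = "workers")]
    workers: Option<usize>,

    /// Space-separated datasets, e.g. --dataset train validation
    #[arg(short = 'd', long = "dataset", num_args = 1..)]
    dataset: Vec<String>,

    /// Comma-separated embedding dimensions
    #[arg(long, value_delimiter = ',', default_value = "3,4,5,6,7")]
    dimensions: Vec<usize>,
}

fn main() {
    let cli = Cli::parse();
    println!("workers={:?} dimensions={:?}", cli.workers, cli.dimensions);
}
```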
Cargo.toml:
[dependencies]
clap = { version = "4.0", features = ["derive"] } # CLI parsing
ndarray = "0.15" # NumPy equivalent
rayon = "1.7" # Parallelization
csv = "1.2" # CSV I/O
anyhow = "1.0" # Error handling
log = "0.4" # Logging
env_logger = "0.10" # Log configuration
indicatif = "0.17" # Progress barsIssue: Compilation fails with "could not compile ndarray"
- Solution: Update Rust:
rustup update
Issue: Linking errors on macOS
- Solution: Install Xcode tools:
xcode-select --install
Issue: Slow performance (not 200× faster)
- Check: Did you build with the --release flag? Debug builds are 30-100× slower
- Verify: you are running target/release/ai4pain, not target/debug/ai4pain
Issue: Different results from Python
- Check: Same dimension/tau parameters?
- Validate: Run ./target/release/ai4pain validate --file results/rust_features_train.csv
- Compare: Use diff as shown in the Validation section above
Issue: "Too many open files" error
- Solution: Increase limit:
ulimit -n 4096
Build for different platforms:
# macOS (Apple Silicon)
cargo build --release --target aarch64-apple-darwin
# macOS (Intel)
cargo build --release --target x86_64-apple-darwin
# Linux
cargo build --release --target x86_64-unknown-linux-gnu
# Windows
cargo build --release --target x86_64-pc-windows-msvc
Install target (if not present):
rustup target add x86_64-unknown-linux-gnu
Project layout:
ai4pain-rust/
├── Cargo.toml # Dependencies and build configuration
├── Cargo.lock # Locked dependency versions
├── src/ # Source code (see Architecture above)
├── target/ # Cargo build artifacts (gitignored)
│ ├── debug/ # Debug builds
│ └── release/ # Optimized builds
│ └── ai4pain # Final binary
├── data/ # Input directory (gitignored - private)
│ ├── test/.gitkeep # Placeholder for test data
│ ├── train/.gitkeep # Placeholder for train data
│ └── validation/.gitkeep # Placeholder for validation data
├── results/ # Output CSVs (gitignored)
│ └── .gitkeep # Placeholder to preserve directory
├── .gitignore # Excludes data/, results/, target/
└── README.md # This file
@software{ai4pain_rust,
author = {Kamarthi, Vignan},
title = {AI4Pain Rust Implementation: High-Performance Entropy Feature Extraction},
year = {2025},
institution = {Northeastern University},
note = {200× speedup over Python, 100\% numerical validation}
}
Python version (reference implementation): Python Implementation