A state-of-the-art foundational model for battery State of Health (SOH) prediction using advanced time series decomposition and transformer-based methods.
On the NASA battery degradation datasets, the model achieves the following SOH prediction accuracy:
| Battery | Model | RMSE | MAE | R² | MAPE |
|---|---|---|---|---|---|
| B0005 | TabPFN | 0.0006 | 0.0004 | 0.9983 | 0.06% |
| B0006 | TabPFN | 0.0013 | 0.0010 | 0.9971 | 0.13% |
| B0007 | TabPFN | 0.0017 | 0.0013 | 0.9751 | 0.19% |
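
Traditional ML baselines (XGBoost, RandomForest, GradientBoosting) achieved:
- RMSE: 0.013-0.033 (roughly 10-50x higher than TabPFN)
- R²: -4.5 to 0.79 (often negative, i.e., worse than always predicting the mean SOH)
- MAPE: 1.5-4.2%

TabPFN consistently outperforms these baselines, with roughly 10-50x lower RMSE, far lower MAPE, and R² close to 1 on every battery.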
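
The four metrics in the results table can be computed as below. This is a minimal pure-Python sketch of the standard definitions, assuming MAPE is reported in percent and that no true SOH value is zero; the function name `soh_metrics` is hypothetical. The second usage case shows why a baseline can score a negative R²: its squared error exceeds that of always predicting the mean.

```python
import math


def soh_metrics(y_true, y_pred):
    """Return RMSE, MAE, R² and MAPE (percent) for two equal-length sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² compares the model with a constant mean predictor; it drops below
    # zero when the model's squared error is larger than that baseline's.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    # MAPE divides by the true value, so it assumes no SOH value is zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"rmse": rmse, "mae": mae, "r2": r2, "mape": mape}


if __name__ == "__main__":
    y_true = [1.00, 0.95, 0.90, 0.85]
    close = [0.99, 0.95, 0.91, 0.85]
    reversed_trend = [0.85, 0.90, 0.95, 1.00]  # worse than predicting the mean
    print(soh_metrics(y_true, close))
    print(soh_metrics(y_true, reversed_trend))
```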

CEEMDAN (Complete Ensemble Empirical Mode Decomposition with Adaptive Noise)
- Separates capacity signal into intrinsic mode functions (IMFs)
- Captures regeneration phenomena and noise
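
Once the capacity curve is decomposed, each IMF can be summarized by its energy. The sketch below assumes the IMFs and residue come from a CEEMDAN implementation such as the `CEEMDAN` class in the EMD-signal (PyEMD) package, which is not called here; the helper `imf_energy_features` and its feature names are hypothetical. It also checks the defining property of the decomposition: the IMFs plus the residue sum back to the original signal.

```python
def imf_energy_features(imfs, residue=None, signal=None, tol=1e-9):
    """Hypothetical helper: absolute and relative energy of each IMF."""
    energies = [sum(v * v for v in imf) for imf in imfs]
    total = sum(energies)
    features = {}
    for i, energy in enumerate(energies, start=1):
        features[f"imf{i}_energy"] = energy
        features[f"imf{i}_energy_ratio"] = energy / total if total > 0 else 0.0
    # Optional sanity check: IMFs + residue should reconstruct the signal.
    if residue is not None and signal is not None:
        for k, x in enumerate(signal):
            rebuilt = sum(imf[k] for imf in imfs) + residue[k]
            if abs(rebuilt - x) > tol:
                raise ValueError(f"reconstruction mismatch at index {k}")
    return features
```

Energy ratios are scale-free, so they stay comparable across batteries with different absolute capacities.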

Improved D3R (Dynamic Decomposition with Diffusion Reconstruction)
- Custom loss functions for smooth, monotonic trend extraction
- Spatial-temporal transformer architecture
- Specialized heads for trend, seasonal, and noise components
- Regularization: smoothness loss, monotonic loss, seasonal regularity
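
The smoothness and monotonic regularizers can be illustrated with simple penalty functions. This is one plausible formulation, not the project's actual loss code: smoothness is taken as the mean squared second difference of the extracted trend, and the monotonic loss penalizes any cycle-to-cycle increase, assuming the degradation trend should be non-increasing. The seasonal-regularity term is omitted because its form is not specified, and the weights `w_smooth` and `w_mono` are hypothetical. In the transformer these penalties would be written with the training framework's tensor operations and added to the reconstruction loss.

```python
def smoothness_loss(trend):
    """Mean squared second difference; zero for any straight line."""
    if len(trend) < 3:
        return 0.0
    second = [trend[i + 1] - 2 * trend[i] + trend[i - 1] for i in range(1, len(trend) - 1)]
    return sum(d * d for d in second) / len(second)


def monotonic_loss(trend):
    """Mean squared upward step; zero for a non-increasing (degrading) trend."""
    if len(trend) < 2:
        return 0.0
    ups = [max(0.0, trend[i] - trend[i - 1]) for i in range(1, len(trend))]
    return sum(u * u for u in ups) / len(ups)


def trend_regularizer(trend, w_smooth=1.0, w_mono=1.0):
    # Hypothetical weighting; in practice the weights would be tuned.
    return w_smooth * smoothness_loss(trend) + w_mono * monotonic_loss(trend)
```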

ARIMA-based features
- Lag features (1-10 lags)
- Multi-horizon forecasts (1-5 steps ahead)
- Autocorrelation and partial autocorrelation
- Residuals and fitted values
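
Lag features and the sample autocorrelation can be computed directly from the capacity series. A minimal pure-Python sketch; the function names are hypothetical. Fitting an actual ARIMA model for multi-step forecasts, PACF, and residuals would normally use a library such as statsmodels and is not shown here.

```python
def lag_features(series, max_lag=10):
    """Row t holds series[t-1] ... series[t-max_lag]; the first max_lag cycles are skipped."""
    rows = []
    for t in range(max_lag, len(series)):
        rows.append({f"lag_{k}": series[t - k] for k in range(1, max_lag + 1)})
    return rows


def autocorrelation(series, lag):
    """Sample ACF at one lag (biased estimator, normalized by the lag-0 sum of squares)."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be in [0, len(series))")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom
```

Skipping the first `max_lag` cycles avoids filling missing lags with made-up values, at the cost of a few early training rows.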

Rolling statistics
- Multiple window sizes (3, 5, 10 cycles)
- Mean, std, min, max, trend, skewness, kurtosis
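
Trailing-window statistics over the window sizes listed above can be sketched as follows. This assumes trailing (not centered) windows, sample standard deviation, and a least-squares slope as the "trend" statistic; skewness and kurtosis are left out for brevity, and the feature names are hypothetical.

```python
import statistics


def _slope(values):
    """Least-squares slope of values against 0..len-1 (change per cycle)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def rolling_features(series, windows=(3, 5, 10)):
    """Trailing-window statistics per cycle; None until a window is full."""
    out = []
    for t in range(len(series)):
        row = {}
        for w in windows:
            if t + 1 < w:
                for name in ("mean", "std", "min", "max", "trend"):
                    row[f"roll{w}_{name}"] = None
                continue
            win = series[t + 1 - w : t + 1]
            row[f"roll{w}_mean"] = statistics.mean(win)
            row[f"roll{w}_std"] = statistics.stdev(win)  # sample std (ddof=1)
            row[f"roll{w}_min"] = min(win)
            row[f"roll{w}_max"] = max(win)
            row[f"roll{w}_trend"] = _slope(win)
        out.append(row)
    return out
```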

Degradation indicators
- Capacity fade rate and acceleration
- Cumulative degradation
- Regeneration detection and counting
- Internal resistance proxy
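
Several of these indicators follow directly from the per-cycle capacity. A minimal sketch, assuming fade rate is the capacity lost since the previous cycle, acceleration is the change in that rate, and a regeneration event is any cycle whose capacity rises above the previous one; the helper name and feature names are hypothetical, and the internal resistance proxy is omitted because its definition is not given.

```python
def degradation_features(capacity):
    """Hypothetical helper: per-cycle degradation indicators from capacity values."""
    c0 = capacity[0]
    rows = []
    regen_count = 0
    for t, c in enumerate(capacity):
        # Capacity lost since the previous cycle (negative during regeneration).
        fade_rate = capacity[t - 1] - c if t >= 1 else 0.0
        # Change in fade rate relative to the previous cycle's fade rate.
        accel = fade_rate - (capacity[t - 2] - capacity[t - 1]) if t >= 2 else 0.0
        regen = int(t >= 1 and c > capacity[t - 1])
        regen_count += regen
        rows.append({
            "fade_rate": fade_rate,
            "fade_acceleration": accel,
            "cumulative_fade": c0 - c,
            "regen_flag": regen,
            "regen_count": regen_count,
        })
    return rows
```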

Prediction models
- TabPFN: Pre-trained transformer for tabular data (primary model)
- Ensemble methods: XGBoost, GradientBoosting, RandomForest
- Automated model selection and hyperparameter optimization
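
Model selection and ensemble averaging can both be driven by validation error. A minimal sketch, assuming each model (TabPFN, XGBoost, and so on) has already produced predictions on a held-out validation set; the equal-weight average and the lowest-RMSE selection rule are assumptions, since the exact weighting and selection procedure are not specified, and the function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def select_and_ensemble(y_val, val_preds):
    """val_preds maps model name -> validation predictions (equal lengths)."""
    if not val_preds:
        raise ValueError("need at least one model")
    # Equal-weight average across models, cycle by cycle.
    ensemble = [sum(col) / len(val_preds) for col in zip(*val_preds.values())]
    scores = {name: rmse(y_val, preds) for name, preds in val_preds.items()}
    scores["ensemble"] = rmse(y_val, ensemble)
    # Pick whichever single model or the ensemble has the lowest validation RMSE.
    best = min(scores, key=scores.get)
    return best, scores, ensemble
```

Scoring the ensemble alongside the individual models lets the selection fall back to a single strong model when averaging in weaker ones would hurt.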

| Feature group | Count | Examples |
|---|---|---|
| Raw measurements | 7 | voltage, current, temperature, time |
| Rolling statistics | 24 | mean, std, min, max (windows: 3, 5, 10) |
| Degradation | 8 | fade rate, cumulative fade, EWMA |
| Health indicators | 5 | voltage range, resistance proxy, temperature increase |
| Cycle features | 4 | normalized cycle, cycle², √cycle |
| Statistical | 6 | skewness, kurtosis, trend |
| Regeneration | 3 | increase flag, count, variance |
| CEEMDAN | 15-20 | IMF energies, trend, seasonal, noise |
| D3R | 3 | trend, seasonal, noise |
| ARIMA | 40+ | lags, forecasts, ACF, PACF, residuals |

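
Two of the simpler groups in this table, the cycle features and the EWMA degradation feature, are sketched below. Normalizing by a maximum cycle index and the default smoothing factor `alpha=0.3` are assumptions, and the function and feature names are hypothetical.

```python
import math


def cycle_features(cycle, max_cycle):
    """Normalized cycle, cycle² and √cycle for one cycle index."""
    return {
        "cycle_norm": cycle / max_cycle,  # assumes normalization by the last cycle index
        "cycle_sq": cycle ** 2,
        "cycle_sqrt": math.sqrt(cycle),
    }


def ewma(series, alpha=0.3):
    """Exponentially weighted moving average: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    out = []
    for x in series:
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out
```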

    ┌──────────────────────────────────────────────────┐
    │ Raw Battery Data (.mat)                          │
    └────────────────────────┬─────────────────────────┘
                             │
                             ▼
    ┌──────────────────────────────────────────────────┐
    │ Data Loading & Preprocessing                     │
    │   • Extract discharge cycles                     │
    │   • Add timestamps                               │
    │   • Truncate at 70% capacity (EOL)               │
    │   • Calculate SOH                                │
    └────────────────────────┬─────────────────────────┘
                             │
                             ▼
    ┌──────────────────────────────────────────────────┐
    │ Feature Engineering (40+ features)               │
    │   • Rolling statistics                           │
    │   • Degradation features                         │
    │   • Health indicators                            │
    │   • Statistical features                         │
    └────────────────────────┬─────────────────────────┘
                             │
                             ▼
    ┌──────────────────────────────────────────────────┐
    │ CEEMDAN Decomposition                            │
    │   • Trend extraction                             │
    │   • Cyclical patterns                            │
    │   • Noise separation                             │
    │   • IMF energy features                          │
    └────────────────────────┬─────────────────────────┘
                             │
                             ▼
    ┌──────────────────────────────────────────────────┐
    │ Improved D3R Decomposition (Transformer)         │
    │   • Smooth trend (degradation)                   │
    │   • Seasonal patterns (regeneration)             │
    │   • Noise (residual)                             │
    │   • Embedding features                           │
    └────────────────────────┬─────────────────────────┘
                             │
                             ▼
    ┌──────────────────────────────────────────────────┐
    │ ARIMA Feature Engineering                        │
    │   • Lag features                                 │
    │   • Forecasts (1-5 steps)                        │
    │   • Autocorrelation                              │
    │   • Differencing                                 │
    └────────────────────────┬─────────────────────────┘
                             │
                             ▼
    ┌──────────────────────────────────────────────────┐
    │ TabPFN + Ensemble Prediction                     │
    │   • TabPFN (pre-trained transformer)             │
    │   • XGBoost, GradientBoosting, RandomForest      │
    │   • Ensemble averaging                           │
    │   • Feature importance analysis                  │
    └────────────────────────┬─────────────────────────┘
                             │
                             ▼
    ┌──────────────────────────────────────────────────┐
    │ SOH Predictions                                  │
    │   • RMSE: 0.0006-0.0017                          │
    │   • R²: 0.975-0.998                              │
    │   • MAPE: 0.06-0.19%                             │
    └──────────────────────────────────────────────────┘
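
The SOH calculation and end-of-life truncation in the preprocessing stage can be sketched as follows. This assumes SOH is discharge capacity divided by a reference capacity (the first cycle's capacity by default, or a rated capacity if one is passed) and that "70% capacity" means truncating at the first cycle whose SOH falls below 0.7; loading the `.mat` files (for example with `scipy.io.loadmat`) is left out, and `compute_soh` is a hypothetical name.

```python
def compute_soh(capacities, reference=None, eol_threshold=0.7):
    """SOH per discharge cycle, truncated at the first cycle below the EOL threshold."""
    if not capacities:
        return []
    ref = reference if reference is not None else capacities[0]
    soh = []
    for c in capacities:
        s = c / ref
        if s < eol_threshold:
            # End of life reached: drop this cycle and every later one,
            # even if capacity briefly regenerates afterwards.
            break
        soh.append(s)
    return soh
```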

License

This project is licensed under the MIT License; see the LICENSE file for details.

🙏 Acknowledgments

- NASA PCoE for the battery dataset: https://www.nasa.gov/intelligent-systems-division/discovery-and-systems-health/pcoe/pcoe-data-set-repository/
- TabPFN authors for the pre-trained model
- D3R authors (ForestsKing) for decomposition inspiration
- EMD-signal contributors for the CEEMDAN implementation

📧 Contact

- Author: [Your Name]
- Email: [Your Email]
- GitHub: @yourusername
- LinkedIn: Your Profile

📖 Citation

If you use this work, please cite:

    @software{battery_soh_foundational_2025,
      title={Battery SOH Foundational Model: Advanced Time Series Decomposition with TabPFN},
      author={Your Name},
      year={2025},
      url={https://github.com/yourusername/battery_SOH_FoundationalModel}
    }