🚛 Scania Truck Predictive Maintenance — Fault Detection using Machine Learning

👨‍💼 Project Overview

This project simulates work done within Scania’s Data Analytics & Fleet Management Division, where the goal is to predict component failures in trucks before they occur. Using real-world operational data from Scania’s APS (Air Pressure System), this analysis enables predictive maintenance, reducing unplanned downtime, maintenance costs, and safety risks.

Scania’s Fleet Management System (FMS) and Service 360 Pro leverage telematics data and AI to forecast potential failures. This project aims to replicate that data-driven approach by building a predictive model capable of classifying faulty (pos) and non-faulty (neg) vehicles.

🎯 Objectives

Build a data-driven predictive maintenance model using Scania’s APS dataset.
Handle severe class imbalance effectively.
Compare performance between Logistic Regression (from scratch) and Gaussian Naive Bayes (from scratch).
Provide actionable insights for fleet managers and maintenance teams.

🧩 Dataset Information

Dataset Source: Scania APS Failure dataset
Features: 170+ anonymized operational and histogram variables
Target:
- pos → trucks with failure
- neg → healthy trucks
Missing Values: Represented as "na", filled using median imputation.
Imbalance: Extremely high — healthy trucks (~98%) vs faulty (~2%).
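
As a rough illustration of the two conventions above (the "na" marker and the pos/neg labels), the sketch below parses a tiny inline CSV excerpt. The column names and values are made up for the example, and `load_aps` is a hypothetical helper, not a function from the project notebook.

```python
import csv
import io

import numpy as np

# Made-up three-row excerpt; the real dataset has 170+ feature columns.
RAW = """class,feat_a,feat_b
neg,76698,na
pos,na,0
neg,41040,2
"""

def load_aps(text):
    """Return (X, y, names): NaN for "na" cells, labels mapped pos -> 1, neg -> 0."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    header, body = rows[0], rows[1:]
    y = np.array([1 if r[0] == "pos" else 0 for r in body])
    X = np.array([[np.nan if v == "na" else float(v) for v in r[1:]] for r in body])
    return X, y, header[1:]

X, y, names = load_aps(RAW)
print("positive share:", y.mean())
print("missing share per column:", np.isnan(X).mean(axis=0))
```

Keeping missing cells as NaN at load time leaves the imputation choice (median, in this project) to a later, explicit step.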

⚙️ Data Preprocessing Steps

Missing Value Handling: Replaced with column-wise median.
Highly Correlated Features: Removed where correlation > 0.95.
Feature Scaling: Standardized using z-score normalization.
Train-Validation Split: 80:20 stratified split.
Balancing Strategy:
- No Downsampling (retain all majority samples).
- SMOTE Oversampling applied to minority class (10× increase).
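
The imputation, correlation filter, scaling, and stratified split listed above could be sketched in NumPy roughly as follows. This is a minimal sketch, not the notebook's code: the function names are hypothetical, the greedy "keep the first of each correlated pair" rule is one possible reading of "removed where correlation > 0.95", and fitting the scaler on the training split only is an assumption.

```python
import numpy as np

def impute_median(X):
    """Replace each NaN with the median of its column."""
    med = np.nanmedian(X, axis=0)
    return np.where(np.isnan(X), med, X)

def drop_correlated(X, threshold=0.95):
    """Greedily keep a column only if |corr| with every kept column is <= threshold."""
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], keep

def standardize(X_train, X_val):
    """Z-score both splits using the training mean and standard deviation."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid dividing by zero on constant columns
    return (X_train - mu) / sigma, (X_val - mu) / sigma

def stratified_split(y, val_frac=0.2, seed=0):
    """Return (train_idx, val_idx) with the class ratio preserved in both."""
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_val = int(round(val_frac * len(idx)))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.array(train_idx), np.array(val_idx)
```

Splitting per class is what keeps the small share of positive trucks represented in the validation set rather than leaving it to chance.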

📊 A pie chart and missing value bar plots were created to visualize data imbalance and completeness.
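
The project applies SMOTE through imblearn. To show the idea behind that oversampling step, here is a simplified NumPy sketch of SMOTE-style interpolation; it is not imblearn's implementation. The function name `smote_like`, the default `k=5`, and reading "10× increase" as "nine synthetic rows per original minority row" are all assumptions.

```python
import numpy as np

def smote_like(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise squared distances among minority rows, without an (n, n, d) temporary.
    sq = (X_min ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X_min @ X_min.T
    np.fill_diagonal(d2, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                   # rows to start from
    nb = neighbours[base, rng.integers(0, k, size=n_new)]   # one random neighbour each
    gap = rng.random((n_new, 1))                            # position along the segment
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Toy usage: 20 minority rows grown to 200 (original plus 180 synthetic).
rng = np.random.default_rng(1)
X_min = rng.normal(size=(20, 3))
X_aug = np.vstack([X_min, smote_like(X_min, n_new=9 * len(X_min))])
print(X_aug.shape)
```

Oversampling of this kind is normally applied to the training split only, so that the validation metrics reflect the original class ratio.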

🧮 Models Implemented (from Scratch)

1. Logistic Regression

Built using NumPy only (no scikit-learn).
Includes sigmoid function, gradient descent optimization, and binary cross-entropy loss.
Trained for 1000 epochs with validation loss tracking.
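
The ingredients named above (sigmoid, binary cross-entropy, gradient descent, validation loss tracking) fit together roughly as in this NumPy-only sketch. The learning rate, the zero initialization, and the function names are assumptions for illustration; the notebook's actual hyperparameters are not stated here.

```python
import numpy as np

def sigmoid(z):
    # Clip to keep exp() from overflowing on extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def bce(y, p, eps=1e-12):
    """Binary cross-entropy, with probabilities clipped away from 0 and 1."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logreg(X, y, X_val, y_val, lr=0.1, epochs=1000):
    """Batch gradient descent; returns weights, bias, and (train, val) loss per epoch."""
    w = np.zeros(X.shape[1])
    b = 0.0
    history = []
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        history.append((bce(y, p), bce(y_val, sigmoid(X_val @ w + b))))
        # Gradient of the mean BCE with respect to w and b.
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b, history

def predict(X, w, b, threshold=0.5):
    return (sigmoid(X @ w + b) >= threshold).astype(int)
```

Lowering `threshold` below 0.5 trades precision for recall, which is one way to tune the model toward catching more faulty trucks.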

2. Gaussian Naive Bayes

Assumes features follow normal (Gaussian) distribution.
Calculates mean, variance, and prior probability for each class.
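Uses log probabilities for numerical stability: summing per-feature log-likelihoods avoids the underflow that multiplying 170+ small probabilities would cause.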
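
A minimal NumPy sketch of these steps is shown below. The class name `ScratchGaussianNB` and the small additive `var_smoothing` term (used here to avoid zero variances) are assumptions, not details taken from the notebook.

```python
import numpy as np

class ScratchGaussianNB:
    def fit(self, X, y, var_smoothing=1e-9):
        """Estimate per-class feature means, variances, and log priors."""
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_smoothing
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def log_joint(self, X):
        """Return log P(class) + sum of per-feature Gaussian log-densities, shape (n, classes)."""
        cols = []
        for mean, var, log_prior in zip(self.mean_, self.var_, self.log_prior_):
            log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)
            cols.append(log_prior + log_lik)
        return np.column_stack(cols)

    def predict(self, X):
        return self.classes_[np.argmax(self.log_joint(X), axis=1)]
```

Because only the argmax is needed for classification, the log joint scores never have to be exponentiated back into probabilities.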

📈 Model Evaluation Metrics

Each model was evaluated on:

Accuracy
Precision
Recall
F1-Score
Confusion Matrix (heatmap)
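
All four scores follow from the confusion matrix counts. The sketch below computes them with NumPy, treating faulty (pos) as label 1; the function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Return confusion counts and accuracy, precision, recall, F1 for label 1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example: 3 faulty and 5 healthy trucks, one miss and one false alarm.
m = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(m)
```

With roughly 98% healthy trucks, accuracy alone is a weak signal, since predicting "neg" for every truck would already score about 0.98.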

📊 Results Summary

Model	Accuracy	Precision	Recall	F1-Score
Logistic Regression (from scratch)	0.9871	0.6764	0.8640	0.7588
Gaussian Naive Bayes (from scratch)	0.9639	0.3848	0.9040	0.5398
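
As a quick sanity check on the table, F1 can be recomputed from the reported precision and recall using F1 = 2PR / (P + R). The dictionary below simply restates the table's numbers; the variable and function names are illustrative.

```python
# (precision, recall, reported F1) copied from the results table.
reported = {
    "Logistic Regression": (0.6764, 0.8640, 0.7588),
    "Gaussian Naive Bayes": (0.3848, 0.9040, 0.5398),
}

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

for name, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{name}: recomputed F1 = {f1:.4f}, reported = {f1_reported:.4f}")
```

Both recomputed values agree with the reported F1 scores to four decimal places.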

🔍 Key Insights

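Logistic Regression achieved the better balance between precision (0.6764) and recall (0.8640), making it the more practical candidate for operational deployment.
Gaussian Naive Bayes achieved higher recall (0.9040), useful for initial screening, but its precision of 0.3848 means most of its alerts are false positives.
For Scania’s fleet management system, recall is crucial: missing a faulty vehicle (false negative) could result in costly breakdowns.
Logistic Regression gives up about four points of recall relative to Gaussian Naive Bayes but raises precision from 0.3848 to 0.6764, so it is preferred here as the more reliable, balanced model for predictive maintenance. Where missed failures must be minimized regardless of false alarms, Gaussian Naive Bayes remains the higher-recall option.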

🧠 Additional Explorations

Missing Value Feature Visualization:
Bar plots of the missing-value percentage per feature were used to decide which features to drop.
Confusion Matrix Heatmap:
Visualizes each model’s correct predictions and how its errors split between false positives and false negatives.
Future Work:
- Try Ensemble Learning (e.g., Random Forest).
- Integrate real-time sensor data for continuous learning.
- Deploy the model and monitor it through CI/CD pipelines.

🧾 Key Visuals

Class Imbalance Pie Chart
Missing Values Bar Plot
Confusion Matrix Heatmap

💬 Presentation Flow

If you’re presenting this project:

Slide 1–2: Scania company intro & predictive maintenance goal.
Slide 3–5: Data challenges — missing values, imbalance, correlations.
Slide 6–8: Model development (Logistic & GNB).
Slide 9: Model performance table.
Slide 10–11: Key insights & business impact.
Slide 12: Conclusion.

🧰 Tech Stack

Languages: Python (NumPy, Pandas, Matplotlib, Seaborn)
ML Libraries: imblearn (SMOTE only)
Visualization: Matplotlib, Seaborn
Environment: Jupyter Notebook / Colab

🏁 Conclusion

This project demonstrates how Scania’s data-driven approach can be replicated to predict component failures, improving fleet reliability, maintenance efficiency, and safety.
The final model — Logistic Regression — provides a strong foundation for Scania’s predictive maintenance analytics pipeline.
