krunal48/Scania_Predictive_Maintanance

🚛 Scania Truck Predictive Maintenance — Fault Detection using Machine Learning

👨‍💼 Project Overview

This project simulates work done within Scania’s Data Analytics & Fleet Management Division, where the goal is to predict component failures in trucks before they occur. Using real-world operational data from Scania’s APS (Air Pressure System), this analysis enables predictive maintenance, reducing unplanned downtime, maintenance costs, and safety risks.

Scania’s Fleet Management System (FMS) and Service 360 Pro leverage telematics data and AI to forecast potential failures. This project aims to replicate that data-driven approach by building a predictive model capable of classifying faulty (pos) and non-faulty (neg) vehicles.


🎯 Objectives

  • Build a data-driven predictive maintenance model using Scania’s APS dataset.
  • Handle severe class imbalance effectively.
  • Compare performance between Logistic Regression (from scratch) and Gaussian Naive Bayes (from scratch).
  • Provide actionable insights for fleet managers and maintenance teams.

🧩 Dataset Information

  • Dataset Source: Scania APS Failure dataset
  • Features: 170+ anonymized operational and histogram variables
  • Target:
    • pos → trucks with failure
    • neg → healthy trucks
  • Missing Values: Represented as "na", filled using median imputation.
  • Imbalance: Extremely high — healthy trucks (~98%) vs faulty (~2%).
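
As a minimal sketch of how the "na" markers and the class imbalance described above could be inspected with Pandas: the tiny in-memory CSV and its column names (`aa_000`, `ab_000`) are illustrative placeholders, not the real file, and the variable names are assumptions made for this example.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for the APS CSV (hypothetical rows and column names);
# the real dataset has 170+ feature columns.
csv_text = """class,aa_000,ab_000
neg,76698,na
neg,33058,0
pos,153204,na
neg,41040,2
neg,12,na
"""

# Treat the literal string "na" as a missing value while parsing.
df = pd.read_csv(io.StringIO(csv_text), na_values="na")

# Map the string labels to 0/1 so a model can consume them (pos = failure).
y = (df["class"] == "pos").astype(int)
X = df.drop(columns="class")

# Share of each class, and fraction of missing cells per feature.
class_share = y.value_counts(normalize=True)
missing_share = X.isna().mean()

print(class_share)
print(missing_share)
```

The same two summaries (`class_share`, `missing_share`) are the kind of numbers a class-imbalance pie chart and a missing-value bar plot would be drawn from.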

⚙️ Data Preprocessing Steps

  1. Missing Value Handling: Replaced with column-wise median.
  2. Highly Correlated Features: Removed where correlation > 0.95.
  3. Feature Scaling: Standardized using z-score normalization.
  4. Train-Validation Split: 80:20 stratified split.
  5. Balancing Strategy:
    • No Downsampling (retain all majority samples).
    • SMOTE Oversampling applied to minority class (10× increase).
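
Steps 1 to 4 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the function names `preprocess` and `stratified_split` are hypothetical, the correlation filter is assumed to use the absolute Pearson correlation, and the order of operations simply follows the list above.

```python
import numpy as np
import pandas as pd

def preprocess(X: pd.DataFrame, corr_threshold: float = 0.95) -> pd.DataFrame:
    """Median-impute, drop highly correlated columns, then z-score scale."""
    # 1. Column-wise median imputation.
    X = X.fillna(X.median())

    # 2. Look only at the upper triangle so each pair is checked once, then
    #    drop the later column of any pair with |correlation| above the threshold.
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > corr_threshold).any()]
    X = X.drop(columns=to_drop)

    # 3. Z-score standardization (constant columns are left unscaled).
    std = X.std(ddof=0).replace(0, 1.0)
    return (X - X.mean()) / std

def stratified_split(y: np.ndarray, val_frac: float = 0.2, seed: int = 0):
    """4. Return train/validation index arrays that preserve the class ratio."""
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_val = int(round(val_frac * len(idx)))
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.array(train_idx), np.array(val_idx)

# Synthetic demo data: "b" is almost a copy of "a", "c" has one missing value.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
X_demo = pd.DataFrame({
    "a": a,
    "b": 2.0 * a + 1e-3 * rng.normal(size=100),
    "c": rng.normal(size=100),
})
X_demo.loc[0, "c"] = np.nan
y_demo = np.array([1] * 10 + [0] * 90)

X_clean = preprocess(X_demo)
train_idx, val_idx = stratified_split(y_demo)
print(list(X_clean.columns), len(train_idx), len(val_idx))
```

One design note: computing medians, means, and standard deviations on the training split only, and then reusing them on the validation split, is a common way to avoid leaking validation statistics into training; whether the notebook does this is not stated in this README.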

📊 A pie chart and missing value bar plots were created to visualize data imbalance and completeness.
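
The project itself relies on imblearn for SMOTE (the `SMOTE` class in `imblearn.over_sampling`). To show the idea behind the oversampling step in the balancing strategy, here is a simplified NumPy-only sketch: each synthetic row is an interpolation between a minority sample and one of its nearest minority neighbours. The function name `smote_like`, its parameters, and the reading of "10× increase" as nine synthetic rows per original row are assumptions for illustration; this is not imblearn's implementation.

```python
import numpy as np

def smote_like(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Create n_new synthetic minority rows by interpolating toward near neighbours.

    Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances between minority samples.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest neighbours

    base = rng.integers(0, n, size=n_new)                   # seed sample per new row
    nb = neighbours[base, rng.integers(0, k, size=n_new)]   # one of its k neighbours
    gap = rng.random((n_new, 1))                            # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Demo: 20 minority rows grown to 200 rows in total (assumed meaning of "10x").
rng = np.random.default_rng(1)
X_min = rng.normal(size=(20, 3))
synthetic = smote_like(X_min, n_new=9 * len(X_min))
X_min_balanced = np.vstack([X_min, synthetic])
print(X_min_balanced.shape)
```

Oversampling of this kind is normally applied to the training split only, so that the validation set keeps the original class ratio.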


🧮 Models Implemented (from Scratch)

1. Logistic Regression

  • Built using NumPy only (no scikit-learn).
  • Includes sigmoid function, gradient descent optimization, and binary cross-entropy loss.
  • Trained for 1000 epochs with validation loss tracking.
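
The ingredients listed above (sigmoid, binary cross-entropy, gradient descent, validation loss tracking) fit together roughly as in the following NumPy sketch. The function names, the learning rate of 0.1, and the synthetic data are assumptions for illustration; the notebook's actual hyperparameters other than the 1000 epochs are not given in this README.

```python
import numpy as np

def sigmoid(z):
    # Clip the logits so np.exp cannot overflow for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def bce_loss(y, p, eps=1e-12):
    # Binary cross-entropy; clip probabilities to keep log() finite.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logreg(X, y, X_val, y_val, lr=0.1, epochs=1000):
    """Batch gradient descent on the BCE loss, recording train/validation loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    history = []
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        # Gradient of the mean BCE loss with respect to weights and bias.
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
        history.append((bce_loss(y, sigmoid(X @ w + b)),
                        bce_loss(y_val, sigmoid(X_val @ w + b))))
    return w, b, history

# Synthetic, linearly separable demo data (not the APS dataset).
rng = np.random.default_rng(0)
X_all = rng.normal(size=(400, 2))
y_all = (X_all[:, 0] + X_all[:, 1] > 0).astype(float)
X_tr, y_tr, X_va, y_va = X_all[:320], y_all[:320], X_all[320:], y_all[320:]

w, b, history = fit_logreg(X_tr, y_tr, X_va, y_va)
train_acc = np.mean((sigmoid(X_tr @ w + b) >= 0.5) == y_tr)
print(f"final train loss={history[-1][0]:.4f}, val loss={history[-1][1]:.4f}")
```

Plotting the two columns of `history` against the epoch index gives the kind of training and validation loss curves that the tracking step refers to.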

2. Gaussian Naive Bayes

  • Assumes features follow normal (Gaussian) distribution.
  • Calculates mean, variance, and prior probability for each class.
  • Used log probability for numerical stability.
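
A compact sketch of the three steps above: per-class mean, variance, and prior, combined in log space. The class name `ScratchGaussianNB` and the small variance floor `var_eps` are assumptions added for this example, not details confirmed by the README.

```python
import numpy as np

class ScratchGaussianNB:
    """Minimal Gaussian Naive Bayes using log probabilities."""

    def fit(self, X, y, var_eps=1e-9):
        self.classes_ = np.unique(y)
        # Per-class feature means and variances, shape (n_classes, n_features).
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # var_eps (an assumed safeguard) avoids division by zero for constant features.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_eps
        # Log prior = log of each class's relative frequency.
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def log_joint(self, X):
        # log P(c) + sum_j log N(x_j | mean_cj, var_cj); shape (n_samples, n_classes).
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None, :, :]
                          + diff ** 2 / self.var_[None, :, :]).sum(axis=2)
        return self.log_prior_[None, :] + log_lik

    def predict(self, X):
        return self.classes_[np.argmax(self.log_joint(X), axis=1)]

# Synthetic demo: an imbalanced pair of well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X_neg = rng.normal(loc=0.0, size=(450, 2))
X_pos = rng.normal(loc=4.0, size=(50, 2))
X_demo = np.vstack([X_neg, X_pos])
y_demo = np.array([0] * 450 + [1] * 50)

gnb = ScratchGaussianNB().fit(X_demo, y_demo)
gnb_acc = np.mean(gnb.predict(X_demo) == y_demo)
print(f"accuracy on demo data: {gnb_acc:.3f}")
```

Summing log densities instead of multiplying raw densities matters here because a product over 170+ per-feature probabilities would quickly underflow to zero in floating point.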

📈 Model Evaluation Metrics

Each model was evaluated on:

  • Accuracy
  • Precision
  • Recall
  • F1-Score
  • Confusion Matrix (heatmap)
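
All of the metrics above derive from the four cells of the binary confusion matrix. A minimal NumPy sketch, with the helper name `binary_metrics` and the toy labels being assumptions for illustration:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Compute the confusion-matrix cells and derived metrics (positive class = 1)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))     # faulty trucks correctly flagged
    tn = int(np.sum(~y_true & ~y_pred))   # healthy trucks correctly passed
    fp = int(np.sum(~y_true & y_pred))    # healthy trucks flagged by mistake
    fn = int(np.sum(y_true & ~y_pred))    # faulty trucks that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": np.array([[tn, fp], [fn, tp]]),
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy labels: 3 faulty and 7 healthy vehicles.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With heavy class imbalance, accuracy alone says little: a model that predicts "neg" for every truck would already score close to the share of healthy trucks, which is why precision, recall, and F1 are reported alongside it.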

📊 Results Summary

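| Model                               | Accuracy | Precision | Recall | F1-Score |
|-------------------------------------|----------|-----------|--------|----------|
| Logistic Regression (from scratch)  | 0.9871   | 0.6764    | 0.8640 | 0.7588   |
| Gaussian Naive Bayes (from scratch) | 0.9639   | 0.3848    | 0.9040 | 0.5398   |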
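
As a quick sanity check on the reported numbers, the F1-Score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The snippet below only uses the values from the results above; the variable and function names are illustrative.

```python
# Reported values from the results summary.
results = {
    "Logistic Regression": {"precision": 0.6764, "recall": 0.8640, "f1": 0.7588},
    "Gaussian Naive Bayes": {"precision": 0.3848, "recall": 0.9040, "f1": 0.5398},
}

def f1_from(precision: float, recall: float) -> float:
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

recomputed = {name: f1_from(m["precision"], m["recall"]) for name, m in results.items()}

for name, m in results.items():
    print(f"{name}: reported F1={m['f1']:.4f}, recomputed F1={recomputed[name]:.4f}")
```

Both recomputed values agree with the reported F1-Scores to four decimal places, so the three columns are internally consistent.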

🔍 Key Insights

  • Logistic Regression achieved the better balance between precision (0.6764) and recall (0.8640), making it the more suitable candidate for operational deployment.
  • Gaussian Naive Bayes achieved higher recall (0.9040), useful for initial screening, but its precision of 0.3848 means it generated considerably more false positives.
  • For Scania’s fleet management system, recall is crucial: missing a faulty vehicle (false negative) could result in costly breakdowns.
  • Even so, Logistic Regression gives up only about 4 points of recall relative to Gaussian Naive Bayes while raising precision from 0.3848 to 0.6764, so it provides the more reliable, balanced performance for predictive maintenance.

🧠 Additional Explorations

  • Missing Value Feature Visualization:
    Plotted the percentage of missing values per feature as bar plots to decide which features to drop.
  • Confusion Matrix Heatmap:
    Visualizes model’s classification power and error spread.
  • Future Work:
    • Try Ensemble Learning (e.g., Random Forest).
    • Integrate real-time sensor data for continuous learning.
    • Model deployment and monitoring via CI/CD pipelines.

🧾 Key Visuals

  • Class Imbalance Pie Chart
  • Missing Values Bar Plot
  • Confusion Matrix Heatmap

💬 Presentation Flow

If you’re presenting this project:

  1. Slide 1–2: Scania company intro & predictive maintenance goal.
  2. Slide 3–5: Data challenges — missing values, imbalance, correlations.
  3. Slide 6–8: Model development (Logistic & GNB).
  4. Slide 9: Model performance table.
  5. Slide 10–11: Key insights & business impact.
  6. Slide 12: Conclusion.

🧰 Tech Stack

  • Language: Python
  • Data Libraries: NumPy, Pandas
  • ML Libraries: imblearn (SMOTE only)
  • Visualization: Matplotlib, Seaborn
  • Environment: Jupyter Notebook / Colab

🏁 Conclusion

This project demonstrates how Scania’s data-driven approach can be replicated to predict component failures, improving fleet reliability, maintenance efficiency, and safety.
The final model — Logistic Regression — provides a strong foundation for Scania’s predictive maintenance analytics pipeline.

About

Scania: Driving the Future with Predictive Maintenance
