GitHub - tejasmagia/PCA-for-dimensionality-reduction_PredictingTimeForTestingCars

Mercedes-Benz Greener Manufacturing

Since the first automobile, the Benz Patent Motor Car in 1886, Mercedes-Benz has stood for important automotive innovations. These include, for example, the passenger safety cell with crumple zone, the airbag and intelligent assistance systems. Mercedes-Benz applies for nearly 2000 patents per year, making the brand the European leader among premium car makers. Daimler’s Mercedes-Benz cars are leaders in the premium car industry. With a huge selection of features and options, customers can choose the customized Mercedes-Benz of their dreams. To ensure the safety and reliability of each and every unique car configuration before they hit the road, Daimler’s engineers have developed a robust testing system. But, optimizing the speed of their testing system for so many possible feature combinations are complex and time-consuming without a powerful algorithmic approach. As one of the world’s biggest manufacturers of premium cars, safety and efficiency are paramount on Daimler’s production lines.

Attribute information This dataset contains an anonymized set of variables (X0 to X385), each representing a custom feature in a Mercedes car. ** Note: This is confidential data and we will not be able to provide description for each of the anonymized set of variables. For example, a variable could be 4WD, added air suspension, or a head-up display. The ground truth is labeled ‘y’ and represents the time (in seconds) that the car took to pass testing for each variable. File descriptions Variables with letters are categorical. Variables with 0/1 are binary values. ● train.csv - the training set Objective: Tackle the curse of dimensionality (Use PCA for dimensionality reduction) and then Predict the time it takes to pass testing for cars.

Notebook contains: 1] Exploratory Visualization – covered in notebook 2] Data Cleaning – covered in notebook 3] Feature Engineering - Converted some categorical features into numeric values using LabelEncoder 4] Modelling & Evaluation - choose 13 models and use 5-folds cross-validation to evaluate (LinearRegression,Ridge,Lasso,RandomForestRegressor,GradientBoostingRegressor,SVR,LinearSVR,ElasticNet,SGDRegressor,BayesianRidge,KernelRidge,ExtraTreesRegressor,XGBRegressor,LassoLarsCV) models 5] Ensemble Methods - Average base models according to their weights, picked 6 best R-score models from point [4] 6] Stacking Technique - Combine information from multiple predictive models to generate a new model. Often times the stacked model (also called 2nd-level model) will outperform each of the individual models due its smoothing nature and ability to highlight each base model where it performs best and discredit each base model where it performs poorly. Also, I have combine features generated from stacking and original features. 7] Improve score using dimensionality reduction and Stacking technique 8] Apply kernel to Kaggle data - Final score was 0.56784 and rank is 1061 as per public score

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
data		data
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
Mercedes-Benz Greener Manufacturing.docx		Mercedes-Benz Greener Manufacturing.docx
Mercedes-Benz Greener Manufacturing.pdf		Mercedes-Benz Greener Manufacturing.pdf
README.md		README.md
UpX - Mercedes Benz Greener Manufacturing.ipynb		UpX - Mercedes Benz Greener Manufacturing.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages