Coding classes for Data Analysis 3 in the MSc in Business Analytics at Central European University
Set up an R project as described on the book's home page: How to set up your computer for R.
Download the appropriate files, or fork this repo, clone it, and open the code from your R project's environment. Once you're done, you are good to go.
class 13, used cars
Basic data manipulation and exploratory data analysis.
Basic visualizations; plotting logged values in ggplot.
Multiple linear regression.
Model selection by goodness-of-fit metrics.
Cross-validation and model comparison.
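As a rough illustration of the cross-validation and model comparison step in class 13, the sketch below compares two linear regressions of log price by k-fold RMSE. The class code itself is in R; this is a Python sketch on simulated data, and the helper `cv_rmse`, the variable names, and all coefficients are assumptions made for the example.

```python
import numpy as np

def cv_rmse(X, y, k=5, seed=0):
    """Mean out-of-fold RMSE of an OLS fit across k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    fold_rmse = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Add an intercept column to both the training and the test design.
        X_train = np.column_stack([np.ones(len(train)), X[train]])
        X_test = np.column_stack([np.ones(len(test)), X[test]])
        beta, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
        fold_rmse.append(np.sqrt(np.mean((y[test] - X_test @ beta) ** 2)))
    return float(np.mean(fold_rmse))

# Simulated stand-in for the used cars data (not the real dataset).
rng = np.random.default_rng(42)
n = 500
age = rng.uniform(1, 20, n)
odometer = rng.uniform(1, 30, n)
log_price = 9.5 - 0.08 * age - 0.03 * odometer + rng.normal(0, 0.3, n)

# Model 1 uses age only; model 2 adds odometer.
rmse_small = cv_rmse(age.reshape(-1, 1), log_price)
rmse_large = cv_rmse(np.column_stack([age, odometer]), log_price)
print(f"CV RMSE, age only: {rmse_small:.3f}")
print(f"CV RMSE, age + odometer: {rmse_large:.3f}")
```

The model with the lower cross-validated RMSE is the one preferred for prediction; in-sample fit alone would always favour the larger model.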
class 14, airbnb
Handling missing data; integrating missing data information in the analytics.
Model setup.
Interactions and dummies.
Train, test and holdout sets.
Cross-validation, train and test metrics.
Lasso:
- running a lasso optimization
- interpreting the results
- RMSE
Diagnostics on the holdout set.
Plotting prediction results.
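To make the lasso and holdout steps of class 14 concrete, here is a minimal hand-written sketch of lasso by cyclic coordinate descent, evaluated on a holdout set. It is not the routine used in the R class code; the data are simulated, and `lasso_cd`, `soft_threshold`, and the penalty value `lam=0.15` are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(value, lam):
    """Shrink value towards zero by lam; exactly zero if |value| <= lam."""
    return np.sign(value) * max(abs(value) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by coordinate descent.

    Assumes the columns of X are standardised and y is centred.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            partial_resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_resid / n
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return beta

# Simulated data: only the first two of six predictors matter.
rng = np.random.default_rng(1)
n, p = 300, 6
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(0, 0.5, n)

# Work set for fitting, holdout set for the final diagnostic.
X_work, y_work = X[:240], y[:240]
X_hold, y_hold = X[240:], y[240:]
mu, sd, y_mean = X_work.mean(axis=0), X_work.std(axis=0), y_work.mean()

beta_hat = lasso_cd((X_work - mu) / sd, y_work - y_mean, lam=0.15)
pred_hold = y_mean + ((X_hold - mu) / sd) @ beta_hat
rmse_hold = float(np.sqrt(np.mean((y_hold - pred_hold) ** 2)))

print("selected predictors:", np.flatnonzero(beta_hat))
print(f"holdout RMSE: {rmse_hold:.3f}")
```

Interpreting the result amounts to reading which coefficients survive the penalty; in practice the penalty would be tuned by cross-validation on the work set rather than fixed by hand.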
class 15, used cars
Data manipulation as in class 13
Basic regression trees
Plotting trees and regression results as step functions
Building more complex regression trees with control parameters
Pruning
Comparing tree-based and OLS models
Variable importance: with final only and with competing variables
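The idea behind a basic regression tree and its step-function prediction can be sketched with a single split. The snippet below searches for the threshold that minimises the sum of squared errors and predicts the mean on each side. It is a Python sketch on simulated data, not the tree routine used in the R class code; `best_split`, `predict`, and the variable names are hypothetical.

```python
import numpy as np

def best_split(x, y):
    """Return the threshold on x that minimises the two-sided SSE of y."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_sse, best_threshold = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no valid split between identical x values
        left, right = ys[:i], ys[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            # Split halfway between the two neighbouring x values.
            best_sse, best_threshold = sse, (xs[i - 1] + xs[i]) / 2
    return best_threshold

# Simulated data with a jump in price at age 5.
rng = np.random.default_rng(7)
age = rng.uniform(0, 10, 200)
price = np.where(age < 5, 10.0, 4.0) + rng.normal(0, 0.5, 200)

threshold = best_split(age, price)
left_mean = price[age < threshold].mean()
right_mean = price[age >= threshold].mean()

def predict(a):
    """Step-function prediction of the one-split tree."""
    return np.where(a < threshold, left_mean, right_mean)

print(f"split at age {threshold:.2f}")
print("predictions for ages 1 and 9:", predict(np.array([1.0, 9.0])))
```

A deeper tree repeats the same search within each side of the split; control parameters and pruning decide when that recursion stops.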
class 16, airbnb, hitters
Setting up a grid for grid search in caret::train
Running a random forest model using the ranger package in caret
Getting and plotting individual and grouped variable importances
Partial dependence plots for random forest models
Predictions and RMSE for subsets of data
Comparing OLS, LASSO, CART, and random forest
Gradient Boosting Machines: tuning and model run
Hitters: parameter grid search on a smaller and easier-to-handle dataset
The airbnb analysis is implemented both in R and in Python using a Jupyter notebook
class 17, bisnode
Modelling probabilities with simple & lasso logit
CV RMSE & AUC for probability models
Classification using logit with no loss function
Calibration plot, confusion matrix, ROC, AUC
Classification (logit) with user-defined loss function
Classification with CART
Random forest for probabilities, with and without a loss function
Classification with random forest
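Classification with a user-defined loss function, as listed for class 17, comes down to choosing the probability threshold that minimises the average loss given the costs of false positives and false negatives. The sketch below does this by grid search on simulated, well-calibrated probabilities. It is Python rather than the R used in class, and the cost values and the helper `average_loss` are assumptions made for the example.

```python
import numpy as np

def average_loss(y_true, prob, threshold, cost_fp, cost_fn):
    """Average loss per observation when classifying prob >= threshold as 1."""
    pred = prob >= threshold
    n_fp = np.sum(pred & (y_true == 0))
    n_fn = np.sum(~pred & (y_true == 1))
    return (cost_fp * n_fp + cost_fn * n_fn) / len(y_true)

# Simulated predicted probabilities and outcomes drawn from them.
rng = np.random.default_rng(3)
prob = rng.uniform(0, 1, 5000)
y_true = (rng.uniform(0, 1, 5000) < prob).astype(int)

# Assumed costs: missing a positive is ten times as costly as a false alarm.
cost_fp, cost_fn = 1, 10

thresholds = np.linspace(0.01, 0.99, 99)
losses = np.array(
    [average_loss(y_true, prob, t, cost_fp, cost_fn) for t in thresholds]
)
best_threshold = float(thresholds[np.argmin(losses)])
best_loss = float(losses.min())
loss_at_half = float(average_loss(y_true, prob, 0.5, cost_fp, cost_fn))

print(f"best threshold: {best_threshold:.2f}, average loss: {best_loss:.3f}")
print(f"average loss at threshold 0.5: {loss_at_half:.3f}")
print(f"formula threshold: {cost_fp / (cost_fp + cost_fn):.3f}")
```

For well-calibrated probabilities the loss-minimising threshold is cost_fp / (cost_fp + cost_fn), so the grid search result should land near that value; without a loss function the usual default is 0.5.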
class 18, swimming pool, Case-Shiller
Managing time series data with the tsibble package
Deterministic modelling: OLS, trend, seasonal & other dummies
Introducing fbProphet
Stochastic modelling with the fable package
ARIMA, auto-arima
Vector Autoregressions
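The deterministic modelling item for class 18 (OLS with trend and seasonal dummies) can be sketched as follows. This is a Python illustration on a simulated monthly series, not the R class code; the helper `design`, the series itself, and all parameter values are assumptions.

```python
import numpy as np

def design(t):
    """Design matrix: intercept, linear trend, and 11 month dummies.

    Month 0 is the base category, so its effect is absorbed by the intercept.
    """
    month = t % 12
    dummies = (month[:, None] == np.arange(1, 12)).astype(float)
    return np.column_stack([np.ones(len(t)), t, dummies])

# Simulated monthly series: level 100, trend 0.5 per month, seasonal cycle.
rng = np.random.default_rng(11)
n_months = 120
t = np.arange(n_months)
season = 5 * np.sin(2 * np.pi * np.arange(12) / 12)
y = 100 + 0.5 * t + season[t % 12] + rng.normal(0, 1, n_months)

# Fit by OLS and forecast the next 12 months from the deterministic terms.
beta, *_ = np.linalg.lstsq(design(t), y, rcond=None)
trend_hat = float(beta[1])
t_future = np.arange(n_months, n_months + 12)
forecast = design(t_future) @ beta

print(f"estimated monthly trend: {trend_hat:.3f}")
print("12-month forecast:", np.round(forecast, 1))
```

Because trend and season are known functions of time, the forecast needs no lagged values of the series, which is what distinguishes this approach from the stochastic ARIMA-type models in the same class.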