Coding classes for Data Analysis 3 in the MSc in Business Analytics program at Central European University

peterduronelly/DA3-Coding-Examples

DA3-Coding-Examples

How to run the code?

Set up an R project as described on the book's home page, under "How to set up your computer for R".

Download the appropriate files, or fork this repo, clone it, and open the code from your R project's environment. Once you're done, you are good to go.

Contents

  1. class 13, used cars
    Basic data manipulation and exploratory data analysis.
    Basic visualizations; plotting logged values in ggplot.
    Multiple linear regression.
    Model selection by goodness-of-fit metrics.
    Cross-validation and model comparison.

  2. class 14, airbnb
    Handling missing data; integrating missing data information in the analytics.
    Model setup.
    Interactions and dummies.
    Train, test and holdout sets.
    Cross-validation, train and test metrics.
    Lasso:

    • running a lasso optimization
    • interpreting the results
    • RMSE

    Diagnostics on the holdout set.
    Plotting prediction results.

  3. class 15, used cars
    Data manipulation as in class 13
    Basic regression trees
    Plotting trees and regression results as step functions
    Building more complex regression trees with control parameters
    Pruning
    Comparing tree-based and OLS models
    Variable importance: based on final splits only, and including competing variables

  4. class 16, airbnb, hitters
    Setting up grid for grid search in caret::train
    Running random forest model using the ranger package in caret
    Getting and plotting individual and grouped variable importances
    Partial dependence plots for rf models
    Predictions and RMSE for subsets of data
    Comparing OLS, LASSO, CART, and random forest
    Gradient Boosting Machines: tuning and model run
    Hitters: parameter grid search on a smaller and easier-to-handle dataset
    The airbnb analysis is implemented both in R and in Python using a Jupyter notebook

  5. class 17, bisnode
    Modelling probabilities with simple & lasso logit
    CV RMSE & AUC for probability models
    Classification using logit with no loss function
    Calibration plot, confusion matrix, ROC, AUC
    Classification (logit) with user-defined loss function
    Classification with CART
    Random forest for probabilities, with and without a loss function
    Classification with random forest

  6. class 18, swimming pool, Case-Shiller
    Managing time series data with the tsibble package
    Deterministic modelling: OLS, trend, seasonal & other dummies
    Introducing fbProphet
    Stochastic modelling with the fable package
    ARIMA, auto-arima
    Vector Autoregressions
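
Comparing models by cross-validated RMSE recurs across the classes listed above (for example classes 13, 14 and 16). The following is a minimal base-R sketch of that idea, not the course code. It assumes the built-in mtcars data as a stand-in for the course datasets, and the two formulas as well as the helper names rmse and cv_rmse are hypothetical, chosen only for illustration.

```r
# Minimal sketch: 5-fold cross-validated RMSE for two OLS specifications.
# mtcars ships with base R and is only a stand-in for the course datasets.
set.seed(2024)

k <- 5
n <- nrow(mtcars)

# Randomly assign each row to one of k folds of (nearly) equal size.
fold_id <- sample(rep(seq_len(k), length.out = n))

# Root mean squared error between observed and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Fit the model on k - 1 folds, evaluate on the held-out fold,
# and average the k out-of-sample RMSE values.
cv_rmse <- function(formula, data, fold_id) {
  fold_rmse <- sapply(sort(unique(fold_id)), function(fold) {
    train <- data[fold_id != fold, ]
    test  <- data[fold_id == fold, ]
    fit   <- lm(formula, data = train)
    actual <- model.response(model.frame(formula, data = test))
    rmse(actual, predict(fit, newdata = test))
  })
  mean(fold_rmse)
}

# Two illustrative (hypothetical) specifications of increasing complexity.
models <- list(
  simple   = mpg ~ wt,
  extended = mpg ~ wt + hp + am
)

results <- sapply(models, cv_rmse, data = mtcars, fold_id = fold_id)
print(round(results, 3))
```

The model with the lower cross-validated RMSE would be preferred for prediction. Using the same fold_id for both models keeps the comparison on identical train and test splits.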
