useR 2015 Tutorial

Applied Machine Learning and Efficient Model Selection with mlr

by Bernd Bischl, Michel Lang

to be held on useR 2015 in Aarlborg, DK, June 30th, 2015.

Background knowledge required: Basic knowledge of R and machine learning

Potential attendees: Anybody from academia or industry with an interest in modern machine learning

Brief description of the tutorial

R does not define a standardized interface for all its machine learning algorithms. Therefore, for any non-trivial experiments you need to write lengthy, tedious and error-prone wrappers to call the different algorithms and unify their respective output. The mlr package offers a clean, easy-to-use and flexible domain specific language for machine learning experiments in R. It supports classification, regression, clustering and survival analysis and connects to nearly a hundred predictive modeling techniques. The package allows for different hyperparameter optimization and configuration techniques, including iterated F-racing and sequential model based optimization. Variable selection is possible through various filter and wrapper approaches.

Hence, mlr allows data analysts who are neither experts in machine learning nor seasoned R programmers to nevertheless specify and complex machine learning experiments in short, succinct and scalable code. Experienced programmers, on the other hand, get to wield a large, well-designed toolbox, which they can customize and extend to quickly construct their own algorithms.

The course will enable the participants to understand and apply the basic mlr operations for data handling and preprocessing, model building, evaluation and resampling. After these basics are covered we will especially focus on the important aspects of benchmarking, model selection and hyperparameter tuning. As all of these usually require a large amount of computational resources in realistic applications, we will show how to easily parallelize them in common parallel environments. The course will end with a short demonstration on how to access the new OpenML server for open machine learning (http://www.openml.org) which provides a large repository of benchmark data sets and enables reproducible experiments and meta analysis.

Detailed Outline

Very brief intro to applied machine learning
Data handling and machine learning tasks
Classification, regression, clustering and survival modeling with mlr
Performance evaluation and resampling
Visual model analysis
Parallelization and high-performance computing
Model selection and hyper-parameter tuning
Interfacing the OpenML server with mlr

Install instructions

Install the CRAN version of the package and some optional dependencies which are required for this tutorial:

install.packages(c("glmnet", "gridExtra", "ggplot2", "ggvis", "kknn", 
  "irace", "kernlab", "KMsurv", "mlbench", "mda", "mlr", "party", 
  "randomForest", "randomForestSRC", "shiny", "sROC", "devtools", 
  "survMisc", "BatchJobs", "e1071", "rjson"))

Slides and supplementary material

You can find everything here.

Hands on

Create a classification task for the Ionosphere data set (package mlbench).
Remove constant features from the data set.
Evaluate a random forest on this task. Tune the parameters mtry and nodesize (both from 1 to 10) using a 3-fold cross validation and the random search (see ?makeTuneControlRandom). Set the budget via maxit to 20.
Compare the mean misclassification error (mmce) with the results of single classification tree from the rpart package.
NB: This is NOT really a proper evaluation of the tuned algorithm. But we have not learned nested CV so far :(

Palermo workshop

Some stuff from Palermo I am too lazy to translate

Vorschlag: mlrLearners paket machen

schneller Testen auf Travis
wir bleiben nur bei einem extra Paket
logisch besser aufgebaut
Verwaltbarkeit wird einfacher (log history)

o mini: 1 Learner pro Type (eher Top 10) o List Learners sollte unabhängig funktionieren o ideales Testsystem bräuchte o option für Paket Autoinstall bei paket installieren. Fehlermeldung zeigt hinweis, wie man es auto installiert

grundstruktur änderungen müssten umständlich synchronisiert werden
höhere maintanance
PRs könnten über beide Pakete gehen

learner property "recommended"

learner ontology

datatable gegen dplyr? plyr rauswerfen

printer für chains von wrapper machen

prediction objekte reviewn

plotModel methode für schönsten plot

top10 workhop

alle im gleichen hotel und arbeitsort super nah
raum offen wann wir wollen
schedule online. mit essen.
wasser klopapier und gesunde snacks im raum
bessere vorher teams bilden? noch bessere inhaltliche planung
ort leichter zu erreichen (flug, anreise)
mehr funding vorher workshop gebühren einnehmren und ummünzen?
mehrerere räume, ruhig, runde tische mit platz, beamer
package release vor workshop

mlr Warsaw 2018

Workshop documentation

Talk
Demo
Data set

Further links

Link to GitHub
mlr cheatsheet
Detailed Tutoiral

mlr @ WhyR?2018 conference

Link to Blogpost
Link to Talks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mlr-events-workshops.md

mlr-events-workshops.md

useR 2015 Tutorial

Brief description of the tutorial

Detailed Outline

Install instructions

Slides and supplementary material

Hands on

Palermo workshop

mlr Warsaw 2018

Workshop documentation

Further links

mlr @ WhyR?2018 conference

Files

mlr-events-workshops.md

Latest commit

History

mlr-events-workshops.md

File metadata and controls

useR 2015 Tutorial

Brief description of the tutorial

Detailed Outline

Install instructions

Slides and supplementary material

Hands on

Palermo workshop

mlr Warsaw 2018

Workshop documentation

Further links

mlr @ WhyR?2018 conference