This is an open, shareable, reproducible, computational research project on using multiclass AUC as a performance metric for multiclass classification.
It is the work of Ross W. Gayler.
It is a common data science task to classify inputs by assigning each input to one of a fixed number of categories. Given the existence of a set of inputs with known categories we want to apply the classification model and assess its performance by comparing the assigned categories to the known categories.
Accuracy is a commonly used performance metric. However, accuracy assumes that all misclassifications are equally costly and that the base rates of the categories are fixed. Signal Detection Theory (SDT) explicitly addresses these assumptions, and Area Under the Curve (AUC) is a common SDT metric of discriminability. AUC is most commonly used in the context of binary classification. AUC for multiclass classification is much less commonly encountered in practice, and there is no generally accepted agreement on how best to apply it.
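To make the idea concrete, here is a minimal base R sketch of binary AUC and of one possible multiclass extension. It is an illustration only, not the method used in this project: the one-vs-rest macro average is just one of several competing conventions, and the function names (binary_auc, ovr_macro_auc) and the toy data are hypothetical.

```r
# Binary AUC via the rank-sum (Mann-Whitney) identity: the probability that a
# randomly chosen positive case scores higher than a randomly chosen negative
# case, with ties counted as one half.
binary_auc <- function(score, is_positive) {
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  r <- rank(score)  # average ranks, so tied scores are handled
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# One possible multiclass extension (an assumption, chosen for simplicity):
# compute a one-vs-rest AUC for each class, then take the unweighted mean.
# `scores` is an n x k matrix with one named column per class;
# `truth` is a character vector of the known classes.
ovr_macro_auc <- function(scores, truth) {
  per_class <- vapply(
    colnames(scores),
    function(cls) binary_auc(scores[, cls], truth == cls),
    numeric(1)
  )
  list(per_class = per_class, macro = mean(per_class))
}

# Toy, made-up scores for six cases and three classes
truth <- c("a", "a", "b", "b", "c", "c")
scores <- cbind(
  a = c(0.8, 0.6, 0.3, 0.1, 0.2, 0.1),
  b = c(0.1, 0.3, 0.5, 0.2, 0.3, 0.3),
  c = c(0.1, 0.1, 0.2, 0.7, 0.5, 0.6)
)
res <- ovr_macro_auc(scores, truth)
print(res)
```

Unlike accuracy, these values depend only on how the scores rank the cases, not on a particular decision threshold. Other multiclass conventions (for example, averaging over pairs of classes) can give different answers on the same data.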
The objective of this project is to apply multiclass AUC approaches to some trial datasets to get a better understanding of how to apply them and of their advantages and disadvantages.
The point of making it an open, shareable, reproducible project is that anyone should be able to copy it, reproduce the analyses, and try out modifications.
Multiclass AUC by Ross W. Gayler is licensed under CC BY 4.0.
All materials other than code are released under the Creative Commons Attribution 4.0 International License.
Code is released under the MIT license.
This is an open, shareable, reproducible, computational research project.
- All the computational work and document preparation is done with the R statistical computing environment and the RStudio integrated development environment.
- The entire research project is contained in a single directory that corresponds to an RStudio R project.
- The renv package is used to manage the R package versions used by the project.
- The workflowr package is used to structure the project so that all the materials and outputs are available via an openly accessible, automatically generated website.
- The analyses are organised as notebooks which are largely free-standing. Any dependencies are noted in each notebook. The dependencies are managed manually.
- The project code and documents are shared publicly on GitHub at https://github.com/rgayler/multiclass_AUC
- The website automatically generated by workflowr from the rendered project documents is at https://rgayler.github.io/multiclass_AUC/
workflowr creates a set of standard directories.
See the package documentation for details on how these directories are used.
The brief purposes are:
- analysis - rmarkdown analysis notebooks
- code - R code not in analysis notebooks
- data - raw data and associated metadata
- docs - automatically generated website
- output - generated data and other objects
workflowr only manages the subset of files that it knows about, so you will need to manually stage and commit any other files that need to be mirrored on GitHub.
If any files in data and output are more than trivially small, they will not be shared via Git and GitHub.
- .gitignore will be used to keep them out of Git.
- There will be a separate mechanism (e.g. Zenodo) for sharing those large files.
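As a rough aid for deciding which files need a .gitignore entry, the following base R sketch lists files above a size threshold. The function name find_large_files and the 1 MB default are hypothetical choices for illustration; the project does not define what counts as "more than trivially small".

```r
# List files under the given directories that exceed a size threshold.
# The default of 1e6 bytes is an arbitrary, assumed cut-off.
find_large_files <- function(dirs = c("data", "output"), max_bytes = 1e6) {
  dirs <- dirs[dir.exists(dirs)]  # silently skip directories that do not exist
  files <- list.files(dirs, recursive = TRUE, full.names = TRUE)
  sizes <- file.size(files)
  large <- !is.na(sizes) & sizes > max_bytes
  data.frame(file = files[large], bytes = sizes[large])
}

# Run from the project root directory
print(find_large_files())
```

Any files reported are candidates for a manual .gitignore rule and for sharing through the separate mechanism.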
The renv package keeps track of the R packages (and their versions) used by the project.
It allows anyone to reinstate the same packages and versions in their local copy of the project.
renv does not address system dependencies (e.g. R version, system libraries), so if these become critical I will need to use something like Docker, but not yet.
The renv directory contains the information needed by renv to reinstate the local package environment.
.gitignore in the R project root directory is used for all the manually created entries, so that all the manual rules are in one place.
Packages, such as renv, may create their own .gitignore files in subdirectories that they manage.
The static website automatically generated by workflowr is stored in the docs directory.
The key document is docs/index.html.
Open this file with a browser to get access to the website.
docs/index.html allows you to navigate to all the generated content.
This index page is mirrored on the internet at https://rgayler.github.io/multiclass_AUC/index.html
- All detailed setup instructions and notes go in this project-level README.md file.
- The README.md files in the subdirectories only state the purpose of each subdirectory and the files in that directory.
This assumes that you already have current versions of R and RStudio installed.
- Clone the project repository https://github.com/rgayler/multiclass_AUC from GitHub
- Open the cloned repository as an RStudio project

You can combine steps 1 and 2 using RStudio by creating a new project from the GitHub repository:
File | New Project... | Version Control | Git | Create Project
When you open the project you will get warning messages about packages not being installed.
This is because you need to use the renv package to reinstate the packages that are used by the project.
- Install renv in that project if it is not already installed
- Use renv::restore() to install all the needed packages in the project-specific library:
    renv::restore()
The analyses are specified by notebooks in the analysis directory.
You can, if you wish, ignore all the workflowr aspects and simply run the analysis notebooks locally.
If you wish to take advantage of the features of workflowr you will need to learn the workflowr workflow.
See the workflowr getting started vignette for an introduction.
- Create a new analysis notebook:
    workflowr::wflow_open("analysis/new_notebook_name.Rmd")
- Build the website locally (either manually or indirectly via targets):
    workflowr::wflow_build()
- Publish the website online (manually). This will only work if you have push authorisation for the GitHub remote repository.
    workflowr::wflow_publish("analysis/*.Rmd", "A commit message")
- Add mathjax = "local" as an argument to workflowr::wflow_html in analysis/_site.yml so that the MathJax JavaScript library is bundled with the website in docs/ rather than being loaded from a remote server when the website is viewed. This removes the dependency on the remote server being available. See workflowr/workflowr#211
    output:
      workflowr::wflow_html:
        mathjax: "local"
The renv package is used to keep track of the installed packages and their versions.
See the renv collaboration guide or the workflow for synchronising package environments between collaborators.