- Code Tested in: R version 3.6.3 on Ubuntu 18.04.4 LTS
- Prepared by: Tony Culos ([email protected])
We introduce the immunological Elastic Net (iEN), which integrates mechanistic immunological knowledge into a machine learning framework. Here we provide code for applying iEN models and optimizing them given a set of hyperparameter values. For a more comprehensive description of this method, please see "Integration of Mechanistic Immunological Knowledge into a Machine Learning Pipeline Improves Predictions".
The 'immunological-EN' package is most easily installed from the terminal. All libraries required for optimizing and fitting iEN models must be installed before building and installing the package from the source files. To install all dependencies, run this command in R before installation:
install.packages(c('pROC', 'Metrics', 'Matrix', 'glmnet', 'knitr'))
See the DESCRIPTION file for a full list of imported and suggested packages.
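The dependency step above can be made repeatable by installing only the packages that are not yet present. The following is a minimal sketch; `missing_packages` is a hypothetical helper written for illustration and is not part of iEN.

```r
# Dependencies named in the install command above.
deps <- c("pROC", "Metrics", "Matrix", "glmnet", "knitr")

# Hypothetical helper (not part of iEN): return the subset of `pkgs`
# whose namespace cannot be loaded from the current R library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(deps)

if (length(to_install) > 0 && interactive()) {
  # Install only what is missing; a CRAN mirror must be configured.
  install.packages(to_install)
} else if (length(to_install) > 0) {
  # In non-interactive sessions, only report what is missing.
  message("Missing packages: ", paste(to_install, collapse = ", "))
}
```

Checking first avoids reinstalling packages that are already available, which matters on shared systems where library versions are pinned.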
- Download the entire repository
- Run
install.packages(path_to_file, repos = NULL, type="source")
where the file is iEN_0.99.0.tar.gz

The iEN package should now be available in R via the library('iEN') command.
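As a small sketch of the tarball route, the snippet below guards the install on the file actually being present and then checks that the package can be found. The value of `path_to_file` is an assumption (the tarball sitting in the current working directory); adjust it to wherever the repository was downloaded.

```r
# Assumed location of the tarball; change this to the real download path.
path_to_file <- "iEN_0.99.0.tar.gz"

if (file.exists(path_to_file)) {
  # Install from the local source tarball rather than from a repository.
  install.packages(path_to_file, repos = NULL, type = "source")
} else {
  message("Tarball not found at: ", path_to_file)
}

# TRUE if the iEN namespace can be loaded, FALSE otherwise.
iEN_available <- requireNamespace("iEN", quietly = TRUE)
```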
- Download the entire repository and remove the iEN_0.99.0.tar.gz file
- In the terminal, navigate to the folder that contains the downloaded repository and run the following command
R CMD build immunological-EN-master
If the repository folder has a different name, adapt the command accordingly
- Next, install the .tar.gz file that was built
R CMD INSTALL iEN_0.99.0.tar.gz
For full documentation see iEN-Manual.pdf. Here we summarize the main function of the package, which optimizes an iEN model via cross-validated grid search while also producing out-of-sample predictions on held-out folds.
cv_iEN optimizes an iEN model via K-fold cross-validated grid search and returns out-of-sample predictions and the associated model metadata. It takes the following parameters:
- X - Input matrix
- Y - Response variable
- foldid - vector indicating fold membership for each observation, used during K-fold cross validation
- alphaGrid - vector of alpha values
- nlambda - number of lambda values to generate for each cross validation fold
- lambdas - vector of specific lambda values to test (we recommend leaving this set to NULL)
- priors - vector of continuous values indicating features that are consistent with canonical prior knowledge
- ncores - number of cores used during the model building process
- eval - evaluation methods used to optimize models (we suggest "RMSE" for continuous response variables, and "ROCAUC" for classification)
- intercept - indicator for inclusion of the intercept for the regression model
- standardize - indicator for standardization of X prior to model fitting
- center - indicator for centering during standardizing of X
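The parameters above can be illustrated with a small synthetic example. This is a sketch under stated assumptions, not a reference usage: the data are simulated, `priors` is assumed to hold one value per column of X, the indicator arguments are assumed to be logical, and the installed version of cv_iEN may expect arguments beyond those listed here (see iEN-Manual.pdf). The call is skipped when iEN is not installed.

```r
set.seed(1)
n <- 40   # observations
p <- 10   # features
K <- 5    # cross-validation folds

# Simulated inputs: a numeric matrix and a continuous response.
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
Y <- as.numeric(X[, 1] - 0.5 * X[, 2] + rnorm(n, sd = 0.1))

# Balanced fold membership: each observation gets a label in 1..K.
foldid <- sample(rep(seq_len(K), length.out = n))

# Candidate alpha values for the grid search.
alphaGrid <- seq(0, 1, length.out = 5)

# Assumption: one prior value per feature; here the first two features
# are marked as consistent with prior knowledge.
priors <- c(rep(1, 2), rep(0, p - 2))

fit <- NULL
if (requireNamespace("iEN", quietly = TRUE)) {
  # Argument names follow the parameter list above; an error (for example
  # a missing required argument) is reported instead of stopping the script.
  fit <- tryCatch(
    iEN::cv_iEN(
      X = X, Y = Y, foldid = foldid, alphaGrid = alphaGrid,
      nlambda = 100, lambdas = NULL, priors = priors, ncores = 1,
      eval = "RMSE", intercept = TRUE, standardize = TRUE, center = TRUE
    ),
    error = function(e) {
      message("cv_iEN call failed: ", conditionMessage(e))
      NULL
    }
  )
}
```

Building `foldid` explicitly, rather than letting folds be drawn at random on each run, keeps the out-of-sample predictions comparable across different hyperparameter grids.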