binary-Q1L-EN-R.Rmd

---
title: "<img src='www/binary-logo-resize.jpg' width='240'>"
subtitle: "[binary.com](https://github.com/englianhu/binary.com-interview-question) Interview Question I - Lasso, Elastic-Net and Ridge Regression"
author: "[®γσ, Lian Hu](https://englianhu.github.io/) <img src='www/RYO.jpg' width='24'> <img src='www/RYU.jpg' width='24'> <img src='www/ENG.jpg' width='24'>®"
date: "`r lubridate::today('Asia/Tokyo')`"
output:
  html_document: 
    number_sections: yes
    toc: yes
    toc_depth: 4
    toc_float:
      collapsed: yes
      smooth_scroll: yes
    code_folding: hide
---

```{r warning=FALSE}
suppressPackageStartupMessages(library('BBmisc'))
#'@ suppressPackageStartupMessages(library('rmsfuns'))

pkgs <- c('knitr', 'kableExtra', 'tint', 'devtools', 'lubridate', 'data.table', 'feather', 'quantmod', 'tidyquant', 'tidyr', 'timetk', 'plyr', 'stringr', 'magrittr', 'dplyr', 'tidyverse', 'memoise', 'htmltools', 'formattable', 'zoo', 'forecast', 'glmnet')

suppressAll(lib(pkgs))
#'@ load_pkg(pkgs)

funs <- c('glmPrice.R')
l_ply(funs, function(x) source(paste0('./function/', x)))
.cl = FALSE

options(warn = -1)#, digits.secs = 6)
rm(pkgs, funs)
```

# Introduction

Due to I don't know if my initial models in these studies are effective or not. Today, I am trying to recall the `Lasso`, `Elastic-Net` and also `Ridge` models and compare with ARIMA model. That was the initial models I built in order to predict the stock price of [`LAD`](https://finance.yahoo.com/quote/LAD/) since few years ago. 

[Glmnet Vignette](https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html) provides examples for statistical modelling while [Stock Prediction with R glmnet and tm packages](https://www.kaggle.com/captcalculator/stock-prediction-with-r-glmnet-and-tm-packages) applied `Lasso` regression for stock price prediction.

- [热门数据挖掘模型应用入门（一）: LASSO 回归](https://cosx.org/2016/10/data-mining-1-lasso/)

![*Source :* [*shinyApp*](https://beta.rstudioconnect.com/content/2367/)](www/20170113_104005.gif)

You are feel free to browse over [*shinyApp*](https://beta.rstudioconnect.com/content/2367/).

# Data

```{r warning=FALSE}
mbase <- readRDS('data/LAD_Full.rds') %>% 
  tk_tbl

mbase %>% 
  kable(caption = 'OHLCVA of LAD') %>% 
  kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive')) %>%
  scroll_box(width = '100%', height = '400px')
```

# Modelling

## Lasso, Elastic-Net and Ridge

Below I tried to model Lasso, Elastic-Net and also Ridge models.

```{r warning=FALSE, eval=FALSE}
# ----------- eval=FALSE --------------
#'@ pr <- glmPrice(mbase, fordate = xts::last(mbase$Date)+days(1))
timeID <- mbase$index
timeID0 <- timeID %>% xts::first() %m+% years(1)
timeID %<>% .[. > timeID0]
fit <- list()
prd <- list()
prc <- vector('list', length(timeID))

for(d in (1:length(timeID))) {
  for(i in 0:10) {
    smp <- mbase %>% dplyr::filter(index <= timeID[d])
    x <- as.matrix(smp[c('LAD.Open', 'LAD.High', 'LAD.Low')])
    y <- as.matrix(smp['LAD.Close'])
    #'@ y <- as.matrix(lead(smp$LAD.Close))
    fit[[i+1]] <- cv.glmnet(x = x, y = y, alpha = i/10, 
                            type.measure = 'mse', maxit = 100)
    prd[[i+1]] <- predict(fit[[i+1]], newx = tail(x, 1), n.ahead = 1, 
                          type='link', s='lambda.1se')
  }
  prc[[d]] <- prd %>% cbind %>% 
    data.frame(index = tail(smp$index, 1), alpha = 0:10) %>% 
    .[, 1:3] %>% 
    tbl_df %>% 
    unnest %>% 
    dplyr::rename(Price = '.')
}

```

```{r warning=FALSE}
dr <- dir('C:/Users/scibr/Documents/GitHub/englianhu/data (binary.com-interview-question excludes fx subfolder)', pattern = '[0-9]{8}')
md <- llply(dr, function(x) {
    pth <- paste0('C:/Users/scibr/Documents/GitHub/englianhu/data (binary.com-interview-question excludes fx subfolder)/', x, '/fitgaum.best.rds')
    readRDS(pth) %>% 
      head(1)
  })
names(md) <- dr

#'@ md$`20160201`[[1]]$yhat[[1]] %>% tail
```

## Auto ARIMA

[Transforming subsets of data in R with by, ddply and data.table](https://magesblog.com/post/2012-06-12-transforming-subsets-of-data-in-r-with/) compares the efficiency of `transform` dataset.

Here I try to use `auto.arima` to compare with the models states at previous section.

```{r warning=FALSE, message=FALSE}
tmID <- mbase$index
tmID %<>% .[. > ymd('2015-01-02')]

at.ar <- llply(tmID, function(dt) {
  smp <- mbase[c('index', 'LAD.Close')] %>% 
    dplyr::filter((index < dt) & (index >= dt %m-% years(1)))
  dat <- smp %>% 
    tk_xts %>% 
    auto.arima %>% 
    forecast(h=1)
  data.frame(index = xts::last(smp$index), Price.T1 = dat)
  }) %>% 
  bind_rows %>% 
  tbl_df
```

# MSE

Here I compare the Lasso/Elastic-net/Ridge models with Auto.Arima model.

```{r warning=FALSE}
dr <- dir('C:/Users/scibr/Documents/GitHub/englianhu/data (binary.com-interview-question excludes fx subfolder)', pattern = '[0-9]{8}')
comp <- llply(dr, function(x) {
    pth <- paste0('C:/Users/scibr/Documents/GitHub/englianhu/data (binary.com-interview-question excludes fx subfolder)/', x, '/fitgaum.mse1.rds')
    readRDS(pth) %>% 
      head(1) %>% 
      dplyr::rename(Submodel = model, Model = .id)
  })
names(comp) <- dr

comp %<>% 
    ldply %>% 
    tbl_df %>% 
    dplyr::rename(index = .id, MSE.1 = mse) %>% 
    mutate(index = ymd(index), 
           Model = factor(Model), 
           Submodel = factor(Submodel))
comp %>% 
  kable(caption = 'MSE of daily Opened and Closed Transaction Orders') %>% 
  kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive')) %>%
  scroll_box(width = '100%', height = '400px')

comp2 <- join_all(list(mbase[c('index', 'LAD.Close')], 
                      at.ar[c('index', 'Price.T1.Point.Forecast')], 
                      comp)) %>% 
  na.omit %>% 
  tbl_df

## MSE
comp2 %>% mutate(
    MSE.AR = mean((Price.T1.Point.Forecast - LAD.Close)^2), 
    MSE.Lasso = mean(MSE.1)) %>% 
  dplyr::select(MSE.AR, MSE.Lasso) %>% 
  unique %>% 
  kable(caption = 'MSE of daily Opened and Closed Transaction Orders') %>% 
  kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive'))
```

# Conclusion

From above table, we know that `r comp2 %>% mutate(
    MSE.AR = mean((Price.T1.Point.Forecast - LAD.Close)^2), 
    MSE.Lasso = mean(MSE.1)) %>% 
  dplyr::select(MSE.AR, MSE.Lasso) %>% 
  unique %>% unlist %>% which.min %>% names` model is better model.

```{r option, echo = FALSE}
## Set options back to original options
options(warn = 0)
```

# Appendix

## Documenting File Creation 

It's useful to record some information about how your file was created.

- File creation date: 2018-10-20
- File latest updated date: `r today('Asia/Tokyo')`
- `r R.version.string`
- R version (short form): `r getRversion()`
- [**rmarkdown** package](https://github.com/rstudio/rmarkdown) version: `r packageVersion('rmarkdown')`
- File version: 1.0.1
- Author Profile: [®γσ, Eng Lian Hu](https://beta.rstudioconnect.com/content/4352/)
- GitHub: [Source Code](https://github.com/englianhu/binary.com-interview-question)
- Additional session information:

```{r info, echo=FALSE, warning=FALSE, results='asis'}
suppressMessages(require('dplyr', quietly = TRUE))
suppressMessages(require('formattable', quietly = TRUE))
suppressMessages(require('knitr', quietly = TRUE))
suppressMessages(require('kableExtra', quietly = TRUE))

sys1 <- devtools::session_info()$platform %>% 
  unlist %>% data.frame(Category = names(.), session_info = .)
rownames(sys1) <- NULL

#sys1 %<>% rbind(., data.frame(
#  Category = 'Current time', 
#  session_info = paste(as.character(lubridate::now('Asia/Tokyo')), 'JST'))) %>% 
#  dplyr::filter(Category != 'os')

sys2 <- data.frame(Sys.info()) %>% mutate(Category = rownames(.)) %>% .[2:1]
names(sys2)[2] <- c('Sys.info')
rownames(sys2) <- NULL

if (nrow(sys1) == 7 & nrow(sys2) == 8) {
  sys1 %<>% rbind(., data.frame(
  Category = 'Current time', 
  session_info = paste(as.character(lubridate::now('Asia/Tokyo')), 'JST')))
} else {
  sys2 %<>% rbind(., data.frame(
  Category = 'Current time', 
  Sys.info = paste(as.character(lubridate::now('Asia/Tokyo')), 'JST')))
}

cbind(sys1, sys2) %>% 
  kable(caption = 'Additional session information:') %>% 
  kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive'))

rm(sys1, sys2)
```

## Reference

01. [**Stock Market Forecasting Using LASSO Linear Regression Model** *by Sanjiban Sekhar Roy, Dishant Mital, Avik Basu, Ajith Abraham (2015)*](https://raw.githubusercontent.com/englianhu/binary.com-interview-question/master/reference/Stock%20Market%20Forecasting%20Using%20LASSO%20Linear%20Regression%20Model.pdf)<img src='www/hot.jpg' width='20'>
02. [**Using LASSO from lars (or glmnet) package in R for variable selection** *by Juancentro (2014)*](http://stats.stackexchange.com/questions/58531/using-lasso-from-lars-or-glmnet-package-in-r-for-variable-selection?answertab=votes#tab-top)
03. [**Difference between glmnet() and cv.glmnet() in R?** *by Amrita Sawant (2015)*](https://stackoverflow.com/questions/29311323/difference-between-glmnet-and-cv-glmnet-in-r?answertab=votes#tab-top)
04. [**Testing Kelly Criterion and Optimal f in R** *by Roy Wei (2012)*](https://alphaism.wordpress.com/2012/04/13/testing-kelly-criterion-and-optimal-f-in-r) <img src='www/hot.jpg' width='20'>
05. [**Portfolio Optimization and Monte Carlo Simulation** *by Magnus Erik Hvass Pedersen (2014)*](https://raw.githubusercontent.com/scibrokes/kelly-criterion/master/references/Portfolio%20Optimization%20and%20Monte%20Carlo%20Simulation.pdf) <img src='www/hot.jpg' width='20'>
06. [**Glmnet Vignette** *by Trevor Hastie and Junyang Qian (2014)*](https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html)
07. [**lasso怎么用算法实现？** *by shuaihuang (2010)*](https://d.cosx.org/d/101533-101533/5)
08. [**The Sparse Matrix and {glmnet}** *by Manuel Amunategui (2014)*](http://amunategui.github.io/sparse-matrix-glmnet/)
09. [**Regularization and Variable Selection via the Elastic Net** *by Hui Zou and Trevor Hastie*](https://raw.githubusercontent.com/englianhu/binary.com-interview-question/master/reference/Regularization%20and%20Variable%20Selection%20via%20the%20Elastic%20Net.pdf)
10. [LASSO, Ridge, and Elastic Net](http://www4.stat.ncsu.edu/~post/josh/LASSO_Ridge_Elastic_Net_-_Examples.html) <img src='www/hot.jpg' width='20'>
11. [**热门数据挖掘模型应用入门（一）: LASSO回归** *by 侯澄钧 (2016)*](https://cosx.org/2016/10/data-mining-1-lasso)
12. [The Lasso Page](http://statweb.stanford.edu/~tibs/lasso.html)
13. [Shrinkage Estimation of Linear Regression Models with GARCH Errors](Shrinkag%20Estimation%20of%20Linear%20Regression%20Models%20with%20GARCH%20Errors.pdf)
14. [Iteratively Reweighted Adaptive Lasso for Conditional Heteroscedastic Time Series with Applications to AR-ARCH Type Processes](Iteratively%20Reweighted%20Adaptive%20Lasso%20for%20Conditional%20Heteroscedastic%20Time%20Series with%20Applications%20to%20AR-ARCH%20Type%20Processes.pdf)

---

<span style='color:RoyalBlue'>**Powered by - Copyright® Intellectual Property Rights of [<img src='www/scb_logo.jpg' width='64'>®](http://www.scibrokes.com)個人の経営企業**</span>