aml4td
diff --git a/DESCRIPTION b/DESCRIPTION
Lines changed: 2 additions & 2 deletions
diff --git a/R/shiny-polynomial.R b/R/shiny-polynomial.R
Lines changed: 1 addition & 0 deletions
diff --git a/RData/deliveries_cubist.RData b/RData/deliveries_cubist.RData
12 Bytes
diff --git a/RData/deliveries_lm.RData b/RData/deliveries_lm.RData
-6.68 KB
diff --git a/RData/mlp_rf_mtr.RData b/RData/mlp_rf_mtr.RData
-2 Bytes
diff --git a/_freeze/chapters/categorical-predictors/execute-results/html.json b/_freeze/chapters/categorical-predictors/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/chapters/contributing/execute-results/html.json b/_freeze/chapters/contributing/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/chapters/embeddings/execute-results/html.json b/_freeze/chapters/embeddings/execute-results/html.json
Lines changed: 3 additions & 5 deletions
diff --git a/_freeze/chapters/feature-selection/execute-results/html.json b/_freeze/chapters/feature-selection/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/chapters/grid-search/execute-results/html.json b/_freeze/chapters/grid-search/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/chapters/initial-data-splitting/execute-results/html.json b/_freeze/chapters/initial-data-splitting/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/chapters/interactions-nonlinear/execute-results/html.json b/_freeze/chapters/interactions-nonlinear/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/chapters/introduction/execute-results/html.json b/_freeze/chapters/introduction/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/chapters/iterative-search/execute-results/html.json b/_freeze/chapters/iterative-search/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/chapters/missing-data/execute-results/html.json b/_freeze/chapters/missing-data/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/chapters/numeric-predictors/execute-results/html.json b/_freeze/chapters/numeric-predictors/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/chapters/overfitting/execute-results/html.json b/_freeze/chapters/overfitting/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/chapters/resampling/execute-results/html.json b/_freeze/chapters/resampling/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/chapters/whole-game/execute-results/html.json b/_freeze/chapters/whole-game/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_freeze/index/execute-results/html.json b/_freeze/index/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/_quarto.yml b/_quarto.yml
Lines changed: 2 additions & 1 deletion
diff --git a/chapters/categorical-predictors.qmd b/chapters/categorical-predictors.qmd
Lines changed: 16 additions & 10 deletions
diff --git a/chapters/embeddings.qmd b/chapters/embeddings.qmd
Lines changed: 10 additions & 10 deletions
diff --git a/chapters/grid-search.qmd b/chapters/grid-search.qmd
Lines changed: 2 additions & 1 deletion
diff --git a/chapters/initial-data-splitting.qmd b/chapters/initial-data-splitting.qmd
Lines changed: 2 additions & 2 deletions
@@ -63,7 +63,6 @@ Imports:
     jsonlite,
     kableExtra,
     kernlab,
-    kknn,
     klaR,
     knitr,
     leaflet,
@@ -139,7 +138,8 @@ Remotes:
     Bioconductor/BiocParallel,
     mixOmicsTeam/mixOmics,
     stevenpawley/colino,
-    JamesHWade/measure
+    JamesHWade/measure,
+    tidymodels/[email protected]
 Config/testthat/edition: 3
 Encoding: UTF-8
 LazyData: true
 
@@ -152,3 +152,4 @@ server <- function(input, output, session) {
 }
 
 app <- shinyApp(ui, server)
+
@@ -1,5 +1,7 @@
 project:
   type: book
+  preview:
+    port: 3763
 
 filters:
   - shinylive
@@ -80,7 +82,6 @@ book:
       - chapters/grid-search.qmd  
       - chapters/iterative-search.qmd
       - chapters/feature-selection.qmd
-      - chapters/comparing-models.qmd
   - part: "Classification"  
   - part: "Regression"  
   - part: "Characterization"  
 
@@ -138,7 +138,7 @@ As a simple example, consider the customer type predictor with categories: "cont
 @tbl-indicators shows how this works for the customer type. The rows depict the possible values in the data, while the columns are the resulting features used in place of the original column. This table uses the most common indicator encoding method called _reference cell_ parameterization (also called a _treatment contrast_). First, a category is chosen as the reference value. In  @tbl-indicators, the first alpha-numeric value is used (`"contract"`), but this is an arbitrary choice. After this, we create separate columns for all possible values except for the reference value. Each of these columns has a value of one when the data matches the column for that value (and is zero otherwise). 
 
 ```{r}
-#| label: indicators
+#| label: tbl-indicators
 #| tbl-cap: "Indicator columns produced from a categorical column using a reference cell parameterization."
 
 customer_types <- 
@@ -154,7 +154,8 @@ customer_types %>%
   rename_all(~ gsub("_", " ", .x)) %>% 
   gt() %>% 
   tab_spanner(label = "Indicator Columns", columns = c(-`customer type`)) %>% 
-  tab_options(table.width = pct(50))
+  tab_options(table.width = pct(50)) |> 
+  tab_style_body(style = cell_text(color = "gray70"), values = 0)
 ```
 
 The rationale for excluding one column is that you can infer the reference value if you know the values of all of the existing indicator columns^[In other words, we know that the vector of indicators (0, 0, 0) must represent the contract customers.]. Including all possible indicator columns embeds a redundancy in the data.  As we will see shortly, data that contain this type of redundancy pose problems for some models like linear regression.
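
Reviewer sketch (not part of the change set): the reference cell parameterization described in this hunk can be reproduced with base R's `model.matrix()`, which uses treatment contrasts by default for unordered factors. Only `"contract"` and `"group"` appear in the visible text; the other two level names below are assumptions made for illustration.

```r
# Hypothetical customer type values; only "contract" and "group" are
# confirmed by the chapter text shown in this diff.
customer_type <- factor(
  c("contract", "group", "transient", "transient_party"),
  levels = c("contract", "group", "transient", "transient_party")
)

# Treatment (reference cell) contrasts are the default for unordered factors.
# The first level ("contract") becomes the reference value.
indicators <- model.matrix(~ customer_type)

# Drop the intercept column to leave only the indicator columns.
indicators[, -1]
```

The row for the reference level contains only zeros, which is why a separate indicator column for it would be redundant.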
@@ -188,20 +189,24 @@ hot_mod <-
 @tbl-ref-cell-effects shows the encoding in @tbl-indicators and adds columns for a numeric outcome and the intercept term. The outcome column shows the average daily rate (in €). Using a standard estimation procedure (called ordinary least squares), the bottom row of @tbl-ref-cell-effects shows the `r ncol(ref_cell_mod$x)` parameter estimates. Since all of the indicators for the contract customer row are zero, the intercept column estimates the mean value for that level ($\widehat{\beta}_0$ = `r round(coef(ref_cell_mod)[["(Intercept)"]], 1)`). The variable $x_{i1}$ only has an indicator for the "group" customers. Hence, its estimate corresponds to the difference in the average group outcome values (`r mean_adr$rounded[mean_adr$customer_type == "group"]`) minus the effect of the reference cell: `r mean_adr$rounded[mean_adr$customer_type == "contract"]` - `r mean_adr$rounded[mean_adr$customer_type == "group"]`. From this, the resulting estimate ($\widehat{\beta}_1$ = `r round(coef(ref_cell_mod)[["customer_typegroup"]], 2)`) is the effect of the group customers above and beyond the impact of the contract customers. The parameter estimates for the other possible values follow analogous interpretations.
 
 ```{r}
-#| label: ref-cell-effects
+#| label: tbl-ref-cell-effects
 #| tbl-cap: "An example of linear regression parameter estimates corresponding to a reference cell parameterization."
 
 format_encoding(ref_cell_mod) %>% 
-  tab_options(table.width = pct(66))
+  tab_options(table.width = pct(66)) |> 
+  tab_style_body(style = cell_text(color = "gray70"), values = 0) |> 
+  cols_width(info ~ pct(25))
 ```
 
 Another popular method for making indicator variables is called **one-hot encoding** (also known as a cell means encoding). This technique, shown in @tbl-one-hot, makes indicators for all possible levels of the predictor and does _not_ show an intercept column (for reasons described shortly). In this model parameterization, indicators are specific to each value in the data, and the linear regression estimates are the average response values for each customer type.  
 
 ```{r}
-#| label: one-hot
+#| label: tbl-one-hot
 #| tbl-cap: "One-hot encoded indicator variables from a categorical column of data."
 format_encoding(hot_mod) %>% 
-  tab_options(table.width = pct(69))
+  tab_options(table.width = pct(69)) |> 
+  tab_style_body(style = cell_text(color = "gray70"), values = 0) |> 
+  cols_width(info ~ pct(25))
 ```
 
 One-hot encodings are often used in nonlinear models, especially in neural networks and tree-based models. Indicators are not generally required for the latter but can be used^[This is discussed in greater detail for one of the case studies in @sec-reg-summary.]. 
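
Reviewer sketch (not part of the change set): a small base R check of how the two parameterizations in this hunk relate. The data are simulated and the object names (`dat`, `hot_fit`, `ref_fit`) are hypothetical; they are not the chapter's `hot_mod` or `ref_cell_mod` objects.

```r
set.seed(1)

# Simulated outcome values; these are made up and are not the hotel data.
dat <- data.frame(
  customer_type = factor(rep(c("contract", "group", "transient"), each = 5)),
  adr = c(rnorm(5, mean = 70), rnorm(5, mean = 80), rnorm(5, mean = 100))
)

# Removing the intercept (`0 +`) produces one indicator column per level.
one_hot <- model.matrix(~ 0 + customer_type, data = dat)

# One-hot (cell means) fit and reference cell fit of the same data.
hot_fit <- lm(adr ~ 0 + customer_type, data = dat)
ref_fit <- lm(adr ~ customer_type, data = dat)

# Per-level averages of the outcome.
group_means <- sapply(split(dat$adr, dat$customer_type), mean)

# One-hot coefficients should match the per-level averages.
all.equal(unname(coef(hot_fit)), unname(group_means))

# The reference cell coefficient for "group" should match the difference
# between the group average and the reference (contract) average.
all.equal(
  unname(coef(ref_fit)["customer_typegroup"]),
  unname(group_means["group"] - group_means["contract"])
)
```

Both fits give the same predictions; only the meaning of the individual coefficients differs.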
@@ -319,7 +324,7 @@ hash_256 <-
 ```
 
 ```{r}
-#| label: feature-hash
+#| label: tbl-feature-hash
 #| tbl-cap: "Signed indicators for agent via feature hashing."
 
 remake_name <- function(x) {
@@ -377,8 +382,9 @@ bind_rows(hash_top, hash_middle, hash_bottom, hash_summary) %>%
     align = "right",
     columns = everything()
   ) %>% 
-  tab_options(table.width = pct(70))
-
+  tab_options(table.width = pct(70)) |> 
+  tab_style_body(style = cell_text(color = "gray70"), values = " 0") |> 
+  cols_width(Agent ~ pct(25))
 ```
 
 The main downside of this method is that the use of hash values makes it impossible to explain the model. If the tenth feature column is critical, we can't explain why this is the case for new data (since the hash function is practically non-reversible and may include collisions).  This may be fine if the primary objective is prediction rather than interpretation.  When the goal is to optimize predictive performance, then the number of hashing columns to use can be included as a tuning parameter.  The model tuning process can then determine an optimal value of the number of hashing columns.
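
Reviewer sketch (not part of the change set): a toy version of signed feature hashing in base R, to make the collision and interpretability points in this hunk concrete. `toy_hash()` and `hash_features()` are hypothetical helpers using a simple rolling hash; they are not the hashing function the chapter uses, and the agent labels are invented.

```r
# Toy rolling hash over the characters of each string (illustration only).
toy_hash <- function(x, seed = 0) {
  vapply(x, function(s) {
    h <- seed
    for (code in utf8ToInt(s)) {
      h <- (h * 31 + code) %% 1000003
    }
    h
  }, numeric(1), USE.NAMES = FALSE)
}

# Map each value to one of `num_cols` columns and store a sign of +1 or -1.
hash_features <- function(x, num_cols = 8) {
  col <- toy_hash(x) %% num_cols + 1
  sgn <- ifelse(toy_hash(x, seed = 7) %% 2 == 0, 1, -1)
  out <- matrix(0, nrow = length(x), ncol = num_cols)
  out[cbind(seq_along(x), col)] <- sgn
  out
}

# Invented agent labels, not values from the hotel data.
agents <- c("agent_a", "agent_b", "agent_c", "agent_d", "agent_a")
hashed <- hash_features(agents, num_cols = 4)
hashed
```

Each row has a single nonzero entry, and repeated values always land in the same column. With more distinct values than columns, some agents necessarily share a column, which is the collision issue noted above; `num_cols` plays the role of the tuning parameter.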
@@ -542,7 +548,7 @@ The amount of shrinkage was driven mainly by the number of bookings per agent. @
 To reiterate how these values are used for pre-processing this type of predictor, @tbl-effect-estimates shows the linear mixed model analysis results. The numeric column is our primary model's data for representing the agent names. This avoids creating a large number of indicator variables for this predictor. 
 
 ```{r}
-#| label: effect-estimates
+#| label: tbl-effect-estimates
 #| tbl-cap: "Examples of the numeric values that are used in place of each agent's data when effect encodings are used."
 effect_chr <- 
   encoded_results %>%
 
@@ -501,7 +501,7 @@ Note that the first component alone captured `r round(barley_cumulative_variance
 ::: {.figure-content}
 
 ```{shinylive-r}
-#| label: fig-linear-scores
+#| label: shiny-linear-scores
 #| out-width: "80%"
 #| viewerHeight: 550
 #| standalone: true
@@ -519,8 +519,8 @@ source("https://raw.githubusercontent.com/aml4td/website/main/R/shiny-setup.R")
 source("https://raw.githubusercontent.com/aml4td/website/main/R/shiny-linear-scores.R")
 
 app
-`
-``
+```
+
 :::
 
 A visualization of the four new features for different linear embedding methods. The data shown are the validation set results.
@@ -536,7 +536,7 @@ For PCA, it can be very instructive to visualize the loadings for each component
 ::: {.figure-content}
 
 ```{shinylive-r}
-#| label: fig-linear-loadings
+#| label: shiny-linear-loadings
 #| viewerHeight: 550
 #| standalone: true
 
@@ -552,8 +552,8 @@ source(
 )
 
 app
-``
-`
+```
+
 :::
 
 The loadings for the first four components of each linear embedding method as a function of wavelength.
@@ -939,7 +939,7 @@ Take @fig-mds-example(a) as an example. There are ten points in two dimensions (
 #| label: mds-example-computations
 #| include: false
 
-pens <- penguins[complete.cases(penguins),]
+pens <- modeldata::penguins[complete.cases(modeldata::penguins),]
 
 n <- 10
 set.seed(119)
@@ -1274,7 +1274,7 @@ For supervised UMAP, there is an additional weighting parameter (between zero an
 ::: {.figure-content}
 
 ```{shinylive-r}
-#| label: fig-umap
+#| label: shiny-umap
 #| viewerHeight: 550
 #| standalone: true
 
@@ -1288,8 +1288,8 @@ source("https://raw.githubusercontent.com/aml4td/website/main/R/shiny-setup.R")
 source("https://raw.githubusercontent.com/aml4td/website/main/R/shiny-umap.R")
 
 app
-``
-`
+```
+
 :::
 
 A visualization of UMAP results for the barley data using different values for several tuning parameters. The points are the validation set values. 
 
@@ -575,6 +575,7 @@ At each resampling estimate beyond the first $B_{min}$ iterations, the current c
 \end{algorithmic}
 \end{algorithm}
 ```
+
 :::
 
 ::: {.column width="10%"}
@@ -755,7 +756,7 @@ Using the same 10-fold cross-validation scheme, @fig-1d-boost shows the results
 ::: {.figure-content}
 
 ```{r}
-#| label: 1d-boost
+#| label: shiny-1d-boost
 #| echo: false
 #| fig-align: center
 #| out-width: 70%
 
@@ -52,7 +52,7 @@ This chapter will examine how we can appropriately utilize our data. Except in @
 These data, originally published by @ames, are an excellent teaching example. Data were collected for `r format(nrow(ames), big.mark = ",")` houses in Ames, Iowa, via the local assessor's office. A variety of different characteristics of the houses were measured. [Chapter 4](https://www.tmwr.org/ames.html) of @tmwr contains a detailed examination of these data. For illustration, we will focus on a smaller set of predictors, summarized in Tables [-@tbl-ames-numeric] and [-@tbl-ames-categorical]. The geographic locations of the properties are shown in @fig-ames-selection. 
 
 ```{r}
-#| label: ames-numeric
+#| label: tbl-ames-numeric
 #| echo: false
 #| warning: false
 #| message: false
@@ -145,7 +145,7 @@ bind_cols(
 ```
 
 ```{r}
-#| label: ames-categorical
+#| label: tbl-ames-categorical
 #| echo: false
 #| tbl-cap: A summary of categorical predictors in the Ames housing data. 
 #| html-table-processing: none