Commit 4d94081

tidypredict 1.0.0 post (#773)
Co-authored-by: Max Kuhn <[email protected]>
1 parent cbbcb94 commit 4d94081

4 files changed, with 254 additions and 0 deletions

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
---
output: hugodown::hugo_document

slug: tidypredict-1-0-0
title: tidypredict 1.0.0
date: 2025-12-10
author: Emil Hvitfeldt
description: >
    tidypredict 1.0.0 brings faster computations for tree-based models, more efficient tree representations, glmnet model support, and a change in how random forests are handled.

photo:
  url: https://unsplash.com/photos/brown-leaves-covered-in-snow-on-a-branch-9XKkkeUwBhY
  author: Monique Caraballo

# one of: "deep-dive", "learn", "package", "programming", "roundup", or "other"
categories: [package]
tags: [tidymodels, tidypredict, orbital]
---

<!--
TODO:
* [x] Look over / edit the post's title in the yaml
* [x] Edit (or delete) the description; note this appears in the Twitter card
* [x] Pick category and tags (see existing with `hugodown::tidy_show_meta()`)
* [x] Find photo & update yaml metadata
* [x] Create `thumbnail-sq.jpg`; height and width should be equal
* [x] Create `thumbnail-wd.jpg`; width should be >5x height
* [x] `hugodown::use_tidy_thumbnails()`
* [x] Add intro sentence, e.g. the standard tagline for the package
* [x] `usethis::use_tidy_thanks()`
-->

We're tickled pink to announce the release of version 1.0.0 of [tidypredict](https://tidypredict.tidymodels.org/). The main goal of tidypredict is to enable running predictions inside databases. It reads the model, extracts the components needed to calculate the prediction, and then creates an R formula that can be translated into SQL.

You can install it from CRAN with:

```{r, eval = FALSE}
install.packages("tidypredict")
```

This blog post highlights the most important changes in this release, including faster computations for tree-based models, more efficient tree representations, glmnet model support, and a change in how random forests are handled. You can see a full list of changes in the [release notes](https://tidypredict.tidymodels.org/news/index.html#tidypredict-100).

```{r setup}
library(tidypredict)
```
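
As a rough illustration of the workflow described in the introduction, the sketch below fits an ordinary linear model, extracts the prediction as an R expression, and then asks for the SQL translation. The model formula is made up for this example, and `dbplyr::simulate_dbi()` is only a stand-in for a real database connection, so the dbplyr package is assumed to be installed.

```r
library(tidypredict)

# An ordinary linear model fitted in R (formula chosen only for illustration)
model <- lm(mpg ~ wt + cyl, data = mtcars)

# The prediction as an R expression built from the model's coefficients
fit_expr <- tidypredict_fit(model)
fit_expr

# The same prediction translated to SQL; the simulated connection
# stands in for a real database connection
sql_expr <- tidypredict_sql(model, dbplyr::simulate_dbi())
sql_expr
```

The exact SQL depends on the connection that is passed in, since each database backend has its own dialect.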

## Improved output for random forest models

In previous versions of tidypredict, `tidypredict_fit()` returned a list of expressions, one for each tree, when applied to random forest models. This didn't align with what is returned for other types of models. In version 1.0.0, it instead produces a single, combined expression that reflects how predictions should be made.

This is technically a breaking change, but one we believe is worthwhile, as it provides a more consistent output for `tidypredict_fit()` and hides the technical details about how to combine trees from different packages.
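
To make the idea of a single, combined expression more concrete, here is a small base R sketch that averages per-tree expressions, as one might for a regression forest. The three trees are invented for illustration, and this is not tidypredict's internal code; how trees are actually combined depends on the package that fitted the model.

```r
# Hypothetical per-tree expressions, standing in for a list with one entry per tree
trees <- list(
  quote(ifelse(cyl <= 4, 26, 18)),
  quote(ifelse(wt <= 3, 24, 16)),
  quote(ifelse(hp <= 100, 25, 17))
)

# Sum the trees into one call, then divide by the number of trees
tree_sum <- Reduce(function(lhs, rhs) call("+", lhs, rhs), trees)
forest_expr <- call("/", tree_sum, length(trees))

# One expression now describes the whole (toy) forest
forest_expr

# Evaluate it for a single made-up observation
eval(forest_expr, list(cyl = 4, wt = 3.5, hp = 90))
```

Returning one expression means the caller no longer has to know whether a given package averages, sums, or otherwise combines its trees.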

## Faster parsing of trees

The parsing of xgboost, partykit, and ranger models should now be substantially faster than before. Some examples parse 10 to 200 times faster. Please note that larger models, with more trees or deeper trees, still take some time to parse.

## More efficient tree expressions

All trees, whether they are a single tree or part of a collection of trees, such as in boosted trees or random forests, are encoded as `case_when()` statements by tidypredict. This means that the following tree:

```{r}
model <- partykit::ctree(mpg ~ am + cyl, data = mtcars)
model
```

would previously be turned into the following `case_when()` statement:

```r
case_when(
  cyl <= 4 ~ 26.6636363636364,
  cyl <= 6 & cyl > 4 ~ 19.7428571428571,
  cyl > 6 & cyl > 4 ~ 15.1
)
```

With this new update, we have taken advantage of the `.default` argument whenever possible, which should lead to faster predictions, as we no longer need to calculate redundant conditionals.

```{r}
tidypredict_fit(model)
```
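
To check that the `.default` form gives the same predictions as the fully spelled-out conditions, the two statements can be compared directly on `mtcars`. This is a small sketch that assumes dplyr 1.1.0 or later is installed (the release that introduced `.default`); the column names `pred_old` and `pred_new` are made up for illustration.

```r
library(dplyr)

compared <- mtcars |>
  mutate(
    # Every branch spelled out, including the redundant last condition
    pred_old = case_when(
      cyl <= 4 ~ 26.6636363636364,
      cyl <= 6 & cyl > 4 ~ 19.7428571428571,
      cyl > 6 & cyl > 4 ~ 15.1
    ),
    # Last branch replaced by `.default`, so its condition is never evaluated
    pred_new = case_when(
      cyl <= 4 ~ 26.6636363636364,
      cyl <= 6 & cyl > 4 ~ 19.7428571428571,
      .default = 15.1
    )
  )

# TRUE when both encodings agree on every row of mtcars
identical(compared$pred_old, compared$pred_new)
```

For a single small tree the saving is negligible, but in an ensemble every tree drops one condition, which is where the reduction in work comes from.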

## Glmnet support

We now support the glmnet package. This package provides generalized linear models with lasso or elasticnet regularization.

The primary restriction when using a glmnet model with tidypredict is that the model must have been fitted with the `lambda` argument set to a single value.

```{r}
model <- glmnet::glmnet(mtcars[, -1], mtcars$mpg, lambda = 0.01)

tidypredict_fit(model)
```

`glmnet()` computes a collection of models using many sets of penalty values. This can be very efficient, but for tidypredict, we need to predict with a single penalty.
Note how, as we increase the penalty, the extracted expression correctly removes terms with coefficients of `0` instead of leaving them as `(disp * 0)`.

```{r}
model <- glmnet::glmnet(mtcars[, -1], mtcars$mpg, lambda = 1)

tidypredict_fit(model)
```
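
The effect of dropping zero coefficients can be sketched in base R. The coefficient vector below is made up for illustration, with `disp` set to exactly zero as a lasso penalty might do, and the code is not how tidypredict builds its expressions internally.

```r
# Made-up coefficients for illustration; `disp` has been shrunk to exactly zero
coefs <- c("(Intercept)" = 35.3, cyl = -0.87, disp = 0, hp = -0.01, wt = -2.59)

# Keep only predictors whose coefficient is non-zero
slopes <- coefs[names(coefs) != "(Intercept)" & coefs != 0]

# Build one `predictor * coefficient` call per remaining term
terms <- Map(
  function(name, value) call("*", as.name(name), value),
  names(slopes),
  unname(slopes)
)

# Add the terms to the intercept, one at a time
linear_expr <- Reduce(
  function(lhs, rhs) call("+", lhs, rhs),
  terms,
  coefs[["(Intercept)"]]
)

# `disp` no longer appears among the variables the expression needs
all.vars(linear_expr)

# Evaluate the expression for a single made-up observation
eval(linear_expr, list(cyl = 4, hp = 100, wt = 2))
```

Leaving out the zero terms keeps the generated expression, and any SQL translated from it, shorter, and it means columns that do not affect the prediction need not be present in the data.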

tidypredict is used as the primary parser for models employed by the [orbital](https://orbital.tidymodels.org/) package. This means that all the changes seen in this post also take effect when using orbital with tidymodels workflows, such as when using `parsnip::linear_reg()` with `engine = "glmnet"`.

## Acknowledgements

A big thank you to all the folks who helped make this release happen: [&#x0040;EmilHvitfeldt](https://github.com/EmilHvitfeldt) and [&#x0040;jeroenjanssens](https://github.com/jeroenjanssens).
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
---
output: hugodown::hugo_document

slug: tidypredict-1-0-0
title: tidypredict 1.0.0
date: 2025-12-10
author: Emil Hvitfeldt
description: >
    tidypredict 1.0.0 brings faster computations for tree-based models, more efficient tree representations, glmnet model support, and a change in how random forests are handled.

photo:
  url: https://unsplash.com/photos/brown-leaves-covered-in-snow-on-a-branch-9XKkkeUwBhY
  author: Monique Caraballo

# one of: "deep-dive", "learn", "package", "programming", "roundup", or "other"
categories: [package]
tags: [tidymodels, tidypredict, orbital]
rmd_hash: 5e23e8ef618e8397

---

<!--
TODO:
* [x] Look over / edit the post's title in the yaml
* [x] Edit (or delete) the description; note this appears in the Twitter card
* [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html))
* [x] Find photo & update yaml metadata
* [x] Create `thumbnail-sq.jpg`; height and width should be equal
* [x] Create `thumbnail-wd.jpg`; width should be >5x height
* [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)
* [x] Add intro sentence, e.g. the standard tagline for the package
* [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html)
-->

We're tickled pink to announce the release of version 1.0.0 of [tidypredict](https://tidypredict.tidymodels.org/). The main goal of tidypredict is to enable running predictions inside databases. It reads the model, extracts the components needed to calculate the prediction, and then creates an R formula that can be translated into SQL.

You can install it from CRAN with:

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"tidypredict"</span><span class='o'>)</span></span></code></pre>

</div>

This blog post highlights the most important changes in this release, including faster computations for tree-based models, more efficient tree representations, glmnet model support, and a change in how random forests are handled. You can see a full list of changes in the [release notes](https://tidypredict.tidymodels.org/news/index.html#tidypredict-100).

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidypredict.tidymodels.org'>tidypredict</a></span><span class='o'>)</span></span></code></pre>

</div>

## Improved output for random forest models

In previous versions of tidypredict, [`tidypredict_fit()`](https://tidypredict.tidymodels.org/reference/tidypredict_fit.html) returned a list of expressions, one for each tree, when applied to random forest models. This didn't align with what is returned for other types of models. In version 1.0.0, it instead produces a single, combined expression that reflects how predictions should be made.

This is technically a breaking change, but one we believe is worthwhile, as it provides a more consistent output for [`tidypredict_fit()`](https://tidypredict.tidymodels.org/reference/tidypredict_fit.html) and hides the technical details about how to combine trees from different packages.

## Faster parsing of trees

The parsing of xgboost, partykit, and ranger models should now be substantially faster than before. Some examples parse 10 to 200 times faster. Please note that larger models, with more trees or deeper trees, still take some time to parse.

## More efficient tree expressions

All trees, whether they are a single tree or part of a collection of trees, such as in boosted trees or random forests, are encoded as `case_when()` statements by tidypredict. This means that the following tree:

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>model</span> <span class='o'>&lt;-</span> <span class='nf'>partykit</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/pkg/partykit/man/ctree.html'>ctree</a></span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>am</span> <span class='o'>+</span> <span class='nv'>cyl</span>, data <span class='o'>=</span> <span class='nv'>mtcars</span><span class='o'>)</span></span>
<span><span class='nv'>model</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; Model formula:</span></span>
<span><span class='c'>#&gt; mpg ~ am + cyl</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; Fitted party:</span></span>
<span><span class='c'>#&gt; [1] root</span></span>
<span><span class='c'>#&gt; | [2] cyl &lt;= 4: 26.664 (n = 11, err = 203.4)</span></span>
<span><span class='c'>#&gt; | [3] cyl &gt; 4</span></span>
<span><span class='c'>#&gt; | | [4] cyl &lt;= 6: 19.743 (n = 7, err = 12.7)</span></span>
<span><span class='c'>#&gt; | | [5] cyl &gt; 6: 15.100 (n = 14, err = 85.2)</span></span>
<span><span class='c'>#&gt; </span></span>
<span><span class='c'>#&gt; Number of inner nodes: 2</span></span>
<span><span class='c'>#&gt; Number of terminal nodes: 3</span></span>
<span></span></code></pre>

</div>

would previously be turned into the following `case_when()` statement:

``` r
case_when(
  cyl <= 4 ~ 26.6636363636364,
  cyl <= 6 & cyl > 4 ~ 19.7428571428571,
  cyl > 6 & cyl > 4 ~ 15.1
)
```

With this new update, we have taken advantage of the `.default` argument whenever possible, which should lead to faster predictions, as we no longer need to calculate redundant conditionals.

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tidypredict.tidymodels.org/reference/tidypredict_fit.html'>tidypredict_fit</a></span><span class='o'>(</span><span class='nv'>model</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; case_when(cyl &lt;= 4 ~ 26.6636363636364, cyl &lt;= 6 &amp; cyl &gt; 4 ~ 19.7428571428571, </span></span>
<span><span class='c'>#&gt; .default = 15.1)</span></span>
<span></span></code></pre>

</div>

## Glmnet support

We now support the glmnet package. This package provides generalized linear models with lasso or elasticnet regularization.

The primary restriction when using a glmnet model with tidypredict is that the model must have been fitted with the `lambda` argument set to a single value.

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>model</span> <span class='o'>&lt;-</span> <span class='nf'>glmnet</span><span class='nf'>::</span><span class='nf'><a href='https://glmnet.stanford.edu/reference/glmnet.html'>glmnet</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>[</span>, <span class='o'>-</span><span class='m'>1</span><span class='o'>]</span>, <span class='nv'>mtcars</span><span class='o'>$</span><span class='nv'>mpg</span>, lambda <span class='o'>=</span> <span class='m'>0.01</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://tidypredict.tidymodels.org/reference/tidypredict_fit.html'>tidypredict_fit</a></span><span class='o'>(</span><span class='nv'>model</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; 13.0081464696679 + (cyl * -0.0773532164346008) + (disp * 0.00969507138358544) + </span></span>
<span><span class='c'>#&gt; (hp * -0.0192462098902709) + (drat * 0.816753237688302) + </span></span>
<span><span class='c'>#&gt; (wt * -3.41564341709663) + (qsec * 0.758580151032383) + (vs * </span></span>
<span><span class='c'>#&gt; 0.277874296242861) + (am * 2.47356523820533) + (gear * 0.645144527527598) + </span></span>
<span><span class='c'>#&gt; (carb * -0.300886812079305)</span></span>
<span></span></code></pre>

</div>

`glmnet()` computes a collection of models using many sets of penalty values. This can be very efficient, but for tidypredict, we need to predict with a single penalty. Note how, as we increase the penalty, the extracted expression correctly removes terms with coefficients of `0` instead of leaving them as `(disp * 0)`.

<div class="highlight">

<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>model</span> <span class='o'>&lt;-</span> <span class='nf'>glmnet</span><span class='nf'>::</span><span class='nf'><a href='https://glmnet.stanford.edu/reference/glmnet.html'>glmnet</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>[</span>, <span class='o'>-</span><span class='m'>1</span><span class='o'>]</span>, <span class='nv'>mtcars</span><span class='o'>$</span><span class='nv'>mpg</span>, lambda <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span></span>
<span></span>
<span><span class='nf'><a href='https://tidypredict.tidymodels.org/reference/tidypredict_fit.html'>tidypredict_fit</a></span><span class='o'>(</span><span class='nv'>model</span><span class='o'>)</span></span>
<span><span class='c'>#&gt; 35.3137765116027 + (cyl * -0.871451193824228) + (hp * -0.0101173960249783) + </span></span>
<span><span class='c'>#&gt; (wt * -2.59443677687505)</span></span>
<span></span></code></pre>

</div>

tidypredict is used as the primary parser for models employed by the [orbital](https://orbital.tidymodels.org/) package. This means that all the changes seen in this post also take effect when using orbital with tidymodels workflows, such as when using [`parsnip::linear_reg()`](https://parsnip.tidymodels.org/reference/linear_reg.html) with `engine = "glmnet"`.

## Acknowledgements

A big thank you to all the folks who helped make this release happen: [@EmilHvitfeldt](https://github.com/EmilHvitfeldt) and [@jeroenjanssens](https://github.com/jeroenjanssens).
