|
| 1 | +--- |
| 2 | +output: hugodown::hugo_document |
| 3 | + |
| 4 | +slug: tidypredict-1-0-0 |
| 5 | +title: tidypredict 1.0.0 |
| 6 | +date: 2025-12-10 |
| 7 | +author: Emil Hvitfeldt |
| 8 | +description: > |
| 9 | + tidypredict 1.0.0 brings faster computations for tree-based models, more efficient tree representations, glmnet model support, and a change in how random forests are handled. |
| 10 | + |
| 11 | +photo: |
| 12 | + url: https://unsplash.com/photos/brown-leaves-covered-in-snow-on-a-branch-9XKkkeUwBhY |
| 13 | + author: Monique Caraballo |
| 14 | + |
| 15 | +# one of: "deep-dive", "learn", "package", "programming", "roundup", or "other" |
| 16 | +categories: [package] |
| 17 | +tags: [tidymodels, tidypredict, orbital] |
| 18 | +rmd_hash: 5e23e8ef618e8397 |
| 19 | + |
| 20 | +--- |
| 21 | + |
| 22 | +<!-- |
| 23 | +TODO: |
| 24 | +* [x] Look over / edit the post's title in the yaml |
| 25 | +* [x] Edit (or delete) the description; note this appears in the Twitter card |
| 26 | +* [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) |
| 27 | +* [x] Find photo & update yaml metadata |
| 28 | +* [x] Create `thumbnail-sq.jpg`; height and width should be equal |
| 29 | +* [x] Create `thumbnail-wd.jpg`; width should be >5x height |
| 30 | +* [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) |
| 31 | +* [x] Add intro sentence, e.g. the standard tagline for the package |
| 32 | +* [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) |
| 33 | +--> |
| 34 | + |
| 35 | +We're tickled pink to announce the release of version 1.0.0 of [tidypredict](https://tidypredict.tidymodels.org/). The main goal of tidypredict is to enable running predictions inside databases. It reads the model, extracts the components needed to calculate the prediction, and then creates an R formula that can be translated into SQL. |
| 36 | + |
| 37 | +You can install it from CRAN with: |
| 38 | + |
| 39 | +<div class="highlight"> |
| 40 | + |
| 41 | +<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"tidypredict"</span><span class='o'>)</span></span></code></pre> |
| 42 | + |
| 43 | +</div> |
| 44 | + |
| 45 | +This blog post highlights the most important changes in this release, including faster computations for tree-based models, more efficient tree representations, glmnet model support, and a change in how random forests are handled. You can see a full list of changes in the [release notes](https://tidypredict.tidymodels.org/news/index.html#tidypredict-100). |
| 46 | + |
| 47 | +<div class="highlight"> |
| 48 | + |
| 49 | +<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidypredict.tidymodels.org'>tidypredict</a></span><span class='o'>)</span></span></code></pre> |
| 50 | + |
| 51 | +</div> |
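As a minimal sketch of the workflow described at the top of this post, the snippet below fits a small linear model, extracts its prediction formula, and translates it to SQL. The model and its predictors are illustrative choices of ours, not taken from the release notes, and `dbplyr::simulate_dbi()` stands in for a real database connection, so this assumes the dbplyr package is installed.

```r
library(tidypredict)

# Illustrative model (our own choice): a small linear regression
lm_model <- lm(mpg ~ wt + cyl, data = mtcars)

# R expression that reproduces the model's predictions
lm_fit <- tidypredict_fit(lm_model)
lm_fit

# The same prediction written as SQL; simulate_dbi() is a stand-in connection
lm_sql <- tidypredict_sql(lm_model, dbplyr::simulate_dbi())
lm_sql
```

With a real connection, the same expression can be spliced into a dbplyr pipeline so that the prediction is computed inside the database.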
| 52 | + |
| 53 | +## Improved output for random forest models |
| 54 | + |
| 55 | +In previous versions of tidypredict, [`tidypredict_fit()`](https://tidypredict.tidymodels.org/reference/tidypredict_fit.html) returned a list of expressions, one for each tree, when applied to random forest models. This didn't align with what is returned for other types of models. In version 1.0.0, this has been changed to produce a single, combined expression that reflects how predictions should be made. |
| 56 | + |
| 57 | +This is technically a breaking change, but one we believe is worthwhile, as it provides a more consistent output for [`tidypredict_fit()`](https://tidypredict.tidymodels.org/reference/tidypredict_fit.html) and hides the technical details about how to combine trees from different packages. |
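A hedged sketch of what this looks like in practice, assuming the ranger and dplyr packages are installed. The model, its hyperparameters, and the `pred` column name are illustrative choices of ours, not taken from the release notes.

```r
library(tidypredict)
library(dplyr)

# Small illustrative forest: 3 shallow trees keep the expression readable
rf_model <- ranger::ranger(
  mpg ~ cyl + wt,
  data = mtcars,
  num.trees = 3,
  max.depth = 2,
  seed = 1
)

# As of 1.0.0 this is one combined expression rather than a list of 3
rf_fit <- tidypredict_fit(rf_model)

# Because it is a single expression, it can be spliced straight into mutate()
rf_preds <- mtcars |>
  mutate(pred = !!rf_fit)
```

Code that previously looped over the list of per-tree expressions and combined them by hand would need to be updated to use the single expression directly.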
| 58 | + |
| 59 | +## Faster parsing of trees |
| 60 | + |
| 61 | +The parsing of xgboost, partykit, and ranger models should now be substantially faster than before. In examples, parsing has been 10 to 200 times faster. Please note that larger models, with more or deeper trees, still take some time to parse. |
| 62 | + |
| 63 | +## More efficient tree expressions |
| 64 | + |
| 65 | +All trees, whether they are a single tree or part of a collection of trees, such as in boosted trees or random forests, are encoded as `case_when()` statements by tidypredict. Take the following tree: |
| 66 | + |
| 67 | +<div class="highlight"> |
| 68 | + |
| 69 | +<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>model</span> <span class='o'><-</span> <span class='nf'>partykit</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/pkg/partykit/man/ctree.html'>ctree</a></span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>am</span> <span class='o'>+</span> <span class='nv'>cyl</span>, data <span class='o'>=</span> <span class='nv'>mtcars</span><span class='o'>)</span></span> |
| 70 | +<span><span class='nv'>model</span></span> |
| 71 | +<span><span class='c'>#> </span></span> |
| 72 | +<span><span class='c'>#> Model formula:</span></span> |
| 73 | +<span><span class='c'>#> mpg ~ am + cyl</span></span> |
| 74 | +<span><span class='c'>#> </span></span> |
| 75 | +<span><span class='c'>#> Fitted party:</span></span> |
| 76 | +<span><span class='c'>#> [1] root</span></span> |
| 77 | +<span><span class='c'>#> | [2] cyl <= 4: 26.664 (n = 11, err = 203.4)</span></span> |
| 78 | +<span><span class='c'>#> | [3] cyl > 4</span></span> |
| 79 | +<span><span class='c'>#> | | [4] cyl <= 6: 19.743 (n = 7, err = 12.7)</span></span> |
| 80 | +<span><span class='c'>#> | | [5] cyl > 6: 15.100 (n = 14, err = 85.2)</span></span> |
| 81 | +<span><span class='c'>#> </span></span> |
| 82 | +<span><span class='c'>#> Number of inner nodes: 2</span></span> |
| 83 | +<span><span class='c'>#> Number of terminal nodes: 3</span></span> |
| 84 | +<span></span></code></pre> |
| 85 | + |
| 86 | +</div> |
| 87 | + |
| 88 | +Before this release, the tree would be turned into the following `case_when()` statement: |
| 89 | + |
| 90 | +``` r |
| 91 | +case_when( |
| 92 | + cyl <= 4 ~ 26.6636363636364, |
| 93 | + cyl <= 6 & cyl > 4 ~ 19.7428571428571, |
| 94 | + cyl > 6 & cyl > 4 ~ 15.1 |
| 95 | +) |
| 96 | +``` |
| 97 | + |
| 98 | +With this new update, we have taken advantage of the `.default` argument whenever possible, which should lead to faster predictions, as we no longer need to calculate redundant conditionals. |
| 99 | + |
| 100 | +<div class="highlight"> |
| 101 | + |
| 102 | +<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tidypredict.tidymodels.org/reference/tidypredict_fit.html'>tidypredict_fit</a></span><span class='o'>(</span><span class='nv'>model</span><span class='o'>)</span></span> |
| 103 | +<span><span class='c'>#> case_when(cyl <= 4 ~ 26.6636363636364, cyl <= 6 & cyl > 4 ~ 19.7428571428571, </span></span> |
| 104 | +<span><span class='c'>#> .default = 15.1)</span></span> |
| 105 | +<span></span></code></pre> |
| 106 | + |
| 107 | +</div> |
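To illustrate why the two forms are interchangeable for this tree, the sketch below evaluates both with dplyr on `mtcars`. It assumes dplyr 1.1.0 or later, where the `.default` argument is available; the names `both`, `explicit`, and `with_default` are ours.

```r
library(dplyr)

both <- mtcars |>
  mutate(
    # Every leaf spelled out as its own condition
    explicit = case_when(
      cyl <= 4 ~ 26.6636363636364,
      cyl <= 6 & cyl > 4 ~ 19.7428571428571,
      cyl > 6 & cyl > 4 ~ 15.1
    ),
    # Last leaf handled by `.default`, so its condition is never evaluated
    with_default = case_when(
      cyl <= 4 ~ 26.6636363636364,
      cyl <= 6 & cyl > 4 ~ 19.7428571428571,
      .default = 15.1
    )
  )

# Compare the two columns row by row
all.equal(both$explicit, both$with_default)
```

The last leaf is the complement of the conditions before it, so dropping its explicit condition saves one comparison per row without changing the result for complete data.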
| 108 | + |
| 109 | +## Glmnet support |
| 110 | + |
| 111 | +We now support the glmnet package. This package provides generalized linear models with lasso or elasticnet regularization. |
| 112 | + |
| 113 | +The primary restriction when using a glmnet model with tidypredict is that the model must have been fitted with the `lambda` argument set to a single value. |
| 114 | + |
| 115 | +<div class="highlight"> |
| 116 | + |
| 117 | +<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>model</span> <span class='o'><-</span> <span class='nf'>glmnet</span><span class='nf'>::</span><span class='nf'><a href='https://glmnet.stanford.edu/reference/glmnet.html'>glmnet</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>[</span>, <span class='o'>-</span><span class='m'>1</span><span class='o'>]</span>, <span class='nv'>mtcars</span><span class='o'>$</span><span class='nv'>mpg</span>, lambda <span class='o'>=</span> <span class='m'>0.01</span><span class='o'>)</span></span> |
| 118 | +<span></span> |
| 119 | +<span><span class='nf'><a href='https://tidypredict.tidymodels.org/reference/tidypredict_fit.html'>tidypredict_fit</a></span><span class='o'>(</span><span class='nv'>model</span><span class='o'>)</span></span> |
| 120 | +<span><span class='c'>#> 13.0081464696679 + (cyl * -0.0773532164346008) + (disp * 0.00969507138358544) + </span></span> |
| 121 | +<span><span class='c'>#> (hp * -0.0192462098902709) + (drat * 0.816753237688302) + </span></span> |
| 122 | +<span><span class='c'>#> (wt * -3.41564341709663) + (qsec * 0.758580151032383) + (vs * </span></span> |
| 123 | +<span><span class='c'>#> 0.277874296242861) + (am * 2.47356523820533) + (gear * 0.645144527527598) + </span></span> |
| 124 | +<span><span class='c'>#> (carb * -0.300886812079305)</span></span> |
| 125 | +<span></span></code></pre> |
| 126 | + |
| 127 | +</div> |
| 128 | + |
| 129 | +`glmnet()` computes a collection of models using many sets of penalty values. This can be very efficient, but for tidypredict, we need to predict with a single penalty. Note how, as we increase the penalty, the extracted expression correctly removes terms with coefficients of `0` instead of leaving them as `(disp * 0)`. |
| 130 | + |
| 131 | +<div class="highlight"> |
| 132 | + |
| 133 | +<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>model</span> <span class='o'><-</span> <span class='nf'>glmnet</span><span class='nf'>::</span><span class='nf'><a href='https://glmnet.stanford.edu/reference/glmnet.html'>glmnet</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>[</span>, <span class='o'>-</span><span class='m'>1</span><span class='o'>]</span>, <span class='nv'>mtcars</span><span class='o'>$</span><span class='nv'>mpg</span>, lambda <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span></span> |
| 134 | +<span></span> |
| 135 | +<span><span class='nf'><a href='https://tidypredict.tidymodels.org/reference/tidypredict_fit.html'>tidypredict_fit</a></span><span class='o'>(</span><span class='nv'>model</span><span class='o'>)</span></span> |
| 136 | +<span><span class='c'>#> 35.3137765116027 + (cyl * -0.871451193824228) + (hp * -0.0101173960249783) + </span></span> |
| 137 | +<span><span class='c'>#> (wt * -2.59443677687505)</span></span> |
| 138 | +<span></span></code></pre> |
| 139 | + |
| 140 | +</div> |
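One way to sanity-check an extracted expression is to compare it against `predict()` on the same data. The sketch below does this for a single-penalty model. Converting the predictors with `as.matrix()` and the names `x`, `glmnet_model`, `glmnet_fit`, and `pred` are our own choices, and it assumes the glmnet and dplyr packages are installed.

```r
library(tidypredict)
library(dplyr)

# Predictor matrix: every column except mpg
x <- as.matrix(mtcars[, -1])
glmnet_model <- glmnet::glmnet(x, mtcars$mpg, lambda = 1)

# Predictions computed from the extracted expression
glmnet_fit <- tidypredict_fit(glmnet_model)
from_expr <- mtcars |>
  mutate(pred = !!glmnet_fit) |>
  pull(pred)

# Predictions from glmnet itself (returned as a one-column matrix)
from_glmnet <- as.numeric(predict(glmnet_model, newx = x))

# Compare the two vectors up to floating-point tolerance
all.equal(from_expr, from_glmnet)
```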
| 141 | + |
| 142 | +tidypredict is used as the primary parser for models employed by the [orbital](https://orbital.tidymodels.org/) package. This means that all the changes seen in this post also take effect when using orbital with tidymodels workflows, for example when using [`parsnip::linear_reg()`](https://parsnip.tidymodels.org/reference/linear_reg.html) with `engine = "glmnet"`. |
| 143 | + |
| 144 | +## Acknowledgements |
| 145 | + |
| 146 | +A big thank you to all the folks who helped make this release happen: [@EmilHvitfeldt](https://github.com/EmilHvitfeldt) and [@jeroenjanssens](https://github.com/jeroenjanssens). |
| 147 | + |