Commit 8fc263c

EmilHvitfeldt, simonpcouch, and topepo authored Feb 28, 2024
tidymodels 2024 survey post (#687)
* first draft tidymodels 2024 survey post
* apply max's changes from code review

  Co-authored-by: Max Kuhn <mxkuhn@gmail.com>
* use colons consistently
* add notes on causal inf and ensenbling
* re-knit with recent changes
* Apply suggestions from code review

  Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>
* Update content/blog/tidymodels-2024-survey/index.Rmd

  Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>
* turn options into subsections
* rerender
* add survey link

---------

Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>
Co-authored-by: Max Kuhn <mxkuhn@gmail.com>
1 parent 49b6ae9 commit 8fc263c

4 files changed: +157 -0 lines changed

@@ -0,0 +1,77 @@
1+
---
2+
output: hugodown::hugo_document
3+
4+
slug: tidymodels-2024-survey
5+
title: Take the tidymodels survey for 2024 priorities
6+
date: 2024-02-28
7+
author: Emil Hvitfeldt
8+
description: >
9+
We are conducting our third tidymodels priorities survey. Please give us your
10+
feedback!
11+
12+
photo:
13+
url: https://unsplash.com/photos/white-flowers-under-blue-sky-during-daytime-peN6l68AWaw
14+
author: Aamyr
15+
16+
# one of: "deep-dive", "learn", "package", "programming", or "other"
17+
categories: [other]
18+
tags: [survey,tidymodels]
19+
---
20+
21+
<!--
22+
TODO:
23+
* [x] Look over / edit the post's title in the yaml
24+
* [x] Edit (or delete) the description; note this appears in the Twitter card
25+
* [x] Pick category and tags (see existing with `hugodown::tidy_show_meta()`)
26+
* [x] Find photo & update yaml metadata
27+
* [x] Create `thumbnail-sq.jpg`; height and width should be equal
28+
* [x] Create `thumbnail-wd.jpg`; width should be >5x height
29+
* [x] `hugodown::use_tidy_thumbnails()`
30+
* [x] Add intro sentence, e.g. the standard tagline for the package
31+
* [x] `usethis::use_tidy_thanks()`
32+
-->
33+
34+
At the end of 2021, we created a survey to get community input on how we prioritize our projects. [The results](https://colorado.posit.co/rsc/tidymodels-priorities-2022/) gave us a good sense of which items people were most interested in. Since then we have completed a number of projects:
35+
36+
* **Model fairness metrics** were included in [yardstick 1.3.0](https://yardstick.tidymodels.org/news/index.html#yardstick-130) with [tidymodels.org](https://www.tidymodels.org/) posts coming soon.
37+
* **Spatial analysis models and methods** led to the creation of [spatialsample](https://spatialsample.tidymodels.org/).
38+
* **H2O.ai support** was achieved with the creation of [agua](https://agua.tidymodels.org/).
39+
* **Better serialization tools** are now provided in the [bundle](https://github.com/rstudio/bundle) package.
40+
41+
Almost everything that respondents prioritized highly last year has either been completed or is currently in progress. Our main focus right now is wrapping up survival analysis through a series of CRAN releases for the affected packages. Immediately following these releases, we will work on postprocessing and supervised feature selection. Beyond that, we'd like to once again ask the community for feedback to help us better prioritize features in the coming year.
42+
43+
## Looking toward 2024
44+
45+
**Take a look at [our survey for next priorities](https://conjoint.qualtrics.com/jfe/form/SV_aWw8ocGN5aPgeZE)** and let us know what you think. There are some items we've put "on the menu" but you can write in other items that you are interested in.
46+
47+
The current slate of possible priorities includes:
48+
49+
### Sparse tibbles
50+
51+
Many models benefit from having sparse data, both in execution time and memory usage. We can't take full advantage of this since recipes use tibbles. This project would involve making it so the tibbles used _inside of a recipe_ can hold sparse data. This would not be intended as a general substitute for regular tibbles.
52+
53+
### Causal inference interface
54+
55+
While many common causal inference workflows are already possible with tidymodels, a small set of helper functions could greatly ease the experience of causal modeling in the framework. Specifically, these changes would better accommodate a two-stage modeling approach, using predictions from a propensity model to set case weights for an outcome model.
56+
57+
### Improve chattr
58+
59+
[chattr](https://github.com/mlverse/chattr) is an interface to large language models (LLMs). It enables interaction with the model directly from the RStudio IDE. This task would involve fine-tuning it to give better results when used for tidymodels tasks.
60+
61+
### Cost-sensitive learning API
62+
63+
Cost-sensitive learning is another way to handle severe class imbalances. The main part of this task is making our approach to it uniform across models.
64+
65+
### Expand models for stacking ensembles
66+
67+
As of now, the stacks package only supports combining the predictions of member models using a regularized linear model. We could extend the package to allow for combining predictions using any modeling [workflow](https://workflows.tidymodels.org).
68+
69+
### Extend support for spatial ML
70+
71+
[spatialsample](https://spatialsample.tidymodels.org/) introduced a number of spatial resampling methods to tidymodels. More comprehensive support for spatial ML would involve better integrating [spatial metrics](https://www.mm218.dev/posts/2022-08-11-waywiser-010-is-now-on-cran/) into the framework and introducing support for new spatial model types.
72+
73+
### Ordinal regression extension package
74+
75+
Ordinal regression models are specific to classification tasks with a natural ordering to the outcome categories (e.g., low, medium, high). We could add support for modeling this type of data in a parsnip extension package.
76+
77+
[Check out our survey](https://conjoint.qualtrics.com/jfe/form/SV_aWw8ocGN5aPgeZE) and tell us what your priorities are!
@@ -0,0 +1,80 @@
1+
---
2+
output: hugodown::hugo_document
3+
4+
slug: tidymodels-2024-survey
5+
title: Take the tidymodels survey for 2024 priorities
6+
date: 2024-02-28
7+
author: Emil Hvitfeldt
8+
description: >
9+
We are conducting our third tidymodels priorities survey. Please give us your
10+
feedback!
11+
12+
photo:
13+
url: https://unsplash.com/photos/white-flowers-under-blue-sky-during-daytime-peN6l68AWaw
14+
author: Aamyr
15+
16+
# one of: "deep-dive", "learn", "package", "programming", or "other"
17+
categories: [other]
18+
tags: [survey,tidymodels]
19+
rmd_hash: c2ba05ee760a40ca
20+
21+
---
22+
23+
<!--
24+
TODO:
25+
* [x] Look over / edit the post's title in the yaml
26+
* [x] Edit (or delete) the description; note this appears in the Twitter card
27+
* [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html))
28+
* [x] Find photo & update yaml metadata
29+
* [x] Create `thumbnail-sq.jpg`; height and width should be equal
30+
* [x] Create `thumbnail-wd.jpg`; width should be >5x height
31+
* [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)
32+
* [x] Add intro sentence, e.g. the standard tagline for the package
33+
* [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html)
34+
-->
35+
36+
At the end of 2021, we created a survey to get community input on how we prioritize our projects. [The results](https://colorado.posit.co/rsc/tidymodels-priorities-2022/) gave us a good sense of which items people were most interested in. Since then we have completed a number of projects:
37+
38+
- **Model fairness metrics** were included in [yardstick 1.3.0](https://yardstick.tidymodels.org/news/index.html#yardstick-130) with [tidymodels.org](https://www.tidymodels.org/) posts coming soon.
39+
- **Spatial analysis models and methods** led to the creation of [spatialsample](https://spatialsample.tidymodels.org/).
40+
- **H2O.ai support** was achieved with the creation of [agua](https://agua.tidymodels.org/).
41+
- **Better serialization tools** are now provided in the [bundle](https://github.com/rstudio/bundle) package.
42+
43+
Almost everything that respondents prioritized highly last year has either been completed or is currently in progress. Our main focus right now is wrapping up survival analysis through a series of CRAN releases for the affected packages. Immediately following these releases, we will work on postprocessing and supervised feature selection. Beyond that, we'd like to once again ask the community for feedback to help us better prioritize features in the coming year.
44+
45+
## Looking toward 2024
46+
47+
**Take a look at [our survey for next priorities](https://conjoint.qualtrics.com/jfe/form/SV_aWw8ocGN5aPgeZE)** and let us know what you think. There are some items we've put "on the menu" but you can write in other items that you are interested in.
48+
49+
The current slate of possible priorities includes:
50+
51+
### Sparse tibbles
52+
53+
Many models benefit from having sparse data, both in execution time and memory usage. We can't take full advantage of this since recipes use tibbles. This project would involve making it so the tibbles used *inside of a recipe* can hold sparse data. This would not be intended as a general substitute for regular tibbles.
54+
55+
### Causal inference interface
56+
57+
While many common causal inference workflows are already possible with tidymodels, a small set of helper functions could greatly ease the experience of causal modeling in the framework. Specifically, these changes would better accommodate a two-stage modeling approach, using predictions from a propensity model to set case weights for an outcome model.
58+
59+
### Improve chattr
60+
61+
[chattr](https://github.com/mlverse/chattr) is an interface to large language models (LLMs). It enables interaction with the model directly from the RStudio IDE. This task would involve fine-tuning it to give better results when used for tidymodels tasks.
62+
63+
### Cost-sensitive learning API
64+
65+
Cost-sensitive learning is another way to handle severe class imbalances. The main part of this task is making our approach to it uniform across models.
66+
67+
### Expand models for stacking ensembles
68+
69+
As of now, the stacks package only supports combining the predictions of member models using a regularized linear model. We could extend the package to allow for combining predictions using any modeling [workflow](https://workflows.tidymodels.org).
70+
71+
### Extend support for spatial ML
72+
73+
[spatialsample](https://spatialsample.tidymodels.org/) introduced a number of spatial resampling methods to tidymodels. More comprehensive support for spatial ML would involve better integrating [spatial metrics](https://www.mm218.dev/posts/2022-08-11-waywiser-010-is-now-on-cran/) into the framework and introducing support for new spatial model types.
74+
75+
### Ordinal regression extension package
76+
77+
Ordinal regression models are specific to classification tasks with a natural ordering to the outcome categories (e.g., low, medium, high). We could add support for modeling this type of data in a parsnip extension package.
78+
79+
[Check out our survey](https://conjoint.qualtrics.com/jfe/form/SV_aWw8ocGN5aPgeZE) and tell us what your priorities are!
80+