recipes converts strings to factors without being asked to. #836

Closed
SimonCoulombe opened this issue Oct 13, 2021 · 4 comments

SimonCoulombe commented Oct 13, 2021

Hi!
Back in the day, one needed to use recipes::step_string2factor() to convert strings to factors. Now the conversion happens without being asked for.
Is this the expected behaviour?

library(recipes)
library(nycflights13)

myrecipes <- recipes::recipe(
  formula = arr_delay ~ carrier + distance,
  data = flights
)
prepped_recipe <- recipes::prep(myrecipes, flights)

baked_data <- recipes::bake(prepped_recipe, new_data = flights)

# flights %>% select(carrier) %>% glimpse()
#> carrier <chr> "UA", "UA"

# baked_data %>% glimpse()
#> carrier <fct> UA, UA,


Here is my sessioninfo::session_info() output, in case this is not the expected behaviour:


> sessioninfo::session_info()
- Session info --------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.6.2 (2019-12-12)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  French_Canada.1252          
 ctype    French_Canada.1252          
 tz       America/New_York            
 date     2021-10-13                  

- Packages ------------------------------------------------------------------------------------------------------------
 package        * version    date       lib source        
 assertthat       0.2.1      2019-03-21 [1] CRAN (R 3.6.2)
 class            7.3-15     2019-01-01 [1] CRAN (R 3.6.2)
 cli              2.3.1      2021-02-23 [1] CRAN (R 3.6.2)
 crayon           1.3.4      2017-09-16 [1] CRAN (R 3.6.2)
 DBI              1.1.0      2019-12-15 [1] CRAN (R 3.6.2)
 dplyr          * 1.0.4      2021-02-02 [1] CRAN (R 3.6.3)
 ellipsis         0.3.0      2019-09-20 [1] CRAN (R 3.6.2)
 fansi            0.4.1      2020-01-08 [1] CRAN (R 3.6.2)
 generics         0.1.0      2020-10-31 [1] CRAN (R 3.6.3)
 glue             1.4.2      2020-08-27 [1] CRAN (R 3.6.3)
 gower            0.2.1      2019-05-14 [1] CRAN (R 3.6.1)
 ipred            0.9-9      2019-04-28 [1] CRAN (R 3.6.2)
 lattice          0.20-38    2018-11-04 [1] CRAN (R 3.6.2)
 lava             1.6.6      2019-08-01 [1] CRAN (R 3.6.2)
 lifecycle        1.0.0      2021-02-15 [1] CRAN (R 3.6.2)
 lubridate        1.7.4      2018-04-11 [1] CRAN (R 3.6.2)
 magrittr         1.5        2014-11-22 [1] CRAN (R 3.6.2)
 MASS             7.3-51.4   2019-03-31 [1] CRAN (R 3.6.2)
 Matrix           1.2-18     2019-11-27 [1] CRAN (R 3.6.2)
 nnet             7.3-12     2016-02-02 [1] CRAN (R 3.6.2)
 nycflights13   * 1.0.1      2019-09-16 [1] CRAN (R 3.6.2)
 palmerpenguins * 0.1.0      2020-07-23 [1] CRAN (R 3.6.3)
 pillar           1.4.3      2019-12-20 [1] CRAN (R 3.6.2)
 pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 3.6.2)
 prodlim          2019.11.13 2019-11-17 [1] CRAN (R 3.6.2)
 purrr            0.3.3      2019-10-18 [1] CRAN (R 3.6.2)
 R6               2.4.1      2019-11-12 [1] CRAN (R 3.6.2)
 Rcpp             1.0.3      2019-11-08 [1] CRAN (R 3.6.2)
 recipes        * 0.1.16     2021-04-16 [1] CRAN (R 3.6.3)
 rlang            0.4.10     2020-12-30 [1] CRAN (R 3.6.3)
 rpart            4.1-15     2019-04-12 [1] CRAN (R 3.6.2)
 rstudioapi       0.11       2020-02-07 [1] CRAN (R 3.6.2)
 sessioninfo      1.1.1      2018-11-05 [1] CRAN (R 3.6.2)
 stringi          1.4.6      2020-02-17 [1] CRAN (R 3.6.2)
 stringr          1.4.0      2019-02-10 [1] CRAN (R 3.6.2)
 survival         3.1-8      2019-12-03 [1] CRAN (R 3.6.2)
 tibble           2.1.3      2019-06-06 [1] CRAN (R 3.6.2)
 tidyr            1.0.2      2020-01-24 [1] CRAN (R 3.6.2)
 tidyselect       1.1.0      2020-05-11 [1] CRAN (R 3.6.3)
 timeDate         3043.102   2018-02-21 [1] CRAN (R 3.6.2)
 utf8             1.1.4      2018-05-24 [1] CRAN (R 3.6.2)
 vctrs            0.3.6      2020-12-17 [1] CRAN (R 3.6.3)
 withr            2.4.1      2021-01-26 [1] CRAN (R 3.6.3)

EmilHvitfeldt (Member) commented

prep() converts strings to factors by default. You can turn that off by setting strings_as_factors = FALSE in prep():

library(recipes)
library(nycflights13)

myrecipes <- recipes::recipe(
  formula = arr_delay ~ carrier + distance,
  data = flights
)
prepped_recipe <- recipes::prep(myrecipes, flights, strings_as_factors = FALSE)

recipes::bake(prepped_recipe, new_data = flights)
#> # A tibble: 336,776 × 3
#>    carrier distance arr_delay
#>    <chr>      <dbl>     <dbl>
#>  1 UA          1400        11
#>  2 UA          1416        20
#>  3 AA          1089        33
#>  4 B6          1576       -18
#>  5 DL           762       -25
#>  6 UA           719        12
#>  7 B6          1065        19
#>  8 EV           229       -14
#>  9 B6           944        -8
#> 10 AA           733         8
#> # … with 336,766 more rows

Created on 2021-10-13 by the reprex package (v2.0.1)
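
For readers who want to check the behaviour without loading nycflights13, here is a minimal sketch that compares the two settings side by side. It assumes a recipes version like the one in this thread (0.1.16), where strings_as_factors is an argument of prep(). The `toy` data frame and its column names are made up for illustration and do not come from the issue.

```r
library(recipes)

# Hypothetical toy data: one numeric outcome and one character predictor
toy <- data.frame(
  y = c(1.5, 2.0, 3.5),
  carrier = c("UA", "AA", "UA"),
  stringsAsFactors = FALSE
)

rec <- recipe(y ~ carrier, data = toy)

# Default: prep() is called with strings_as_factors = TRUE
baked_default <- bake(prep(rec, training = toy), new_data = toy)

# Opt out: character columns are left as they are
baked_chr <- bake(
  prep(rec, training = toy, strings_as_factors = FALSE),
  new_data = toy
)

# Inspect the class of the predictor under each setting
class(baked_default$carrier)
class(baked_chr$carrier)
```

With the default, the carrier column should come back as a factor. With strings_as_factors = FALSE it should stay character, matching the reprex output above.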

SimonCoulombe (Author) commented Oct 13, 2021

Oh! I didn't see any such parameter in recipe()... didn't think to check prep().

Thanks for your help!

juliasilge (Member) commented

Thanks so much for your question here! FYI we are planning to move where the strings_as_factors arg lives, from prep() to the recipe itself (see #331, #715, and unfortunately others). We agree that it is not ideal currently.


github-actions bot commented Nov 3, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 3, 2021