
Evaluate multiple modeling approaches for #TidyTuesday spam email | Julia Silge #94

utterances-bot opened this issue Nov 24, 2023 · 9 comments

@utterances-bot

Evaluate multiple modeling approaches for #TidyTuesday spam email | Julia Silge

A data science blog

https://juliasilge.com/blog/spam-email/

@alejandrohagan

Hi Julia!
First, thank you so much for your work on this package and this blog. I can't emphasize enough how much your work has helped me grow my confidence in this area and, most importantly, made this fun!

Some of my colleagues use Python, and I can honestly say I am running circles around them because of the work you and the tidymodels team have done here. I'm a huge fan.

Quick question for you: just as workflow_map() fits models to resamples so that we can view the results, is there a similar way to use workflow_map() on the testing data set?

While this may run counter to the overall workflow / pipeline I see in machine learning, where we tune against resamples of the training set, extract the best model, and do a last_fit(), for one reason or another we may want to see how the many models perform against the testing set.

Is there any way to do this with workflow_set() and workflow_map()?

@juliasilge

@alejandrohagan Thank you so much for the kind words! ❤️

There isn't currently an automatic way to use a workflow_set() with the testing set, mainly because we see a workflow_set() as something you use during model development, while the testing set is only used for confirming expected performance after you have chosen a final model. If you have a fitted workflow_set(), then you can use extract_workflow_set_result() to get out the results for a specific workflow and then do whatever you want with them, like finalize that workflow and predict() on the testing set.
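For a concrete starting point, here is a minimal sketch of that idea. `spam_res` and the workflow id `"formula_rf_tune"` are from the blog post, but `spam_split` (the initial train/test split) and the metric name are assumptions here, so adjust to your own objects:

```r
library(tidymodels)

# Pull the tuning results for one workflow from the fitted workflow set
# and pick its best hyperparameters (metric name is an assumption)
best_params <-
  spam_res %>%
  extract_workflow_set_result("formula_rf_tune") %>%
  select_best(metric = "accuracy")

# Finalize that workflow and evaluate it once on the testing set;
# `spam_split` is assumed to be the initial_split() object
final_rf <-
  spam_res %>%
  extract_workflow("formula_rf_tune") %>%
  finalize_workflow(best_params) %>%
  last_fit(spam_split)

collect_metrics(final_rf)      # test-set metrics
collect_predictions(final_rf)  # test-set predictions
```

You could repeat this over every wflow_id in the set to get test-set metrics for all the models, but as noted above that works against the idea of touching the testing set only once.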

@NizePetcharat

Hi Julia,

Thank you for your amazing work on the blog. Your efforts made my learning enjoyable!

I'm curious about the vip() function. When fitting multiple models and wanting to determine variable importance for all of them, should we extract the VI values from fit_resamples() or last_fit()? Additionally, if we need to use it with workflow_map() fitting, how can we extract the workflow or the parsnip fit from the process? Thank you for your help!

@juliasilge

@NizePetcharat If you want to use variable importance as part of your process of comparing and choosing a model, then I would do that with your resamples, yes. You might check out this Stack Overflow answer where I outline how to approach this.
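For reference, the approach outlined there looks roughly like the following. This is a sketch, not the blog post's code: `rf_wf` (a workflow whose random forest spec was created with something like set_engine("ranger", importance = "impurity")) and `spam_folds` (the resamples) are placeholder names:

```r
library(tidymodels)
library(vip)

# Compute variable importance inside each resample via the `extract` argument
ctrl <- control_resamples(
  extract = function(x) {
    x %>%
      extract_fit_parsnip() %>%  # pull the parsnip fit out of the workflow
      vi()                       # variable importance scores
  }
)

rf_res <- fit_resamples(rf_wf, resamples = spam_folds, control = ctrl)

# The importance scores live in the .extracts list column, one set per resample
rf_res %>%
  select(id, .extracts) %>%
  unnest(.extracts) %>%
  unnest(.extracts)
```

With workflow_map() the same idea should work by passing a control object such as control_grid(extract = ...) through its control argument, and from there you can summarize or plot the importance scores across resamples to compare models before ever touching the testing set.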

@NarainritKaruna

Hi,

If it turns out that formula_rf_tune is the best, how can we extract mtry etc. to train and evaluate the final model?

@juliasilge

@NarainritKaruna Take a look at how you can use extract_workflow_set_result(): https://workflowsets.tidymodels.org/reference/extract_workflow_set_result.html


NarainritKaruna commented Jul 8, 2024

Thanks Julia,

I usually use select_best(), which gives me mtry and min_n. However, when I use extract_workflow_set_result(spam_res, "formula_rf_tune"), the parameters (mtry & min_n) are not shown directly.

Edit: I finally got it by pulling from the "results". Thanks!
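Concretely, that looks something like this sketch (`spam_res` and `"formula_rf_tune"` are the objects from the post; the metric name is an assumption):

```r
library(tidymodels)

# The extracted object is the tuning result for that single workflow
rf_tune_res <- extract_workflow_set_result(spam_res, "formula_rf_tune")

# The tuned parameter values live inside those results
show_best(rf_tune_res, metric = "roc_auc")

# A one-row tibble with mtry and min_n, ready for finalize_workflow()
best_rf <- select_best(rf_tune_res, metric = "roc_auc")
best_rf
```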

@gezelle-d

Hi Julia, thank you for this tutorial (and all your other tutorials too)! I have data that originated from four separate studies. Each study examines the effect of a different medication on treatment response. All four studies have the same baseline variables and outcome variables. Beyond classifying treatment success based on baseline variables, I want to determine which baseline variables are most important in classifying success for each of those four medications. I am planning to do this using a workflow set of different algorithms (xgboost, random forest, svm), finalising the best performing workflow, and examining variable importance. I am unsure how best to compare variable importance across the four studies. Would it be best to collate all the data and run one model with interaction effects between each medication and each baseline variable, or to run four separate models (one for each medication) and compare the importance of variables across the four models? I am unsure whether the former approach would allow me to isolate the importance of specific variables by medication. Would really appreciate your thoughts on this!

@juliasilge

@gezelle-d You might want to take a look at https://www.tmwr.org/explain and especially section 18.4. If it makes the most sense to train one single model for all four types of medication, then you can still understand something about variable importance for the four options. In the analogy of the approach shown in Fig 18.6, the four types of homes would be like your four medications.

I also recommend posting this kind of question on Posit Community, which is a great forum for getting help with these kinds of modeling discussions.
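In case a sketch helps, the grouped-profile idea from that chapter could look something like the following with DALEXtra. Every object and column name here (`med_fit`, `med_train`, `outcome`, `medication`, `baseline_age`) is a placeholder for illustration, not from any real analysis:

```r
library(tidymodels)
library(DALEXtra)   # also attaches DALEX

# `med_fit`: a workflow fitted on the pooled data from the four studies
# `med_train`: the training data, with `outcome` (success/failure) and a
# `medication` column identifying which study each row came from
explainer <-
  explain_tidymodels(
    med_fit,
    data  = med_train %>% select(-outcome),
    y     = as.integer(med_train$outcome == "success"),
    label = "pooled model"
  )

# Partial dependence of one baseline variable, computed separately per
# medication: the analogue of grouping by building type in TMWR Fig 18.6
pdp_by_med <- model_profile(
  explainer,
  variables = "baseline_age",
  groups    = "medication",
  N         = 500
)

plot(pdp_by_med)
```

Differences between the medication-specific curves then give a sense of whether a baseline variable matters more for one medication than another.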
