Evaluate multiple modeling approaches for #TidyTuesday spam email | Julia Silge #94
Comments
Hi Julia! Some of my colleagues use Python, and I can honestly say I am running circles around them because of the work you and the tidymodels team have done here. I'm a huge fan. Quick question for you: just as workflow_map() fits models to resamples so that we can view the results, is there a similar way to use workflow_map() on the testing data set? This may run counter to the usual machine learning workflow, where we tune against resamples of the training set, extract the best model, and then do a last_fit(), but for one reason or another we will want to see how the many models perform against the testing set. Is there any way to do this with |
@alejandrohagan Thank you so much for the kind words! ❤️ There isn't currently an automatic way to use a |
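For readers with the same question, one manual route is to loop over the workflow IDs and call last_fit() on each finalized workflow. The following is a minimal sketch and not necessarily the approach described in the reply above. `test_all_workflows()` is a hypothetical helper name, the metric is an assumption, and the sketch assumes every workflow in the set was tuned. Scoring many candidates on the test set also weakens it as an unbiased final check, so treat the output as a diagnostic.

```r
library(tidymodels)

# Hypothetical helper (not part of tidymodels): finalize every workflow in a
# fitted workflow set with its best resampled parameters, then fit on the
# training data and evaluate on the test set via last_fit().
# Assumes each workflow was tuned, e.g. with workflow_map("tune_grid", ...).
test_all_workflows <- function(wf_set_res, split, metric = "roc_auc") {
  ids <- wf_set_res$wflow_id
  names(ids) <- ids

  lapply(ids, function(id) {
    # Best parameters for this workflow, chosen on the resamples
    best_params <- wf_set_res |>
      extract_workflow_set_result(id) |>
      select_best(metric = metric)

    wf_set_res |>
      extract_workflow(id) |>
      finalize_workflow(best_params) |>
      last_fit(split) |>
      collect_metrics()
  }) |>
    bind_rows(.id = "wflow_id")
}
```

With the objects from the blog post this might be called as `test_all_workflows(spam_res, spam_split)`, where `spam_split` is an assumed name for the initial split object.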
Hi Julia, Thank you for your amazing work on the blog. Your efforts made my learning enjoyable! I'm curious about the vip() function. When fitting multiple models and wanting to determine variable importance with vip() for all of them, should we extract the VI values from fit_resamples() or last_fit()? Additionally, if we fit with workflow_map(), how can we extract the workflow or parsnip model from the results? Thank you for your help!
@NizePetcharat If you want to use variable importance as part of your process of comparing and choosing a model, then I would do that with your resamples, yes. You might check out this Stack Overflow answer where I outline how to approach this. |
Hi, If it turns out that formula_rf_tune is the best, how can we extract mtry etc. to train and evaluate the final model?
@NarainritKaruna Take a look at how you can use |
Thanks Julia. I usually use select_best(), which gives me mtry and min_n. However, when I use extract_workflow_set_result(spam_res, "formula_rf_tune"), there are no parameters (mtry and min_n). Edit: I finally got it by pulling from "results". Thanks!
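For anyone landing here with the same question, the steps can be sketched as below. This is a hedged illustration and not Julia's exact code. `finalize_from_set()` is a hypothetical helper name and the default metric is an assumption. The key point is that extract_workflow_set_result() returns the tuning results themselves, so select_best() still has to be called on them to get mtry and min_n.

```r
library(tidymodels)

# Hypothetical helper (not part of tidymodels): pick the best parameters for
# one tuned workflow in a workflow set, finalize it, and evaluate it once on
# the test set.
finalize_from_set <- function(wf_set_res, wf_id, split, metric = "roc_auc") {
  # Tuning results for this workflow; select_best() returns a one-row
  # tibble holding the winning parameter values (e.g. mtry, min_n)
  best_params <- wf_set_res |>
    extract_workflow_set_result(wf_id) |>
    select_best(metric = metric)

  # Plug the parameters into the untuned workflow, fit on the training
  # data, and evaluate on the testing data
  wf_set_res |>
    extract_workflow(wf_id) |>
    finalize_workflow(best_params) |>
    last_fit(split)
}
```

A call might look like `finalize_from_set(spam_res, "formula_rf_tune", spam_split)`, followed by collect_metrics() on the result. `spam_split` is an assumed name for the initial split object.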
Hi Julia, thank you for this tutorial (and all your other tutorials too)! I have data that originated from four separate studies. Each study examines the effect of a different medication on treatment response. All four studies have the same baseline variables and outcome variables. Beyond classifying treatment success based on baseline variables, I want to determine which baseline variables are most important in classifying success for each of those four medications. I am planning to do this using a workflow set of different algorithms (xgboost, random forest, svm), finalising the best performing workflow and examining variable importance. I am unsure of how best to compare across the four studies with regard to variable importance. Would it be best to collate all the data and run one model with interaction effects between each medication and each baseline variable, or to run four separate models (one for each medication) and compare the importance of variables across the four models? I am unsure whether the former approach would allow me to isolate the importance of specific variables by medication. Would really appreciate your thoughts on this!
@gezelle-d You might want to take a look at https://www.tmwr.org/explain and especially section 18.4. If it makes the most sense to train one single model for all four types of medication, then you can still understand something about variable importance for the four options. In the analogy of the approach shown in Fig 18.6, the four types of homes would be like your four medications. I also recommend posting this kind of question on Posit Community, which is a great forum for getting help with these kinds of modeling discussions. |
https://juliasilge.com/blog/spam-email/