High cardinality predictors for #TidyTuesday museums in the UK | Julia Silge #79

utterances-bot · 2022-11-27T18:48:42Z

High cardinality predictors for #TidyTuesday museums in the UK | Julia Silge

A data science blog

cedricbatailler · 2022-11-27T18:48:43Z

Once again, an awesome episode! Thanks a lot Julia!

One question: do you have any recommended reading on how to pick a model to start playing with? I guess a lot must come from practicing and becoming familiar with the different models, but do you know a nice place to start gaining expertise on whether one should begin with an xgboost model or an SVM? Of course, I'm assuming that model selection does matter, but that's the impression I got from the screencasts!

Once again, thanks for the videos. I'm always looking forward to the next one!

juliasilge · 2022-11-28T01:22:17Z

@cedricbatailler I think a good place to start could be ISLR or Applied Predictive Modeling. I don't think either is really focused on software (how) but they are great for learning (what, why).

viv-analytics · 2022-12-05T17:04:00Z

Again, a fantastic screencast. Thank you, Julia.

Since model interpretability is gaining a lot of attention, it would be great if you could showcase, in a future episode, the capabilities of tidymodels and other packages that support, e.g., SHAP and Shapley values for local and global feature explanations.

juliasilge · 2022-12-05T18:59:41Z

@viv-analytics If you'd like to look at how to approach that with tidymodels, you can check this chapter of Tidy Modeling with R.

zhaoliang0302 · 2022-12-28T09:06:11Z

Agree with @viv-analytics. Model interpretability for black-box models is essential. I tried the SHAPforxgboost, fastshap, and shapviz packages, but I don't know how to combine their functions with tidymodels objects. Unexpected errors always occur.

amin0511ss · 2023-03-17T18:19:24Z

Hi Julia, and thank you for your helpful and informative posts. I embedded a categorical variable (with a cardinality of 790) using both step_lencode_mixed and step_lencode_bayes for an unbalanced dataset (98.6%/1.4%). I noticed that besides the added "..new" level, other new levels are added as well: for some original levels (e.g., "HRG"), after embedding, there is the original level ("HRG") and a new one ("HRGDisposition"). "Disposition" is the name of the target variable. Is this due to the combination of high cardinality and extremely unbalanced data? The obvious problem is that these new levels (e.g., "HRGDisposition") are not going to be in the new data, so all of them will be assigned to the "..new" level. Am I doing something wrong here?

juliasilge · 2023-03-18T02:12:05Z

@amin0511ss Hmmmm, I wouldn't think so; that doesn't make a ton of sense to me. Can you create a reprex (a minimal reproducible example) for this? It can be tough to create a reprex for something super specific like this, but a reprex can make it easier for us to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page.

Once you have a reprex, I recommend posting on Posit Community, which is a great forum for getting help with these kinds of modeling questions. Thanks! 🙌
