
Predict the magnitude of #TidyTuesday tornadoes with effect encoding and xgboost | Julia Silge #92

utterances-bot opened this issue Jun 15, 2023 · 7 comments

Comments

@utterances-bot


A data science blog

https://juliasilge.com/blog/tornadoes/


Hello Julia,

Thanks for another informative post. Your method of handling high cardinality categorical variables through likelihood encoding was interesting.

I noticed that the 'st' variable is a top contributor to the model. However, the encoding adds a degree of abstraction, and I am trying to interpret the effects of specific states on tornado magnitude. Can we map the encoded 'st' values back to the original states for a more intuitive interpretation? Or do the encoded values themselves offer a straightforward way to understand these effects?

Moreover, I am wondering whether a partial dependence plot (PDP) could be used to further explore the effect of each state.

Thanks again for your insightful post. Looking forward to more.

@juliasilge
Owner

@msahil515 Yes, you can get the value associated with each level of st by tidying the recipe. Check out how I do that in this similar post -- look for tidy().
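A minimal sketch of that tidying step, assuming the recipe used embed::step_lencode_glm() on st with mag as the outcome (the object names tornado_rec and tornado_train are placeholders, not the post's exact code):

```r
library(tidymodels)
library(embed)

# hypothetical recipe mirroring the post's likelihood encoding of `st`
tornado_rec <-
  recipe(mag ~ st + mo, data = tornado_train) |>
  step_lencode_glm(st, outcome = vars(mag))

# prep() estimates the encoding; tidy() with the step number pulls out
# the learned values, one row per state: `level` (the original state
# abbreviation) and `value` (its numeric encoding)
tidy(prep(tornado_rec), number = 1)
```

Joining that tibble back to your data by state abbreviation recovers the state-to-encoding mapping for interpretation.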

You could also use a partial dependence profile to examine the results further. I like using model_profile() from DALEX, as shown here.
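A hedged sketch of that workflow, where tornado_fit (a fitted workflow) and tornado_train are placeholders for objects from the post:

```r
library(DALEXtra)

# wrap the fitted tidymodels workflow in an explainer
explainer <-
  explain_tidymodels(
    tornado_fit,
    data = dplyr::select(tornado_train, -mag),
    y = tornado_train$mag,
    label = "xgboost"
  )

# partial dependence profile for the state variable
pdp_st <- model_profile(explainer, variables = "st")
plot(pdp_st)
```

For a categorical variable like st, the profile shows the average predicted magnitude per level, which is a natural companion to the encoded values from tidy().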


Hello Julia

I would like to use a different encoding method for categorical variables, similar to the internal PCA ordering method that ranger uses (adapted from Coppersmith et al.). It is target-based, so it needs to be done within each fold rather than before splitting the data. How can I incorporate this into a recipe step, please?

Many thanks!

@juliasilge
Owner

@smithhelen Take a look at this article on how to create your own recipe step.
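A bare-bones skeleton of the pattern that article describes, under stated assumptions: the step name step_rank_encode is made up, the outcome is passed as a column-name string, and the scoring inside prep() is a placeholder (a per-level mean, standing in for the PCA-based ordering):

```r
library(recipes)

# user-facing constructor
step_rank_encode <- function(recipe, ..., role = "predictor",
                             trained = FALSE, outcome = NULL,
                             mapping = NULL, skip = FALSE,
                             id = rand_id("rank_encode")) {
  add_step(
    recipe,
    step_rank_encode_new(
      terms = rlang::enquos(...), role = role, trained = trained,
      outcome = outcome, mapping = mapping, skip = skip, id = id
    )
  )
}

# internal constructor
step_rank_encode_new <- function(terms, role, trained, outcome,
                                 mapping, skip, id) {
  step(
    subclass = "rank_encode",
    terms = terms, role = role, trained = trained,
    outcome = outcome, mapping = mapping, skip = skip, id = id
  )
}

# prep() runs on each fold's analysis set, so the target-based scores
# are learned without leaking into the assessment set
prep.step_rank_encode <- function(x, training, info = NULL, ...) {
  col_names <- recipes_eval_select(x$terms, training, info)
  mapping <- lapply(col_names, function(col) {
    # placeholder scoring: replace with the PCA-based ordering
    tapply(training[[x$outcome]], training[[col]], mean)
  })
  names(mapping) <- col_names
  step_rank_encode_new(
    terms = x$terms, role = x$role, trained = TRUE,
    outcome = x$outcome, mapping = mapping, skip = x$skip, id = x$id
  )
}

# bake() applies the learned mapping to new data
bake.step_rank_encode <- function(object, new_data, ...) {
  for (col in names(object$mapping)) {
    new_data[[col]] <-
      unname(object$mapping[[col]][as.character(new_data[[col]])])
  }
  new_data
}
```

Because the scores are estimated in prep(), tune() and fit_resamples() will re-learn them inside every resample automatically.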


Hello Julia, congratulations on your impressive work.

I have a question about the grid argument in tune_race_anova(). Is the grid the total number of combinations of the levels of trees, min_n, and mtry? Or will 15 levels be considered for each of these hyperparameters, making a total grid of 15^3?

Thank you.

@juliasilge
Owner

@robsonpro Ah no, if you set grid = 15, the way it works is to choose a grid_max_entropy() design with 15 candidates total. You can read more about this kind of behavior in this chapter, and especially this section. Notice where it says:

The default design used by the tune package is the maximum entropy design.

You can provide your own grid in that argument, using any of the kinds of grid specifications outlined in that chapter. If you use the default or do something like grid = 10, it will do a maximum entropy grid with 10 elements.
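A sketch of the two behaviors, with hypothetical objects tornado_wf and tornado_folds standing in for a tuneable workflow and its resamples:

```r
library(finetune)
library(dials)

# grid = 15: one space-filling (maximum entropy) design,
# 15 candidate combinations in total
race_res <- tune_race_anova(
  tornado_wf,
  resamples = tornado_folds,
  grid = 15
)

# or supply your own grid explicitly, e.g. a regular grid with
# 3 levels per parameter, giving 3^3 = 27 candidates
my_grid <- grid_regular(
  trees(),
  min_n(),
  mtry(range = c(2L, 8L)),
  levels = 3
)

race_res <- tune_race_anova(
  tornado_wf,
  resamples = tornado_folds,
  grid = my_grid
)
```

So the 15^3 behavior only happens if you explicitly build a regular grid with 15 levels per parameter and pass it in yourself.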

@robsonpro

Thank you so much for your attention and explanation, @juliasilge. I get it now.
