Find high FREX and high lift words for #TidyTuesday Stranger Things dialogue | Julia Silge #77

Open
utterances-bot opened this issue Oct 22, 2022 · 6 comments

Comments

@utterances-bot

Find high FREX and high lift words for #TidyTuesday Stranger Things dialogue | Julia Silge

A data science blog

https://juliasilge.com/blog/stranger-things/


As a fan of Stranger Things, I really enjoyed this. Very nice blog post, and it's really handy to now be able to tidy up the FREX and lift measures with tidytext. Great work.


m-olaide commented Oct 22, 2022

This is another amazing presentation as usual. Thanks for your efforts. I have a couple of questions:

  1. As shown, FREX and lift return different words for each topic. Which of them would you recommend for practical applications?
  2. You mentioned that it's not advisable to "remove stop words before building topic models". However, in the linked post on stm::estimateEffect(), you removed stop words before building the topic models for that case study. Please advise on the best approach: should stop words be removed before building topic models or not?

Thanks

@juliasilge
Owner

@m-olaide Thanks for the great questions!

  • I have found both FREX and lift words to help people understand what a topic is about; I would often report both. If you want to see which would be more useful in your specific situation, I recommend reading the stm vignette and especially the references in there for how FREX and lift are designed and used.
  • For the best quality topics, you typically don't want to remove stop words, as explained in the Schofield & Mimno paper I linked in this post. Sometimes I will still remove them to make a quick-and-dirty topic model that doesn't include those super common words that are used in many or all topics.
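
As a rough illustration of how the two measures differ from raw topic-word probability, here is a minimal base R sketch on made-up numbers. The matrix `beta`, the vector `word_counts`, and the helpers `rank_score()` and `top_word()` are all hypothetical, and the calculation is a simplified version of the idea (lift as probability relative to overall word frequency, FREX as a harmonic mean of frequency and exclusivity ranks), not the exact stm implementation.

```r
# Toy topic-word probabilities (rows = topics, columns = words).
# All numbers are invented for illustration.
beta <- matrix(
  c(0.50, 0.30, 0.15, 0.05,
    0.50, 0.05, 0.15, 0.30),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"),
                  c("the", "demogorgon", "mike", "russian"))
)

# Hypothetical corpus-wide word counts.
word_counts <- c(the = 500, demogorgon = 20, mike = 60, russian = 20)

# Lift: topic-word probability divided by the word's overall
# (empirical) probability in the corpus.
emp_prob <- word_counts / sum(word_counts)
lift <- sweep(beta, 2, emp_prob, "/")

# Exclusivity: share of a word's probability that belongs to each topic.
excl <- sweep(beta, 2, colSums(beta), "/")

# Within-topic rank scores in (0, 1], used as a simple stand-in for an ECDF.
rank_score <- function(m) t(apply(m, 1, rank)) / ncol(m)

# FREX-style score: weighted harmonic mean of the frequency and
# exclusivity rank scores. With w = 0.5 both parts count equally.
w <- 0.5
frex <- 1 / (w / rank_score(beta) + (1 - w) / rank_score(excl))

# Highest-scoring word per topic under each measure.
top_word <- function(m) colnames(m)[apply(m, 1, which.max)]

print(top_word(beta))
print(top_word(lift))
print(top_word(frex))
```

On these toy numbers, the highest-probability word in both topics is the very common "the", while lift and the FREX-style score both surface the words that are more distinctive to each topic.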


Kenjd commented Dec 16, 2022

Very thankful for all you share, Julia.
Would you have an idea why this error occurs when trying to run the topic_model for "frex"?
I know it worked originally in your video, but now I get this error when I run the code, and I can't track it down.
Any thoughts are appreciated.
Thanks so much.

Error in match.arg(matrix) :
'arg' should be one of “beta”, “gamma”, “theta”


Kenjd commented Dec 16, 2022

Sorry, it's the "Stranger Things" Tidy Tuesday entry.

@juliasilge
Owner

@Kenjd Hmmmm, it's hard to say here because there aren't a lot of details about where you are getting that error.
Can you create a reprex (a minimal reproducible example) for this? The goal of a reprex is to make it easier for us to recreate your problem so that we can understand it and/or fix it. If you've never heard of a reprex before, you may want to start with the tidyverse.org help page.

Once you have a reprex, I recommend posting on RStudio Community, which is a great forum for getting help with these kinds of analysis questions. Thanks! 🙌


5 participants