Wrong explanation of Fig. 3.1. #111

Closed
deyanyosifov opened this issue Jan 22, 2024 · 3 comments

Comments

deyanyosifov commented Jan 22, 2024

"There are very long tails to the right for these novels (those extremely rare words!) that we have not shown in these plots."
In fact, the extremely rare words have low n/total and sit at the leftmost side of the histogram. Many unique rare words are used only once or twice in a book, which is why the first column of the histogram is so high. The common words are far fewer; they have high n/total and lie to the right. The most common words ("a", "the", prepositions) do not even appear on the histograms because the x-axis has been truncated on the right. For "the" in Mansfield Park, n/total = 0.0386751, which is larger than 0.0009, the upper limit of the x-axis.
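
To make the point concrete, here is a minimal sketch in Python (not the book's R code) that computes n/total on a made-up sentence. The text and variable names are hypothetical and only illustrate the shape of the distribution: many distinct words share the smallest n/total, while a single very common word has by far the largest.

```python
from collections import Counter

# Toy text standing in for a novel (hypothetical data, not from the book).
text = (
    "the cat sat on the mat and the dog sat on the rug "
    "while the bird watched the cat and the dog from the tree"
)

words = text.split()
total = len(words)
counts = Counter(words)

# Term frequency as plotted on the x-axis of the figure: n / total per distinct word.
tf = {word: n / total for word, n in counts.items()}

# Words used exactly once have the smallest possible n/total (leftmost bin) ...
rare = sorted(w for w, n in counts.items() if n == 1)
# ... while the most common word has the largest n/total (far right).
top_word, top_n = counts.most_common(1)[0]

print(f"total words: {total}")
print(f"distinct words used once: {len(rare)}, each with n/total = {1 / total:.4f}")
print(f"most common word: {top_word!r}, with n/total = {top_n / total:.4f}")
```

In a real novel the gap is much wider, which is why limiting the x-axis cuts off the common words on the right rather than the rare ones.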

@juliasilge
Collaborator

Wow @deyanyosifov this is a typo that has made it through 5 years of corrections from users, multiple rounds of copyediting, etc. Congratulations! 😆

I'll get this corrected and submitted to the errata.

@juliasilge
Collaborator

Ah, this was already reported to the errata and I approved it! 🙈

https://www.oreilly.com/catalog/errata.csp?isbn=9781491981658

@juliasilge
Collaborator

I fixed this in #112 and the new version is now deployed at https://www.tidytextmining.com/tfidf

Thanks again @deyanyosifov!
