Hello. The expression for lexical diversity in Chapter 2 (Accessing Text Corpora and Lexical Resources) is incorrect in sections 1.1 Gutenberg Corpus and 3.2 Functions. It should read: vocabulary size / total number of words.
The error in section 1.1:
>>> for fileid in gutenberg.fileids():
...     num_chars = len(gutenberg.raw(fileid))  # [1]
...     num_words = len(gutenberg.words(fileid))
...     num_sents = len(gutenberg.sents(fileid))
...     num_vocab = len(set(w.lower() for w in gutenberg.words(fileid)))
...     print(round(num_chars/num_words), round(num_words/num_sents), round(num_words/num_vocab), fileid)  # here
...
5 25 26 austen-emma.txt
The last number should instead be a ratio in the range [0, 1].
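For illustration, here is a minimal sketch of the three statistics with the last one computed as vocabulary size / total number of words. To keep it runnable without downloading the Gutenberg corpus, it takes plain Python lists; the helper name `corpus_stats` and the toy sentences are my own assumptions and do not come from the book.

```python
def corpus_stats(raw, words, sents):
    """Return (avg word length, avg sentence length, lexical diversity).

    Hypothetical helper: `raw` is the text as a string, `words` a list of
    tokens, `sents` a list of token lists, mirroring what the corpus
    reader's raw(), words() and sents() return.
    """
    num_chars = len(raw)
    num_words = len(words)
    num_sents = len(sents)
    num_vocab = len(set(w.lower() for w in words))
    return (
        round(num_chars / num_words),     # average word length
        round(num_words / num_sents),     # average sentence length
        round(num_vocab / num_words, 3),  # lexical diversity, in (0, 1]
    )


# Toy data standing in for one corpus file.
raw = "The cat sat. The dog sat."
sents = [["The", "cat", "sat", "."], ["The", "dog", "sat", "."]]
words = [w for s in sents for w in s]

stats = corpus_stats(raw, words, sents)
print(stats)  # (3, 4, 0.625)
```

The third value is rounded to three decimals rather than to an integer, since rounding a ratio in [0, 1] to an integer would only ever give 0 or 1.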
The error in section 3.2:
>>> from __future__ import division
>>> def lexical_diversity(text):
...     return len(text) / len(set(text))  # here
The code block that follows this one in the book already uses the correct expression.
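A short sketch of how the two expressions relate, assuming Python 3 (where `/` is already true division, so the `__future__` import is not needed). The name `mean_uses_per_type` is hypothetical and only labels what the current expression computes; the empty-input guard is also my addition.

```python
def lexical_diversity(text):
    """Vocabulary size / total number of tokens; a value in (0, 1]."""
    tokens = list(text)
    if not tokens:
        return 0.0  # assumption: treat empty input as zero diversity
    return len(set(tokens)) / len(tokens)


def mean_uses_per_type(text):
    """Total tokens / vocabulary size: what the current expression computes."""
    tokens = list(text)
    if not tokens:
        return 0.0
    return len(tokens) / len(set(tokens))


tokens = ["a", "b", "a", "c"]
print(lexical_diversity(tokens))   # 0.75
print(mean_uses_per_type(tokens))  # 4 tokens / 3 types, always >= 1
```

The two values are reciprocals of each other, which is why the current expression yields numbers such as 26 rather than a ratio between 0 and 1.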
I will submit a PR to fix the issue. Thanks for the pleasant read.