Incorrect expression for lexical diversity in Chapter 2 #268

@weezymatt

Hello. The expression for lexical diversity is incorrect in Chapter 2 (Accessing Text Corpora and Lexical Resources), in sections 1.1 Gutenberg Corpus and 3.2 Functions. The expression should read: vocabulary size / total number of words.

The error in section 1.1:

>>> for fileid in gutenberg.fileids():
...     num_chars = len(gutenberg.raw(fileid))
...     num_words = len(gutenberg.words(fileid))
...     num_sents = len(gutenberg.sents(fileid))
...     num_vocab = len(set(w.lower() for w in gutenberg.words(fileid)))
...     print(round(num_chars/num_words), round(num_words/num_sents), round(num_words/num_vocab), fileid) #here
...
5 25 26 austen-emma.txt

The last number should instead be a ratio between 0 and 1.
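
For illustration, here is a minimal sketch of the corrected calculation, assuming the intended measure is vocabulary size divided by total word count. To stay runnable without downloading the Gutenberg corpus, it works on a small hand-made sample; `summarize` and the sample data are hypothetical names of mine, not part of NLTK or the book.

```python
def summarize(raw, words, sents):
    """Return (avg word length, avg sentence length, lexical diversity)."""
    num_chars = len(raw)
    num_words = len(words)
    num_sents = len(sents)
    # Case-fold before counting distinct tokens, as the book's loop does.
    num_vocab = len(set(w.lower() for w in words))
    return (
        round(num_chars / num_words),
        round(num_words / num_sents),
        num_vocab / num_words,  # corrected: vocabulary size / total words
    )

# Hypothetical sample standing in for gutenberg.raw/words/sents.
raw = "The cat sat. The dog sat."
sents = [["The", "cat", "sat", "."], ["The", "dog", "sat", "."]]
words = [w for s in sents for w in s]

avg_word_len, avg_sent_len, diversity = summarize(raw, words, sents)
print(avg_word_len, avg_sent_len, diversity)
```

The third value is left unrounded on purpose: rounding a ratio between 0 and 1 to an integer, as the original `print` call does, would collapse it to 0 or 1.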


The error in section 3.2:

>>> from __future__ import division
>>> def lexical_diversity(text):
...     return len(text) / len(set(text)) #here

The code block that follows this one in the same section already uses the correct expression.
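
A possible corrected version of the function, as a sketch rather than the final wording of the fix. It assumes Python 3, where `/` is already true division, so the `from __future__ import division` line is not needed. The empty-text guard is my own addition and not something the book includes.

```python
def lexical_diversity(text):
    """Vocabulary size divided by total number of tokens."""
    if not text:
        # Assumption: treat an empty text as having zero diversity
        # instead of raising ZeroDivisionError.
        return 0.0
    return len(set(text)) / len(text)

tokens = ["to", "be", "or", "not", "to", "be"]
score = lexical_diversity(tokens)  # 4 distinct tokens out of 6
print(score)
```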


I will submit a PR to fix the issue. Thanks for the pleasant read.
