entropy_bits does not always calculate entropy correctly #16

@HacKanCuBa

Description

calc::entropy_bits() does not calculate entropy correctly. Fortunately, it works for the current use case, but if somebody else used it to calculate the entropy of a list with repeated elements, the result would be completely wrong.
Example:

>>> entropy_bits(list('abcabcabcabc'))  # repeated elements, problem
6.339850002884623  # should be 1.5849625007211559
>>> entropy_bits(list('abcdefghijkl'))  # no element repetition, ok
3.584962500721156  # correct
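The expected values follow from the Shannon entropy formula, which sums over the distinct elements x of the list, with p(x) the fraction of positions holding x. If the sum instead runs over every occurrence, each distinct element's term is multiplied by its count, which reproduces the wrong value above exactly:

```latex
H = -\sum_{x \in \mathrm{distinct}} p(x)\,\log_2 p(x)

\texttt{abcabcabcabc}:\quad p(a)=p(b)=p(c)=\tfrac{1}{3},\quad
H = -3 \cdot \tfrac{1}{3}\log_2\tfrac{1}{3} = \log_2 3 \approx 1.585

\text{summed over all 12 occurrences:}\quad
-12 \cdot \tfrac{1}{3}\log_2\tfrac{1}{3} = 4\log_2 3 \approx 6.340

\texttt{abcdefghijkl}:\quad p(x)=\tfrac{1}{12},\quad
H = -12 \cdot \tfrac{1}{12}\log_2\tfrac{1}{12} = \log_2 12 \approx 3.585
```

With no repetitions every count is 1, so both sums coincide, which is why the second example comes out right.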

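The problem is that the function does not take into account how many times each element is repeated in the list. The fix is quite easy: divide each term by the element's count, so that every distinct element contributes exactly once: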

from math import log2
entropy = 0.0
for prob, count in zip(probs, counts):
    # the element appears `count` times, so divide to add its term only once
    entropy -= prob * log2(prob) / count

Note that len(probs) == len(counts), and both are ordered the same way, so probs[i] and counts[i] refer to the same element.
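A minimal, self-contained sketch for comparison. The calc module itself, and how it builds probs and counts, are not shown in this issue, so the three functions below are hypothetical: entropy_bits_buggy is only a guess at the current behaviour that happens to reproduce the reported numbers, entropy_bits_fixed applies the divide-by-count fix proposed above, and entropy_bits_counter is an alternative that uses collections.Counter to sum once per distinct element.

```python
from collections import Counter
from math import log2


def entropy_bits_buggy(items):
    """Hypothetical reconstruction of the bug: one -p*log2(p) term per occurrence."""
    counter = Counter(items)
    n = len(items)
    probs = [counter[x] / n for x in items]
    return -sum(p * log2(p) for p in probs)


def entropy_bits_fixed(items):
    """The proposed fix: divide each occurrence's term by its element's count."""
    counter = Counter(items)
    n = len(items)
    counts = [counter[x] for x in items]  # aligned index by index with probs
    probs = [c / n for c in counts]
    entropy = 0.0
    for prob, count in zip(probs, counts):
        entropy -= prob * log2(prob) / count
    return entropy


def entropy_bits_counter(items):
    """Shannon entropy in bits, with exactly one term per distinct element."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


if __name__ == "__main__":
    for text in ("abcabcabcabc", "abcdefghijkl"):
        items = list(text)
        print(text, entropy_bits_buggy(items), entropy_bits_fixed(items),
              entropy_bits_counter(items))
```

For 'abcabcabcabc' the first function gives about 6.340 (4 × log2 3) while the other two give about 1.585 (log2 3); for 'abcdefghijkl' all three agree at about 3.585 (log2 12). The Counter version never builds per-occurrence lists, so it may be the simpler long-term option, though the divide-by-count fix is the smaller change.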

Metadata

Assignees: No one assigned
Labels: ToDo (Kanban: issue to be done in current sprint), bug, confirmed
Projects: No projects
Relationships: None yet
Development: No branches or pull requests
