Open
Labels: ToDo (Kanban - Issue to be done in current sprint), bug, confirmed
Description
calc::entropy_bits() does not calculate the entropy correctly. It happens to work for the current use case, but if anyone uses it to calculate the entropy of a list with repeated elements, the result will be wrong.
Example:
>>> entropy_bits(list('abcabcabcabc')) # repeated elements, problem
6.339850002884623 # should be 1.5849625007211559
>>> entropy_bits(list('abcdefghijkl')) # no element repetition, ok
3.584962500721156 # correct
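For reference, the expected value can be checked by hand, assuming the function is meant to return the Shannon entropy (in bits) of the empirical distribution of the list's elements. In 'abcabcabcabc' each of the three distinct symbols has probability 4/12 = 1/3:

```latex
H = -\sum_{x \in \{a,b,c\}} p(x)\,\log_2 p(x)
  = -3 \cdot \tfrac{1}{3}\,\log_2 \tfrac{1}{3}
  = \log_2 3 \approx 1.585
```

The reported 6.3398... equals 4 log2(3), which is what the sum gives if it runs over all 12 occurrences instead of the 3 distinct symbols.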
The problem is that the function does not take into account how many times each element is repeated in the list. The fix is quite easy: divide each term by the count of its element.
for prob, count in zip(probs, counts):
    entropy -= prob * log2(prob) / count
print(entropy)
Note that len(probs) == len(counts), and the two lists are in the same order, so probs[i] and counts[i] refer to the same element.
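A minimal, self-contained sketch of the fix, for comparison. The internals of the real calc::entropy_bits() are not shown in this issue, so the layout of probs and counts below (one entry per occurrence in the list) is an assumption inferred from the reported output, and entropy_bits_patched is a hypothetical name. The first function is an alternative that sums over distinct elements and so needs no correction factor.

```python
from collections import Counter
from math import log2


def entropy_bits(items):
    """Shannon entropy, in bits, of the empirical distribution of items."""
    total = len(items)
    if total == 0:
        return 0.0
    # One term per distinct element, so repeats are not counted twice.
    return sum(-(n / total) * log2(n / total) for n in Counter(items).values())


def entropy_bits_patched(items):
    """Per-occurrence loop with the proposed division by count.

    Assumption: probs and counts hold one entry per occurrence, in list order.
    """
    tally = Counter(items)
    total = len(items)
    counts = [tally[x] for x in items]    # count of each occurrence's element
    probs = [c / total for c in counts]   # same order and length as counts
    entropy = 0.0
    for prob, count in zip(probs, counts):
        # An element seen `count` times contributes `count` identical terms,
        # so dividing by count leaves exactly one term per distinct element.
        entropy -= prob * log2(prob) / count
    return entropy
```

Both functions should return about 1.585 for list('abcabcabcabc') and about 3.585 for list('abcdefghijkl'), matching the expected values in the example above.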