Skip to content

colorhash: overflow encountered in scalar multiply #216

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 17, 2025

Conversation

saim61
Copy link

@saim61 saim61 commented Dec 4, 2024

I noticed a small problem in the color hashing algo for HSV. When the bin bits increase to size 46 and onwards it gives a warning of scalar multiplication and due to this we get a slightly wrong answer.

The warning:

/imagehash/__init__.py:441: RuntimeWarning: overflow encountered in scalar multiply
  values.append(min(maxvalue - 1, int(counts * maxvalue * 1. / c)))

Peppers image bin bit size 46 (with fix)


0ffffffffff07fffffffffe0ffffffffff40fffffffffd000ffffffffc07ffffffffc0001fffffff83fffffffffeffffffffffec01ffffffffd00000000000000000000000000000000000fffffffffff


Peppers image bin bit size 46 (without fix)

0ffffffffff07fffffffffe0ffffffffff40fffffffffd000ffffffffc07ffffffffc0001fffffff83fffffffffefffffffffff001ffffffffd00000000000000000000000000000000000fffffffffff

Key Difference:
At Position 108–119 (starting from index 0):
With Fix: feffffffffffec01
Without Fix: fffffffffff001

The fix: we just needed to round off the counts variable before performing other calculations.

@coveralls
Copy link

coveralls commented Dec 4, 2024

Coverage Status

coverage: 87.973%. remained the same
when pulling f62f120 on saim61:colorhash_fix
into 07951cd on JohannesBuchner:master.

@JohannesBuchner
Copy link
Owner

Will this break backwards compatibility, for example, for someone who created a database of colorhashes?

@saim61
Copy link
Author

saim61 commented Dec 4, 2024

If they're using bin bits size greater than 46, then yes. It will affect the outputs. This doesn't change anything for people using bin bits 3 to 45. However, the change in this PR does give you the correct output which you should be getting.

@JohannesBuchner
Copy link
Owner

Just a small question: for what application would one use so many bits? If you hash, you want it to be lossy so that similar images have the same value.

@saim61
Copy link
Author

saim61 commented Dec 5, 2024

It does seem unlikely that anyone would use such a large bin bits size. I was playing around with this library and was working on something that does exactly what this library does in Go. So while testing and comparing values for hashes i noticed this bug in the python repo.

@JohannesBuchner
Copy link
Owner

JohannesBuchner commented Dec 5, 2024

I think the idea of the line is that a value should fit between 0 and 2^bitsize, from another range. The line scales linearly and then discretizes to integer. The change in the PR applies the discretization to counts, which is already integer. With this PR, I think values can end up containing floats.

@saim61
Copy link
Author

saim61 commented Dec 5, 2024

Actually i have done quite some research on this. If we dont make this change and print out the values array, you can see negative values in some of the indexes which cause incorrect answers. This is due to integer overflow.

@JohannesBuchner
Copy link
Owner

OK, but for that case,
min(maxvalue - 1, int(counts * maxvalue * 1. / c))
should be changed to
int(min(maxvalue - 1, counts * maxvalue * 1. / c))

@saim61
Copy link
Author

saim61 commented Dec 5, 2024

Screenshot 2024-12-05 at 11 52 10 PM

I have tried this too when I was debugging things. It doesnt work.

This issue and the fix both itself are very strange 😄

@saim61
Copy link
Author

saim61 commented Apr 12, 2025

@JohannesBuchner so, what do you think? do you have any other questions I can answer?
you can take this comment as a gentle reminder (in case you forgot about my PR :D)

@JohannesBuchner
Copy link
Owner

I suppose you can divide by c first, and then multiply by maxvalue, to avoid the overflow.

@saim61
Copy link
Author

saim61 commented Apr 17, 2025

@JohannesBuchner it worked :) made the change

@JohannesBuchner JohannesBuchner changed the title simple fix for colorhash algo colorhash: overflow encountered in scalar multiply Apr 17, 2025
@JohannesBuchner JohannesBuchner merged commit 4e289eb into JohannesBuchner:master Apr 17, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants