Commit 29a0765
Fix PIL image hashing to use actual bytes instead of object repr (#3331)
The convert_pil_to_hash function was hashing str(BytesIO), which includes
the buffer object's memory address, causing different hashes for the same
image across runs. This made task hashes non-deterministic for tasks with images.
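The fix amounts to hashing the encoded image bytes instead of the buffer's repr. Below is a minimal sketch of such a helper. It assumes SHA-256 and PNG encoding, matching the reproducer below. `pil_image_sha256` is a hypothetical name, and the real convert_pil_to_hash signature is not shown in this commit. The helper does not import Pillow: it only calls `image.save(fp, format=...)`, so any `PIL.Image.Image` works.

```python
import hashlib
from io import BytesIO


def pil_image_sha256(image, fmt: str = "PNG") -> str:
    """Hypothetical stand-in for a fixed convert_pil_to_hash.

    `image` is expected to be a PIL.Image.Image; any object with a
    save(fp, format=...) method is accepted. The digest covers the
    encoded bytes, never the BytesIO object's repr.
    """
    buf = BytesIO()
    image.save(buf, format=fmt)
    # getvalue() returns the whole buffer regardless of the stream
    # position; buf.read() here would return b"" because save() leaves
    # the position at the end, so every image would hash identically.
    return hashlib.sha256(buf.getvalue()).hexdigest()
```

With Pillow installed, `pil_image_sha256(Image.new('RGB', (2, 2), color='red'))` should return the same digest on every call within a given environment. PNG output can still differ across Pillow or zlib versions.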
Reproducer:
```python
import hashlib
from io import BytesIO
from PIL import Image
img = Image.new('RGB', (2, 2), color='red')
img_bytes = BytesIO()
img.save(img_bytes, format="PNG")
# Buggy: hashes "<_io.BytesIO object at 0x...>"
print(str(img_bytes)) # <_io.BytesIO object at 0x1023d8bd0>
buggy_hash = hashlib.sha256(str(img_bytes).encode()).hexdigest()
# Fixed: hashes actual PNG bytes
print(img_bytes.getvalue()[:30]) # b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x02...'
fixed_hash = hashlib.sha256(img_bytes.getvalue()).hexdigest()
```
Hashing the same image in two separate runs:
- Buggy approach: a different hash each time, because the repr embeds the buffer's memory address
- Fixed approach: consistent hash de33ddc09a0ba9b8...
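The same effect can be reproduced within one process and without Pillow. The sketch below keeps two BytesIO objects alive at once with identical contents. The byte string is a placeholder standing in for encoded image data. Both buffers must be alive simultaneously because CPython can reuse a freed object's address. Two buffers created one after the other could therefore produce matching buggy hashes by coincidence.

```python
import hashlib
from io import BytesIO

payload = b"\x89PNG placeholder bytes"  # stand-in for encoded image data

# Two distinct buffers holding identical bytes, both alive at the same time.
buf_a = BytesIO(payload)
buf_b = BytesIO(payload)

# Buggy key: str() yields "<_io.BytesIO object at 0x...>", so the digest
# depends on each object's address rather than on its contents.
buggy_a = hashlib.sha256(str(buf_a).encode()).hexdigest()
buggy_b = hashlib.sha256(str(buf_b).encode()).hexdigest()

# Fixed key: hash the buffer contents themselves.
fixed_a = hashlib.sha256(buf_a.getvalue()).hexdigest()
fixed_b = hashlib.sha256(buf_b.getvalue()).hexdigest()

print(buggy_a == buggy_b)  # False
print(fixed_a == fixed_b)  # True
```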
This fix ensures deterministic task hashes for evaluations with images.

1 parent 7ddd966, commit 29a0765
1 file changed: +1 -1 lines (line 579 replaced; lines 576-582 shown as context)