@adepasquale
Found this bug while using oleobj.py on a PowerPoint file:

$ oleobj file.ppt
oleobj 0.56.1 - http://decalage.info/oletools
[redacted]
WARNING  Wanted to read 4096, got 2542

The extracted embedded file did not match the hash of the real embedded file, so I traced the code back from the warning message, which is emitted here:

oletools/oletools/oleobj.py

Lines 645 to 646 in a7d1050

log.warning('Wanted to read {0}, got {1}'
            .format(next_size, len(data)))

The problem is that olefile.py expects read() to return all the requested bytes (except for the last sector):
https://github.com/decalage2/olefile/blob/cc0bdc07194fb7dc21e75a95c9e771e5240952b2/olefile/olefile.py#L666-L676

ppt_record_parser.IterStream is derived from io.RawIOBase, whose read() is unfortunately not guaranteed to return the requested number of bytes: it may return fewer even before the end of the stream.

Since the IterStream implementation was already buffered, I simply changed readinto() to always return the requested length whenever possible. You might want to derive the class from io.BufferedIOBase instead.
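To illustrate the behaviour described above, here is a minimal sketch. It is not the actual IterStream code: `ShortReadStream`, `FullReadStream`, and the helper `_fill` are hypothetical names made up for this example, and the stream is assumed to be fed by an iterator of byte chunks. It shows how a single `readinto()` call on an `io.RawIOBase` subclass can produce a short read, and how looping inside `readinto()` avoids it.

```python
import io


class ShortReadStream(io.RawIOBase):
    """Hypothetical raw stream fed by an iterator of byte chunks."""

    def __init__(self, chunks):
        super().__init__()
        self._chunks = iter(chunks)
        self._pending = b""

    def readable(self):
        return True

    def _fill(self, view):
        """Copy bytes from the current chunk only; returns 0 at end of data.

        Assumption: the iterator never yields an empty chunk, so an empty
        result from next() means the data is exhausted.
        """
        if not self._pending:
            self._pending = next(self._chunks, b"")
        n = min(len(view), len(self._pending))
        view[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n

    def readinto(self, buf):
        # One chunk per call: read(n) may return fewer than n bytes.
        return self._fill(memoryview(buf))


class FullReadStream(ShortReadStream):
    def readinto(self, buf):
        # Keep pulling chunks until the buffer is full or data runs out.
        view = memoryview(buf)
        total = 0
        while total < len(view):
            n = self._fill(view[total:])
            if n == 0:
                break
            total += n
        return total


chunks = [b"ab", b"cde", b"f"]
short_result = ShortReadStream(chunks).read(4)   # short read: only b"ab"
full_result = FullReadStream(chunks).read(4)     # full read: b"abcd"

# Alternative without changing readinto(): wrap the raw stream in the
# standard library's io.BufferedReader, which repeats raw reads as needed.
buffered_result = io.BufferedReader(ShortReadStream(chunks)).read(4)
```

With the looping `readinto()`, a caller that assumes `read(n)` returns `n` bytes until the end of the stream gets what it expects. Wrapping in `io.BufferedReader` has the same effect for the caller and may be the less invasive option when the raw class cannot be modified.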

IterStream is derived from io.RawIOBase which is not guaranteed to
return the desired bytes during read(). Unfortunately, olefile.py is
expecting read() to return all bytes (except for the last sector):

https://github.com/decalage2/olefile/blob/cc0bdc07194fb7dc21e75a95c9e771e5240952b2/olefile/olefile.py#L666-L676

Since IterStream implementation was already buffered, I changed
readinto() to always return the desired length whenever possible.
@decalage2 decalage2 self-requested a review September 15, 2021 18:24
@decalage2 decalage2 self-assigned this Sep 15, 2021
@decalage2 decalage2 added this to the oletools 0.60 milestone Sep 15, 2021