-
Notifications
You must be signed in to change notification settings - Fork 60
fix: bytes encoding error "string argument without an encoding" #976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v1.2.6
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The effects of the previous change were not evident until some folks were running in their environment meaning no unit tests exist that trigger this. Please submit a unit test with these changes that prove they work as expected.
Do we have any example tests that trigger escrows I could start from? All the prior change did was change the way we logged things, meaning there were no unit tests prior to my prior change either. |
src/keri/core/eventing.py
Outdated
@@ -5956,7 +5956,7 @@ def processEscrowPartialDels(self): | |||
sigs = self.db.getSigs(dgkey) # list of sigs | |||
if not sigs: # empty list | |||
# no sigs so raise ValidationError which unescrows below | |||
msg = f"PDE Missing escrowed evt sigs at dig = {bytes(edig)}" | |||
msg = f"PDE Missing escrowed evt sigs at dig = {edig}" | |||
logger.info("Kevery unescrow error: %s", bytes(edig)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You either want msg
or edig
here but bytes(edig)
is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the bytes(edig)
is a holdover from the old logging statements. For example, from KERIpy 1.1.25 the logging statements Sam wrote look like this:
logger.info("Kevery unescrow error: Missing event sigs at."
"dig = %s\n", bytes(edig))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f-strings are capable to convert the edig
to the __repr__
format for the given datatype, whatever it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 5960 is not an f-string. if the variable edig
is a string, which is one of the possible types from self.db.pdes.getOnItemIter
you will get TypeError: string argument without an encoding
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason line 5960 does not cause an error is because the log substitution with the %s
operator auto-converts to the __repr__
of whatever data type is there, thus a string encoding is not needed. All the old logging statements used %s
and log substitutions, not f-string interpolations.
The problem I am trying to fix is when the call to bytes(edig)
fails due to the lack of an encoding. It just so happens that this is occurring inside of an f-string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I am wrong, you are right. The bytes(edig)
call would cause an error if it were a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My latest commit addresses the underlying issue by me going and doing the work I should have done the first time around, reading all of the Baser or LMDBer functions to see their specific return types and then ensuring the type used in the log statement is correct.
Its not a unit test: But I believe the following will exercise the variations that might cause failures. import logging
def main():
logger = logging.getLogger("new")
logger.setLevel(logging.INFO)
console_handler = logging.StreamHandler()
console_handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
logger.addHandler(console_handler)
try:
logger.info("Kevery unescrow error: %s", bytes("edig"))
except TypeError as e:
print("existing error", e)
# because some generators return str | bytes | memoryview | Iterable
transformations = [
lambda s: s, # string
lambda s: s.encode("utf-8"), # bytes
lambda s: memoryview(s.encode("utf-8")) # memoryview
]
edigs = []
for transform in transformations:
edig = transform("edig")
print(f"Type: {type(edig).__name__}")
edigs.append(edig)
logger.info("msg: %s", f"PDE Stale event escrow at dig = {edig}")
logger.info("msg: %s", f"PDE Stale event escrow at dig = {edigs}")
if __name__ == "__main__":
main() |
The LMDB function return Python memoryview objects, which are not usable in f"" strings are in general with most operations like logging. If it is a memoryview than converting it to bytes is accomplished by bytes(mv). But formated f"" want unicode strings not bytes so to to convert to regular strings you want. bytes(mv).decode() some of the functions do the work of converting to str by checking the type and then decoding if it needed it. But when trying to somehow optimize the logging or simplify the logic, taking shortcuts becomes counter productive when one does not actually understand the types one is encountering. So the logging worked before. Now its broken. The main problem was it was too verbose but it was correct. So instead of just fixing the verbosity, there was an attempt to somehow make it more 'efficient' without doing the work of understanding it. Better to just do the one thing and not the other. |
Sam, thank you for clarifying that the return value for edig in statements like for ekey, edig in self.db.getOoeItemIter(key=key): can sometimes be a memoryview. When I dug into the class LMDBer(filing.Filer):
...
def getTopItemIter(self, db, top=b''):
...
with self.env.begin(db=db, write=False, buffers=True) as txn:
cursor = txn.cursor()
if cursor.set_range(top): # move to val at key >= key if any
for ckey, cval in cursor.iternext(): # get key, val at cursor
ckey = bytes(ckey) # <----------- referring to this line ---------------
if not ckey.startswith(top): # prev entry if any last in branch
break # done
yield (ckey, cval) # another entry in branch startswith key
return # done raises StopIteration I made the assumption that My goal wasn't to make it more efficient, I just wanted to get past the "string argument without encoding" TypeError which was confusing. It confused me because I thought that From what I can tell the problem I am trying to solve occurs when the |
After reading the code just now I found out the following, which will be reflected in the PRs shortly, regarding the value for
I will update these escrow functions accordingly. This is the kind of deeper code analysis I should have done before making a bulk change like I did when I wrote this PR. In the future I will be more diligent in validating assumptions I make about data types in the code. |
No, I was wrong again, I swapped the key and value in my mind. I'll correct the above comment. Some of the |
9133d0b
to
46d478f
Compare
The latest commit fixes the type used in log messages by properly converting memory view types to |
@pfeairheller I added unit tests proving that the log statements work in the OOO, PSE, PWE, and PDEL escrows. I will have the rest tested later this week. I will add the same tests to 1.1.32 and main as well. |
14301a7
to
fb8715c
Compare
Resolves #973 |
768dc13
to
4452209
Compare
Remove Make sure strings passed to |
Edited:
When I made some changes to the logging I incorrectly assumed that
edig
was always a memoryview in theprocessEscrow*
functions and thus able to be converted to bytes withbytes(edig)
. This was incorrect because sometimes theedig
argument is a string, so runningbytes(edig)
without specifying an encoding threw an error, thus creating the "string argument without an encoding" error.This PR fixes the logging in each
processEscrow*
function so that each type, memoryview, bytes, or string, is appropriately converted to a printable value before being interpolated in a log message.