Skip to content

fix: bytes encoding error "string argument without an encoding" #976

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: v1.2.6
Choose a base branch
from

Conversation

kentbull
Copy link
Contributor

@kentbull kentbull commented Apr 23, 2025

Edited:
When I made some changes to the logging I incorrectly assumed that edig was always a memoryview in the processEscrow* functions and thus able to be converted to bytes with bytes(edig). This was incorrect because sometimes the edig argument is a string, so running bytes(edig) without specifying an encoding threw an error, thus creating the "string argument without an encoding" error.

This PR fixes the logging in each processEscrow* function so that each type, memoryview, bytes, or string, is appropriately converted to a printable value before being interpolated in a log message.

Copy link
Member

@pfeairheller pfeairheller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The effects of the previous change were not evident until some folks were running in their environment meaning no unit tests exist that trigger this. Please submit a unit test with these changes that prove they work as expected.

@kentbull kentbull marked this pull request as draft April 24, 2025 14:44
@kentbull
Copy link
Contributor Author

Do we have any example tests that trigger escrows I could start from? All the prior change did was change the way we logged things, meaning there were no unit tests prior to my prior change either.

@@ -5956,7 +5956,7 @@ def processEscrowPartialDels(self):
sigs = self.db.getSigs(dgkey) # list of sigs
if not sigs: # empty list
# no sigs so raise ValidationError which unescrows below
msg = f"PDE Missing escrowed evt sigs at dig = {bytes(edig)}"
msg = f"PDE Missing escrowed evt sigs at dig = {edig}"
logger.info("Kevery unescrow error: %s", bytes(edig))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You either want msg or edig here but bytes(edig) is wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the bytes(edig) is a holdover from the old logging statements. For example, from KERIpy 1.1.25 the logging statements Sam wrote look like this:

logger.info("Kevery unescrow error: Missing event sigs at."
                                    "dig = %s\n", bytes(edig))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f-strings are capable to convert the edig to the __repr__ format for the given datatype, whatever it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 5960 is not an f-string. if the variable edig is a string, which is one of the possible types from self.db.pdes.getOnItemIter you will get TypeError: string argument without an encoding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason line 5960 does not cause an error is because the log substitution with the %s operator auto-converts to the __repr__ of whatever data type is there, thus a string encoding is not needed. All the old logging statements used %s and log substitutions, not f-string interpolations.

The problem I am trying to fix is when the call to bytes(edig) fails due to the lack of an encoding. It just so happens that this is occurring inside of an f-string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I am wrong, you are right. The bytes(edig) call would cause an error if it were a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My latest commit addresses the underlying issue by me going and doing the work I should have done the first time around, reading all of the Baser or LMDBer functions to see their specific return types and then ensuring the type used in the log statement is correct.

@m00sey
Copy link
Member

m00sey commented Apr 24, 2025

Its not a unit test:

But I believe the following will exercise the variations that might cause failures.

import logging

def main():
    logger = logging.getLogger("new")
    logger.setLevel(logging.INFO)
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
    logger.addHandler(console_handler)

    try:
        logger.info("Kevery unescrow error: %s", bytes("edig"))
    except TypeError as e:
        print("existing error", e)

    # because some generators return str | bytes | memoryview | Iterable
    transformations = [
        lambda s: s,                     # string
        lambda s: s.encode("utf-8"),     # bytes
        lambda s: memoryview(s.encode("utf-8"))  # memoryview
    ]

    edigs = []
    for transform in transformations:
        edig = transform("edig")
        print(f"Type: {type(edig).__name__}")
        edigs.append(edig)
        logger.info("msg: %s", f"PDE Stale event escrow at dig = {edig}")

    logger.info("msg: %s", f"PDE Stale event escrow at dig = {edigs}")

if __name__ == "__main__":
    main()

@SmithSamuelM
Copy link
Collaborator

The LMDB function return Python memoryview objects, which are not usable in f"" strings are in general with most operations like logging. If it is a memoryview than converting it to bytes is accomplished by bytes(mv). But formated f"" want unicode strings not bytes so to to convert to regular strings you want. bytes(mv).decode()

some of the functions do the work of converting to str by checking the type and then decoding if it needed it.

But when trying to somehow optimize the logging or simplify the logic, taking shortcuts becomes counter productive when one does not actually understand the types one is encountering. So the logging worked before. Now its broken. The main problem was it was too verbose but it was correct. So instead of just fixing the verbosity, there was an attempt to somehow make it more 'efficient' without doing the work of understanding it. Better to just do the one thing and not the other.

@kentbull
Copy link
Contributor Author

Sam, thank you for clarifying that the return value for edig in statements like

for ekey, edig in self.db.getOoeItemIter(key=key):

can sometimes be a memoryview. When I dug into the LMDBer.gettopItemIter function to discover the type I saw the ckey = bytes(ckey) in the below code and assumed all returns of edig would be cast to bytes like the one in the below function is.

class LMDBer(filing.Filer):
        ...
        def getTopItemIter(self, db, top=b''):
        ...
        with self.env.begin(db=db, write=False, buffers=True) as txn:
            cursor = txn.cursor()
            if cursor.set_range(top):  # move to val at key >= key if any
                for ckey, cval in cursor.iternext():  # get key, val at cursor
                    ckey = bytes(ckey)        # <----------- referring to this line ---------------
                    if not ckey.startswith(top): #  prev entry if any last in branch
                        break  # done
                    yield (ckey, cval)  # another entry in branch startswith key
            return  # done raises StopIteration

I made the assumption that bytes(edig) in the logger statement was extraneous since it appeared to always be done in the LMDBer function. It sounds like you are saying that in some instances the edig return value is a memoryview. I will go and manually check each iterator function to confirm this just to double check.

My goal wasn't to make it more efficient, I just wanted to get past the "string argument without encoding" TypeError which was confusing. It confused me because I thought that edig would always be a bytes object due to the cast I saw in LMDBer. I didn't take the time at first to look at every LMDBer function that is used in eventing.py in these escrow loops, though now I will.

From what I can tell the problem I am trying to solve occurs when the bytes(edig) function is called with something Python thinks is a string.

@kentbull
Copy link
Contributor Author

kentbull commented Apr 25, 2025

After reading the code just now I found out the following, which will be reflected in the PRs shortly, regarding the value for edig in the various escrow functions.

  • processEscrowOutOfOrders -> memoryview
  • processEscrowPartialSigs -> memoryview
  • processEscrowPartialWigs -> memoryview
  • processEscrowPartialDels -> string
  • processEscrowDelegables -> bytes, yet comes from DB as deserialized string, then cast to bytes
  • processQueryNotFound -> string
  • processEscrowDuplicitous -> memoryview

I will update these escrow functions accordingly. This is the kind of deeper code analysis I should have done before making a bulk change like I did when I wrote this PR. In the future I will be more diligent in validating assumptions I make about data types in the code.

@kentbull
Copy link
Contributor Author

No, I was wrong again, I swapped the key and value in my mind. I'll correct the above comment. Some of the edig values are memoryview values, not bytes.

@kentbull kentbull force-pushed the escrow-logging-error-fix branch from 9133d0b to 46d478f Compare April 25, 2025 01:02
@kentbull
Copy link
Contributor Author

The latest commit fixes the type used in log messages by properly converting memory view types to bytes and then using .decode() to transform it into a string. For consistency I used a dig variable to get it right once and then reuse it in all relevant log statements in each processEscrow* function.

@kentbull kentbull marked this pull request as ready for review April 26, 2025 16:43
@kentbull
Copy link
Contributor Author

@pfeairheller I added unit tests proving that the log statements work in the OOO, PSE, PWE, and PDEL escrows. I will have the rest tested later this week. I will add the same tests to 1.1.32 and main as well.

@kentbull kentbull force-pushed the escrow-logging-error-fix branch from 14301a7 to fb8715c Compare April 29, 2025 23:43
@kentbull
Copy link
Contributor Author

Resolves #973

@kentbull kentbull force-pushed the escrow-logging-error-fix branch from 768dc13 to 4452209 Compare April 30, 2025 14:57
@kentbull
Copy link
Contributor Author

Remove dig variables and replace with inline bytes(edig).decode() calls.

Make sure strings passed to bytes() cast have encodings specified.

@kentbull kentbull marked this pull request as draft May 27, 2025 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants