Description
Is your feature request related to a problem? Please describe.
In Python 2, it's non-obvious how to handle strings correctly, such that plugin code doesn't have problems when encountering non-ASCII text. Code like this can cause problems:
result = libs.run_bash(connection, "cat settings.txt | grep FIRST_NAME | awk '{print $2}'")
first_name = result.stdout
message = "The first name is {}".format(first_name)
The problem happens when the remote-side output contains non-ASCII characters. When we create message, we are calling format on a str object, not a unicode object. That means that Python needs to convert those non-ASCII characters into bytes. But we've never specified which encoding to use, so Python 2 falls back to its default ASCII codec, which cannot represent those characters and raises a UnicodeEncodeError.
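To make the failure concrete, here is a minimal sketch of the conversion Python 2 attempts implicitly in the snippet above. It spells the encoding step out explicitly so that it behaves the same on Python 2 and Python 3. The sample value is hypothetical, and it assumes result.stdout arrives as a unicode object.

```python
# Hypothetical remote output containing a non-ASCII character ("Zoe" with a diaeresis).
first_name = u"Zo\u00eb"

# Python 2 does roughly this behind the scenes when a str template is
# formatted with a unicode argument: it encodes with the ASCII codec.
try:
    first_name.encode("ascii")
    failed = False
except UnicodeEncodeError:
    # ASCII has no representation for U+00EB, so the conversion fails.
    failed = True

print(failed)
```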
Describe the solution you'd like
A general best practice is the so-called "Unicode Sandwich". This says to always use unicode objects, not str objects.
The only exception is when directly interacting with other code that really expects/produces sequences of bytes (not characters). Even then, you should immediately decode any received bytes before the rest of your code sees them, and you should encode characters to bytes at the last possible second before sending them out. For plugins, any strings passed to/from Delphix code already support unicode objects, so this exception does not apply to plugins.
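For code outside of plugins that really does sit at a byte boundary, the sandwich might look like the sketch below. The handle function and the choice of UTF-8 are assumptions made for illustration; the real encoding depends on whatever produces or consumes the bytes.

```python
def handle(raw_bytes):
    # Bottom slice of bread: decode as soon as the bytes arrive.
    # UTF-8 is an assumption here, not something the byte source guarantees.
    text = raw_bytes.decode("utf-8")

    # The filling: all real work happens on unicode objects.
    greeting = u"Hello, {}".format(text)

    # Top slice of bread: encode only at the moment of sending.
    return greeting.encode("utf-8")


# b"Zo\xc3\xab" is the UTF-8 encoding of "Zoe" with a diaeresis.
out = handle(b"Zo\xc3\xab")
```

Plugin code should not need either the decode or the encode step, since the strings it exchanges with Delphix code are already unicode objects.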
So, we want to encourage plugin authors to:
- Always use unicode objects (u"Hello, World", not "Hello, World")
- Never call encode or decode.
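As a sketch of what the recommended style could look like, the earlier message-building code can be written with a unicode template. The build_message helper and the sample name are hypothetical and only stand in for plugin code that receives result.stdout.

```python
def build_message(first_name):
    # The u prefix makes the template a unicode object in Python 2,
    # so format never has to convert first_name into bytes.
    return u"The first name is {}".format(first_name)


# Hypothetical remote output with a non-ASCII character.
message = build_message(u"Zo\u00eb")
```

No call to encode or decode appears anywhere, which is the point: the text stays unicode from the moment it is received until it is handed back.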
We should:
- Document this as a best practice. This includes giving examples of problematic code, as above
- Change our documentation examples so that they actually follow this best practice.
- Change dvp init so that the code it generates also follows this best practice.