Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplejson incorrectly returns byte strings in many cases #369

Open
hodgestar opened this issue Sep 9, 2014 · 9 comments
Open

simplejson incorrectly returns byte strings in many cases #369

hodgestar opened this issue Sep 9, 2014 · 9 comments

Comments

@hodgestar
Copy link

The JSON specification defines JSON strings as:

A string is a sequence of Unicode code points wrapped with quotation marks (U+0022).

The natural Python analog of these appears to be Python unicode strings. Confusingly, the simplejson library sometimes returns unicode strings and sometimes byte strings, e.g.:

>>> simplejson.loads('"\\u00e6"')
u'\xe6'
>>> simplejson.loads('"ae"')
'ae'
>>> simplejson.loads(u'"ae"')
u'ae'

This makes writing correct code on top of simplejson rather hard.

Riak uses simplejson in two kinds of places -- firstly, for decoding object data in the client. These allow for overriding the default encoders and so are less of an issue. Secondly, in the HTTP transport for decoding JSON responses. These provide no means to control the JSON parser used and return decoded keys and indexes in many places.

For the HTTP transport could we either:

  • An an option for controlling which parser is used by the HTTP client.
  • Use the stdlib json library which returns consistently typed results.
@seancribbs
Copy link

You are not required to include simplejson in your project. It is not a dependency of the client, but will use it if present.

$ ack simplejson -C 5 riak
riak/client/__init__.py
18-specific language governing permissions and limitations
19-under the License.
20-"""
21-
22-try:
23:    import simplejson as json
24-except ImportError:
25-    import json
26-
27-import random
28-from weakref import WeakValueDictionary

riak/tests/test_kv.py
5-import platform
6-from time import sleep
7-from riak import ConflictError, RiakBucket, RiakError
8-from riak.resolver import default_resolver, last_written_resolver
9-try:
10:    import simplejson as json
11-except ImportError:
12-    import json
13-
14-if platform.python_version() < '2.7':
15-    unittest = __import__('unittest2')

riak/transports/http/transport.py
18-specific language governing permissions and limitations
19-under the License.
20-"""
21-
22-try:
23:    import simplejson as json
24-except ImportError:
25-    import json
26-
27-
28-import httplib

I would love to hear @etrepum reply to your claims.

@hodgestar
Copy link
Author

We don't require simplejson either, but some of our dependencies do.

@etrepum
Copy link

etrepum commented Sep 9, 2014

Forcing the return of unicode objects to simplejson can be done by providing unicode input. It will only return str objects for ASCII text, which is generally interchangeable with unicode objects (they even hash the same) except when explicit type checks are done. This is a (significant) speed and space optimization, since Python 2 doesn't have the same sort of flexible unicode representation that Python 3 does.

@hodgestar
Copy link
Author

@etrepum Always passing in unicode is certainly an option, but that would have to be done by the riak HTTP transport implementation since that is what calls simplejson.loads.

@hodgestar
Copy link
Author

P.S. I'm happy to create a pull request for either of my suggestions. I'd just like an indication that it has some chance of landing before I start.

P.P.S. We have to work around this anyway because we have production code that is breaking because of this right now. We've uninstalled simplejson but it'll easily get installed again accidentally via dependencies.

@etrepum
Copy link

etrepum commented Sep 9, 2014

It's probably best to consolidate all of the JSON that the riak client does and parameterize it such that you can provide some object with loads and dumps methods to the constructor and it'll just use that everywhere. This way you can parameterize the behavior to be exactly what suits your use case, whether that's specifying different options than the defaults or swapping out the implementation entirely.

@seancribbs
Copy link

I agree, @etrepum, that seems the best course of action. @hodgestar If you have time for a pull-request, we'd appreciate it, otherwise it seems reasonable enough that we'll include it in the next development cycle.

@hodgestar
Copy link
Author

Awesome! Thanks. Will try get a PR cut later in the week.

@jerith
Copy link
Contributor

jerith commented Sep 11, 2014

A fix for this should probably also include the default_encoder stuff in riak.client.__init__.

@lukebakken lukebakken modified the milestone: riak-python-client-2.7.1 Dec 16, 2016
@lukebakken lukebakken modified the milestones: riak-python-client-2.7.1, riak-python-client-3.0.0 Feb 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants