Fix issue #30 Unicode decoding in conversion to CharSequence #32

gsnewmark · 2017-06-11T12:12:13Z

Issue #30 affects us too, so I've looked into it a bit. CharsetDecoder's JavaDoc is somewhat vague, but it states that:

In any case, if this method [decode] is to be reinvoked in the same decoding operation then care should be taken to preserve any bytes remaining in the input buffer so that they are available to the next invocation.

It looks like, in case of underflow during a decode operation, CharsetDecoder leaves the bytes that do not constitute a full character in the passed input buffer, and expects the next decode call to pass these bytes along with the additional ones that complete the character. So I've added merging of the remaining extra-bytes and new in to the underflow branch of the decoding. It fixes the issue, but I'm not that experienced with byte fiddling, so maybe there is a more efficient way to do that.
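To illustrate the behavior described above, here is a minimal Java sketch of carrying leftover bytes from one decode call into the next. It is not the Clojure change from this pull request; the class name `ChunkedDecode` and the method `decodeChunks` are hypothetical, and the sketch assumes UTF-8 input that arrives as an array of byte chunks.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class ChunkedDecode {

    // Decodes UTF-8 chunks one at a time. Bytes that the decoder leaves
    // unread on underflow (an incomplete character at the end of a chunk)
    // are prepended to the next chunk.
    static String decodeChunks(byte[][] chunks) {
        if (chunks.length == 0) {
            return "";
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        StringBuilder sb = new StringBuilder();
        byte[] leftover = new byte[0];

        for (int i = 0; i < chunks.length; i++) {
            // Merge the bytes left over from the previous call with the new chunk.
            ByteBuffer in = ByteBuffer.allocate(leftover.length + chunks[i].length);
            in.put(leftover).put(chunks[i]);
            in.flip();

            // UTF-8 never produces more chars than it consumes bytes.
            CharBuffer out = CharBuffer.allocate(in.remaining());
            boolean endOfInput = (i == chunks.length - 1);
            CoderResult result = decoder.decode(in, out, endOfInput);
            if (result.isError()) {
                throw new IllegalArgumentException("Malformed input: " + result);
            }
            out.flip();
            sb.append(out);

            // Preserve whatever the decoder did not consume.
            leftover = new byte[in.remaining()];
            in.get(leftover);
        }

        // Complete the decoding operation; UTF-8 normally emits nothing here.
        CharBuffer tail = CharBuffer.allocate(16);
        decoder.flush(tail);
        tail.flip();
        sb.append(tail);
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9; it is split across two chunks.
        byte[][] chunks = {
            {(byte) 'h', (byte) 0xC3},
            {(byte) 0xA9, (byte) 'l', (byte) 'l', (byte) 'o'}
        };
        System.out.println(decodeChunks(chunks));
    }
}
```

Allocating a fresh merged buffer for every chunk keeps the sketch simple. A tuned version could call `compact()` on a single reusable `ByteBuffer` instead, which is the kind of performance tweak mentioned in this thread.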

In case compatibility with Clojure versions older than 1.5 is needed, I can remove the usage of some-> (the same goes for some? and Clojure < 1.6).

A test can be found in pull request #31.

ztellman · 2017-06-11T21:41:49Z

Thank you, I've been traveling and hadn't been able to look at this. I'll merge this, and make any performance tweaks myself.

gsnewmark · 2017-06-11T21:46:34Z

Thanks!

Fix Unicode decoding in conversion to CharSequence

19606d3

ztellman merged commit 29f50f7 into clj-commons:master Jun 11, 2017
