Wrong encoding of non-ASCII characters in JSON #40

@jscissr

Non-ASCII characters are encoded incorrectly by rdb-cli dump.rdb json.

Example:
Add a key to Redis with SET demo "Müller".
Run rdb-cli dump.rdb json.
The result is:

[{
    "demo":"M\u00c3\u00bcller"
}]

After unescaping, we get "MÃ¼ller" instead of "Müller".
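
The wrong output is what you get when each byte of a UTF-8 string is escaped as if it were a code point of its own: "ü" is stored as the two bytes 0xC3 0xBC, which become \u00c3 and \u00bc. A minimal sketch that reproduces this outside of rdb-cli follows. The helper name escape_bytewise and the buffer-based interface are assumptions made for illustration, and quote/backslash escaping is left out for brevity.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of rdb-cli): copies printable ASCII as is and
 * escapes every other byte as \u00XX, i.e. it treats each byte as a code point.
 * Stops early if the output buffer cannot hold another escape plus the NUL. */
static void escape_bytewise(const char *in, size_t len, char *out, size_t outsz) {
    size_t used = 0;
    if (outsz == 0) return;
    out[0] = '\0';
    for (size_t i = 0; i < len && used + 7 <= outsz; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c >= 0x20 && c < 0x7f)
            used += (size_t)snprintf(out + used, outsz - used, "%c", c);
        else
            used += (size_t)snprintf(out + used, outsz - used, "\\u%04x", c);
    }
}
```

Calling escape_bytewise("M\xc3\xbc" "ller", 7, buf, sizeof buf) fills buf with M\u00c3\u00bcller, the same string as in the rdb-cli output above.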

For comparison, rdbtools (which does not work with newer Redis versions) outputs:

[{
"demo":"M\u00fcller"}]
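
To get rdbtools-style escapes, the UTF-8 bytes would have to be decoded into code points before escaping. The following is only a sketch of that idea, not rdbtools' or rdb-cli's actual code: the names utf8_decode and escape_utf8 are hypothetical, bytes that are not valid UTF-8 fall back to byte-wise \u00XX (an arbitrary choice, since Redis values may be binary), and quote/backslash escaping is omitted.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (n bytes available).
 * Returns the number of bytes consumed and stores the code point in *cp,
 * or returns 0 if the bytes are not well-formed UTF-8. */
static size_t utf8_decode(const unsigned char *s, size_t n, unsigned long *cp) {
    size_t need;
    unsigned long v, min;
    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xe0) == 0xc0) { need = 2; v = s[0] & 0x1f; min = 0x80; }
    else if ((s[0] & 0xf0) == 0xe0) { need = 3; v = s[0] & 0x0f; min = 0x800; }
    else if ((s[0] & 0xf8) == 0xf0) { need = 4; v = s[0] & 0x07; min = 0x10000; }
    else return 0;                          /* stray continuation or invalid lead byte */
    if (n < need) return 0;                 /* truncated sequence */
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xc0) != 0x80) return 0;
        v = (v << 6) | (s[i] & 0x3f);
    }
    /* Reject overlong forms, out-of-range values and encoded surrogates. */
    if (v < min || v > 0x10ffff || (v >= 0xd800 && v <= 0xdfff)) return 0;
    *cp = v;
    return need;
}

/* Hypothetical helper: write in[0..len) to out, escaping every code point
 * outside printable ASCII as \uXXXX (a surrogate pair above U+FFFF). */
static void escape_utf8(const char *in, size_t len, char *out, size_t outsz) {
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, used = 0;
    if (outsz == 0) return;
    out[0] = '\0';
    while (i < len && used + 13 <= outsz) { /* room for 12 chars plus NUL */
        unsigned long cp;
        size_t n = utf8_decode(s + i, len - i, &cp);
        if (n == 0) { cp = s[i]; n = 1; }   /* invalid UTF-8: escape the raw byte */
        if (cp >= 0x20 && cp < 0x7f) {
            used += (size_t)snprintf(out + used, outsz - used, "%c", (int)cp);
        } else if (cp < 0x10000) {
            used += (size_t)snprintf(out + used, outsz - used, "\\u%04lx", cp);
        } else {
            cp -= 0x10000;
            used += (size_t)snprintf(out + used, outsz - used, "\\u%04lx\\u%04lx",
                                     0xd800 + (cp >> 10), 0xdc00 + (cp & 0x3ff));
        }
        i += n;
    }
}
```

With the same input as before, escape_utf8("M\xc3\xbc" "ller", 7, buf, sizeof buf) yields M\u00fcller. This is noticeably more code than passing the bytes through unchanged, and the fallback for invalid bytes still cannot be distinguished from a real code point in the U+0080 to U+00FF range after unescaping.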

The simplest way to fix this is to avoid escaping non-ASCII characters entirely, and output them as is:

diff --git a/src/ext/handlersToJson.c b/src/ext/handlersToJson.c
index c5addf7..7a7e299 100644
--- a/src/ext/handlersToJson.c
+++ b/src/ext/handlersToJson.c
@@ -65,7 +65,7 @@ static void outputPlainEscaping(RdbxToJson *ctx, char *p, size_t len) {
             case '\t': fprintf(ctx->outfile, "\\t"); break;
             case '\b': fprintf(ctx->outfile, "\\b"); break;
             default:
-                fprintf(ctx->outfile, (isprint(*p)) ? "%c" : "\\u%04x", (unsigned char)*p);
+                fprintf(ctx->outfile, ((unsigned char)*p > 127 || isprint(*p)) ? "%c" : "\\u%04x", (unsigned char)*p);
         }
         p++;
     }

With this change, the result is:

[{
    "demo":"Müller"
}]