Use of [:cntrl:] character class in tokenize() #5

The POSIX `[:cntrl:]` character class does not match exactly the characters that must be escaped according to https://tools.ietf.org/html/rfc7159#section-7 (i.e. U+0000 through U+001F). `[:cntrl:]` also matches U+007F (DEL) and, when used in a UTF-8 locale, all C1 control characters (U+0080 through U+009F), which RFC 7159 allows to appear unescaped in a string. In the example below I get an error in my locale, "en_US.UTF-8". Apart from running with LC_ALL=C, the error can be avoided by changing `[:cntrl:]` to the range defined in the spec: `\x00-\x1F`.
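
To make the mismatch concrete, here is a small Python sketch. Python is used only for illustration and nothing here is part of JSON.awk. It compares the code points RFC 7159 requires to be escaped with Unicode's general category Cc ("Other, control"). Treating Cc as an approximation of what `[:cntrl:]` matches in a UTF-8 locale such as en_US.UTF-8 is an assumption; the exact set depends on the C library's locale data. The sketch also checks that Python's strict `json` decoder follows the RFC range, accepting DEL and U+0085 unescaped but rejecting U+001F.

```python
import json
import re
import unicodedata

# Characters RFC 7159, section 7, requires to be escaped inside a JSON string
# (in addition to '"' and '\'): U+0000 through U+001F.
RFC7159_MUST_ESCAPE = re.compile(r"[\x00-\x1f]")

ALL_CODE_POINTS = range(0x110000)

# Unicode category Cc. Assumption: roughly what [:cntrl:] matches in a
# UTF-8 locale; the exact set depends on the C library's locale data.
cc = [cp for cp in ALL_CODE_POINTS if unicodedata.category(chr(cp)) == "Cc"]
rfc = [cp for cp in ALL_CODE_POINTS if RFC7159_MUST_ESCAPE.match(chr(cp))]

# Control characters a Cc-based class would exclude from string content
# even though RFC 7159 allows them to appear unescaped.
extra = sorted(set(cc) - set(rfc))
print(f"{len(extra)} extra: U+{extra[0]:04X}..U+{extra[-1]:04X}")
# -> 33 extra: U+007F..U+009F

# Python's strict decoder only rejects U+0000-U+001F inside strings.
assert json.loads('"del=\x7f, nel=\x85"') == "del=\x7f, nel=\x85"
try:
    json.loads('"us=\x1f"')
except json.JSONDecodeError:
    print("U+001F rejected, as RFC 7159 requires")
```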

`$ echo world_bank109.json | awk -f JSON.awk > /dev/null`
`world_bank109.json: expected <value> but got <"> at input token 263`
`, "productlinetype" : "L" , "project_abstract" : { "cdata" : <<">> T h e o b j e c t i`
`$ echo world_bank109.json | LC_ALL=C awk -f JSON.awk > /dev/null`
(no error message here)

[world_bank109.json.txt](https://github.com/step-/JSON.awk/files/534291/world_bank109.json.txt) is attached; it is line 109 of the World Bank sample file at [http://jsonstudio.com/resources/](http://jsonstudio.com/resources/).
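
The error above is consistent with the string regex failing on such a character: presumably the string alternative cannot match, a catch-all single-character alternative then emits the opening `"` as a token by itself, and the parser reports `expected <value> but got <">`. The Python sketch below illustrates this with two simplified, hypothetical string-token patterns (not the actual regex in `tokenize()`). One excludes only U+0000 through U+001F, and the other also excludes U+007F through U+009F to approximate `[:cntrl:]` in a UTF-8 locale. The sample string containing U+0085 is made up; the actual offending character in world_bank109.json is not identified here.

```python
import re

# Hypothetical, simplified JSON string-token patterns (not JSON.awk's regex).
ESCAPE = r'\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4})'

# Unescaped content excludes '"', '\' and U+0000-U+001F, as RFC 7159 requires.
STRING_RFC = re.compile(r'"(?:[^"\\\x00-\x1f]|' + ESCAPE + r')*"')

# Also excludes U+007F-U+009F, approximating [:cntrl:] in a UTF-8 locale
# (assumption).
STRING_CNTRL = re.compile(r'"(?:[^"\\\x00-\x1f\x7f-\x9f]|' + ESCAPE + r')*"')

sample = '"abc\x85def"'  # hypothetical string containing U+0085 (NEXT LINE)

for name, pattern in (("RFC 7159 range", STRING_RFC), ("[:cntrl:]-like", STRING_CNTRL)):
    m = pattern.match(sample)
    if m:
        print(f"{name}: string token {m.group(0)!r}")
    else:
        # A tokenizer with a single-character fallback would now emit '"' alone.
        print(f"{name}: no string token")
```

With the RFC range the whole string is one token; with the `[:cntrl:]`-like class the string alternative fails at U+0085.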