
--filename: Enhancement suggestion: UTF-8 related options #544

@chrfranke

Suggestion for future enhancement:

Provide new options to control UTF-8 coverage, for example:
--filename_opts utf8-like: Generate random UTF-8-like strings, as is currently done.
--filename_opts utf8[-strict]: Only generate strictly valid UTF-8 encodings. This must exclude, for example:

$ printf '\xC0\x80' | iconv -f utf-8 -t utf-32 >/dev/null # encodings of the null byte
iconv: (stdin):1:0: cannot convert
$ printf '\xC0\xA1' | iconv -f utf-8 -t utf-32 >/dev/null # unnecessary encodings of plain ASCII (here: '!')
iconv: (stdin):1:0: cannot convert
$ printf '\xE0\x82\xBF' | iconv -f utf-8 -t utf-32 >/dev/null  # unnecessarily long encodings (here: U+00BF='¿')
iconv: (stdin):1:0: cannot convert
$ printf '\xED\xA0\x81' | iconv -f utf-8 -t utf-32 >/dev/null  # forbidden encoding of UTF-16 Surrogates (here: U+D801)
iconv: (stdin):1:0: cannot convert
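
The same four byte sequences can be checked against any strict decoder. The following is a minimal sketch in Python, not part of the tool itself; the helper name `is_strict_utf8` is hypothetical and simply wraps Python's built-in strict `utf-8` codec, which rejects overlong forms and encoded surrogates.

```python
# The four sequences from the iconv examples above, with the reason
# each one is not valid UTF-8.
INVALID_SEQUENCES = {
    b"\xC0\x80": "overlong encoding of the null byte",
    b"\xC0\xA1": "overlong encoding of plain ASCII '!'",
    b"\xE0\x82\xBF": "overlong encoding of U+00BF",
    b"\xED\xA0\x81": "encoded UTF-16 surrogate U+D801",
}


def is_strict_utf8(data: bytes) -> bool:
    """Return True only if data decodes under Python's strict UTF-8 codec.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for seq, reason in INVALID_SEQUENCES.items():
        print(seq.hex(), is_strict_utf8(seq), reason)
```

A check like this could serve as a test oracle for a strict mode: every generated name should pass it, and none of the sequences listed above should.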

Straightforward algorithm: generate a random code point in the range 0x1-0x10FFFF, excluding the surrogate range 0xD800-0xDFFF, and then encode it as UTF-8.
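
The algorithm can be sketched as follows. This is an illustration in Python under stated assumptions, not the tool's implementation: the function names (`random_scalar_value`, `encode_utf8`, `random_utf8_name`) are hypothetical, and the encoder is written out by hand so that each byte-length threshold is visible. Picking from a range shortened by the size of the surrogate block and then shifting values past it keeps the distribution uniform without retrying.

```python
import random

SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF
SURROGATE_COUNT = SURROGATE_LAST - SURROGATE_FIRST + 1  # 2048 code points


def random_scalar_value(rng: random.Random) -> int:
    """Pick a code point in 0x1..0x10FFFF, never in 0xD800..0xDFFF."""
    cp = rng.randint(1, MAX_CODE_POINT - SURROGATE_COUNT)
    if cp >= SURROGATE_FIRST:
        cp += SURROGATE_COUNT  # skip over the surrogate gap
    return cp


def encode_utf8(cp: int) -> bytes:
    """Encode one code point using the shortest UTF-8 form."""
    if cp < 0x80:  # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    return bytes([  # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def random_utf8_name(length: int, rng: random.Random) -> bytes:
    """Concatenate `length` randomly chosen, strictly encoded code points."""
    return b"".join(encode_utf8(random_scalar_value(rng)) for _ in range(length))


if __name__ == "__main__":
    print(random_utf8_name(8, random.Random()))
```

Because each code point is always encoded in its shortest form, none of the overlong sequences shown earlier can be produced, and the surrogate range is excluded before encoding.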
