Suggestion for future enhancement:
Provide new options to control UTF-8 coverage, for example:
--filename_opts utf8-like
: Generate random UTF-8-like strings as currently done.
--filename_opts utf8[-strict]
: Only allow strictly valid UTF-8 encodings. This must exclude, for example:
$ printf '\xC0\x80' | iconv -f utf-8 -t utf-32 >/dev/null # overlong encoding of the NUL byte
iconv: (stdin):1:0: cannot convert
$ printf '\xC0\xA1' | iconv -f utf-8 -t utf-32 >/dev/null # unnecessary encodings of plain ASCII (here: '!')
iconv: (stdin):1:0: cannot convert
$ printf '\xE0\x82\xBF' | iconv -f utf-8 -t utf-32 >/dev/null # unnecessarily long encodings (here: U+00BF='¿')
iconv: (stdin):1:0: cannot convert
$ printf '\xED\xA0\x81' | iconv -f utf-8 -t utf-32 >/dev/null # forbidden encoding of UTF-16 Surrogates (here: U+D801)
iconv: (stdin):1:0: cannot convert
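The same four sequences can be checked without iconv. The following is a minimal sketch in Python, relying on the fact that Python's built-in `utf-8` codec rejects overlong forms and encoded surrogates by default. The helper name `is_strict_utf8` and the `INVALID_SEQUENCES` table are hypothetical and not part of this project.

```python
# Byte sequences taken from the iconv examples above.
# A strict UTF-8 decoder is expected to reject every one of them.
INVALID_SEQUENCES = {
    b"\xC0\x80": "overlong encoding of NUL",
    b"\xC0\xA1": "overlong encoding of ASCII '!'",
    b"\xE0\x82\xBF": "overlong encoding of U+00BF",
    b"\xED\xA0\x81": "encoded UTF-16 surrogate U+D801",
}


def is_strict_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for seq, reason in INVALID_SEQUENCES.items():
        print(seq.hex(), is_strict_utf8(seq), reason)
```

A check like this could serve as a test oracle for a `utf8-strict` mode: every generated name should pass it, and none of the sequences listed here should.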
Straightforward algorithm: pick a random code point in the range 0x1-0x10FFFF, excluding the surrogate range 0xD800-0xDFFF, and then encode it as UTF-8.
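The algorithm above could be sketched roughly as follows. This is an illustration in Python, not the project's implementation; the function names `random_scalar`, `encode_utf8`, and `random_name` are hypothetical. The encoder is written out by hand to show that choosing the byte count from the code point's magnitude always yields the shortest form, so no overlong sequence can be produced.

```python
import random

MAX_CODE_POINT = 0x10FFFF
SURROGATE_START = 0xD800
SURROGATE_COUNT = 0xE000 - 0xD800  # 2048 code points, 0xD800..0xDFFF


def random_scalar(rng: random.Random) -> int:
    """Draw uniformly from 0x1..0x10FFFF, skipping the surrogate range."""
    # Sample from a contiguous range that is shorter by the size of the gap,
    # then shift values at or above the gap past it (no rejection loop).
    n = rng.randint(1, MAX_CODE_POINT - SURROGATE_COUNT)
    return n if n < SURROGATE_START else n + SURROGATE_COUNT


def encode_utf8(cp: int) -> bytes:
    """Encode one code point as shortest-form UTF-8."""
    if cp < 0x80:  # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def random_name(rng: random.Random, length: int) -> bytes:
    """Concatenate `length` randomly chosen, strictly encoded code points."""
    return b"".join(encode_utf8(random_scalar(rng)) for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed only to make the demo repeatable
    print(random_name(rng, 8).hex())
```

Shifting past the gap instead of retrying keeps the distribution uniform over the valid code points and makes each draw a single call to the random number generator.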