Please include the actual ranges mapped to, not just the bit-operator-code to get there #12

Currently I find the spec easy to follow when implementing it, but hard to read when trying to understand its actual effect in code point space. In particular, I have trouble figuring out which ranges in code point space the invalid (isolated) surrogates are mapped to, and I find this suboptimal given that Unicode already has a lot of confusing ranges.

IMHO the problem is this section in particular, or rather the lack of concrete ranges given after it:

>
> 4. Potentially ill-formed UTF-16
>
> A sequence of 16-bit code units is potentially ill-formed UTF-16 if it is intended to be interpreted as UTF-16, but is not necessarily well-formed in UTF-16. It effectively encodes a sequence of code points that do not contain any surrogate code point pair.
>
> Note: Like UTF-16, potentially ill-formed UTF-16 can not represent a surrogate code point pair since the corresponding surrogate 16-bit code unit pair would instead represent a supplementary code point. Unlike well-formed UTF-16, it might contain isolated surrogate code points.
>
> Any sequence of 16-bit code units has an interpretation as potentially ill-formed UTF-16.
>
> WTF-16 is sometimes used as a shorter name for potentially ill-formed UTF-16, especially in the context of systems that were originally designed for UCS-2 and later upgraded to UTF-16 but never enforced well-formedness, either by neglect or because of backward-compatibility constraints.
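To make the quoted definition concrete in code point space, here is a minimal Python sketch of how any sequence of 16-bit code units maps to code points (`decode_wtf16` is a hypothetical name, not something from the spec). Lead surrogates are U+D800..U+DBFF and trail surrogates are U+DC00..U+DFFF; a lead immediately followed by a trail always becomes a supplementary code point in U+10000..U+10FFFF, and every other surrogate comes out unchanged as an isolated surrogate code point in U+D800..U+DFFF.

```python
# Sketch only: hypothetical helper illustrating the "potentially ill-formed UTF-16" interpretation.
LEAD = range(0xD800, 0xDC00)   # lead (high) surrogates: U+D800..U+DBFF
TRAIL = range(0xDC00, 0xE000)  # trail (low) surrogates: U+DC00..U+DFFF


def decode_wtf16(units):
    """Interpret any sequence of 16-bit code units as a list of code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if u in LEAD and i + 1 < len(units) and units[i + 1] in TRAIL:
            # A lead + trail pair always yields U+10000..U+10FFFF, which is why
            # a surrogate code point *pair* can never appear in the result.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # Non-surrogate BMP code point, or an isolated surrogate kept as-is.
            out.append(u)
            i += 1
    return out


# 'A', a valid pair for U+1F600, then an isolated trail surrogate:
print([hex(cp) for cp in decode_wtf16([0x0041, 0xD83D, 0xDE00, 0xDC00])])
# prints ['0x41', '0x1f600', '0xdc00']
```

Note that order matters: a trail followed by a lead (e.g. `0xDC00, 0xD800`) is not a pair, so both stay as isolated surrogate code points.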

Don't get me wrong, this formal definition is nice, but it is followed only by encoding steps at the other extreme, *way too practical*: filled with bit-manipulation ops that don't make it obvious which ranges are actually used.
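Assuming the bit operations in question are the usual generalized UTF-8 three-byte layout applied to surrogate code points (as in WTF-8), a short Python sketch shows which byte ranges they land in. `encode_3byte` is a hypothetical helper, not a name from the spec. Lead surrogates U+D800..U+DBFF become `ED A0 80`..`ED AF BF` and trail surrogates U+DC00..U+DFFF become `ED B0 80`..`ED BF BF`. Well-formed UTF-8 only allows `80`..`9F` as the second byte after `ED`, so the `A0`..`BF` region is exactly where the surrogates end up.

```python
# Sketch only: hypothetical helper using the generalized UTF-8 three-byte bit layout.
def encode_3byte(cp):
    """Encode a code point in U+0800..U+FFFF (surrogates allowed) as 3 bytes."""
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError("three-byte form only covers U+0800..U+FFFF")
    return bytes([
        0xE0 | (cp >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx: low 6 bits
    ])


for name, lo, hi in [("lead", 0xD800, 0xDBFF), ("trail", 0xDC00, 0xDFFF)]:
    print(f"{name} surrogates U+{lo:04X}..U+{hi:04X} -> "
          f"{encode_3byte(lo).hex(' ')} .. {encode_3byte(hi).hex(' ')}")
# prints:
# lead surrogates U+D800..U+DBFF -> ed a0 80 .. ed af bf
# trail surrogates U+DC00..U+DFFF -> ed b0 80 .. ed bf bf
```

For non-surrogate code points in the same range the function matches Python's own UTF-8 encoder (for example U+20AC gives `e2 82 ac`), so the only difference is that surrogates are not rejected.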

What I would have expected in the `4. Potentially ill-formed UTF-16` section is something like this addition (**possibly incorrect,** this is my best guess & what I would have liked to see spelled out properly):

~~previous stuff removed~~ -> see next comment for revised, better suggestion
