I don't read tex natively and it's super inconvenient to download and read the markdown outside of just allowing github to render it here. The PDF format doesn't respect my dark mode preference either!
I'm going to cheekily paste in the current version of the markdown here so that I can read the spec in the meantime!
author:
- Christian Parpart
date: '2021-09-04 (draft, revision 1)'
title: |
Unicode in Terminals
a proposal to standardizing basic Unicode features
History and current state
Historically, terminals supported only 7-bit characters with C0 control
codes, and different languages were handled by selecting the respective
code page.
Later on this was extended to 8-bit character sets along with C1 control
codes.
With the introduction of Unicode there was no longer any need for code
pages, but the Unicode spec was not explicitly designed to also cover
terminals, except that the C0 and C1 codepoints were preserved.
With UTF-8 it became possible to at least pass Unicode characters
to the terminal, but the rendering of a few characters, as well as their
respective cursor placement, is not defined in the Unicode standard.
Unicode also introduced codepoint sequences that map to a
single user-perceived character, so-called grapheme clusters.
Terminals have never formalized how to deal with
grapheme clusters, variation selectors, East Asian Width, or
emoji and emoji presentation.
This spec tries to address some of the problems terminals have
with Unicode today.
Backwards Compatibility
The basic point is that everything is disabled by default, so legacy apps
don't break more than they already do.
Backwards compatibility is retained by leaving everything as undefined
as it is without this specification.
The application can test for the availability of this feature and has to
explicitly enable it in order to get the set of properties as defined in
this document guaranteed.
Future Compatibility and Stability
Unicode itself had a major breaking change between versions 8 and 9,
when some codepoints had their East Asian Width changed.
While this may happen again at any time, we do not expect it to happen
soon or often enough to address future incompatibilities in
this spec, and we leave that for a later point.
Feature and Mode State Detection
[CSI ? 2027 $ p
]{style="background-color: light-gray"}([ref:DECRQM]{reference-type="ref"
reference="ref:DECRQM"}) can be used to test for the availability of
this feature as well as the current mode the terminal is in with regards
to this specification. The reply to
[CSI ? 2027 $ p
]{style="background-color: light-gray"} indicates each state
accurately enough that no new VT sequence needs to be
introduced.
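As a rough illustration of the detection step, the sketch below builds the query and parses a reply. It assumes the reply uses the standard DECRPM form `CSI ? Pd ; Ps $ y` with the `Ps` values documented for the VT510; the names `QUERY`, `STATES` and `parse_decrpm` are made up for this example and are not part of the spec.

```python
import re

MODE = 2027  # mode number from this specification

# DECRQM query for a DEC private mode: CSI ? Pd $ p
QUERY = f"\x1b[?{MODE}$p"

# DECRPM reply: CSI ? Pd ; Ps $ y
_REPLY = re.compile(r"\x1b\[\?(\d+);(\d+)\$y")

# Ps values as documented for DECRPM (assumed to apply unchanged to mode 2027)
STATES = {
    0: "not recognized",
    1: "set",
    2: "reset",
    3: "permanently set",
    4: "permanently reset",
}


def parse_decrpm(reply: str, mode: int = MODE):
    """Return the state name for `mode`, or None if `reply` is not a DECRPM for it."""
    m = _REPLY.search(reply)
    if m is None or int(m.group(1)) != mode:
        return None
    return STATES.get(int(m.group(2)), "unknown")
```

Actually reading the reply from a terminal requires putting the tty into raw mode and handling timeouts, which is left out here. A terminal that does not answer at all should probably be treated the same as "not recognized".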
Mode Switching
- [CSI ? 2027 h]{style="background-color: light-gray"}
  ([ref:DECSM]{reference-type="ref"
  reference="ref:DECSM"}) for ensuring conformance to all rules as
  defined by this specification
- [CSI ? 2027 l]{style="background-color: light-gray"}
  ([ref:DECRM]{reference-type="ref"
  reference="ref:DECRM"}) for undefined behavior
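An application might wrap the two sequences above in a small helper so the mode is always switched off again on exit. This is only a sketch: `unicode_core` is a hypothetical name, and writing to an in-memory buffer stands in for a real terminal.

```python
import io
import sys
from contextlib import contextmanager

ENABLE = "\x1b[?2027h"   # DECSM: CSI ? 2027 h
DISABLE = "\x1b[?2027l"  # DECRM: CSI ? 2027 l


@contextmanager
def unicode_core(out=sys.stdout):
    """Enable mode 2027 for the duration of the block, then reset it."""
    out.write(ENABLE)
    out.flush()
    try:
        yield out
    finally:
        # Runs even if the body raises, so the terminal is not left in mode 2027.
        out.write(DISABLE)
        out.flush()


# Demonstration against an in-memory buffer instead of a terminal.
buf = io.StringIO()
with unicode_core(buf) as out:
    out.write("text")
```

Note that this helper resets the mode unconditionally. An application that wants to restore whatever state it found would first query the mode with DECRQM and only reset it if it was not already set.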
Semantics
The following set of semantics MUST be adhered to if this VT mode
[2027
]{style="background-color: light-gray"} is enabled. If the VT
mode [2027
]{style="background-color: light-gray"} is not set, then the
behavior is as undefined as if this specification was not implemented at
all in order to retain behavior of current terminals and their legacy
applications.
Grapheme Cluster
{#section .unnumbered}
With this mode enabled, the terminal MUST support grapheme clusters
in conformance with the algorithm described in UTS 29
[ref:UTS-29]{reference-type="ref"
reference="ref:UTS-29"}.
{#section-1 .unnumbered}
This implies that consecutively written characters on the terminal
stream that are non-breakable as per UTS 29
[ref:UTS-29]{reference-type="ref"
reference="ref:UTS-29"} will always end up in the same terminal grid
cell.
{#section-2 .unnumbered}
Therefore, extending a grapheme cluster with consecutively added
codepoints will not move the cursor except for variation selector 16
(VS16) that may have caused the width of the grapheme cluster to change
to wide (2 grid cells).
{#section-3 .unnumbered}
When the cursor moves to a grid cell that contains a complete or
incomplete grapheme cluster, this grid cell's contents will be erased
and overwritten rather than textually concatenated.
{#section-4 .unnumbered}
Therefore cursor movement semantics of the terminal remain unchanged.
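The cursor rules in this section can be modeled with a toy grid line. The sketch below is not the UTS 29 algorithm: `continues_cluster` is a deliberately simplified stand-in (combining marks, variation selectors and ZWJ only), widths come from `unicodedata.east_asian_width`, and the line is assumed to be wide enough that no wrapping occurs. All function names are invented for illustration.

```python
import unicodedata

ZWJ, VS16 = 0x200D, 0xFE0F


def continues_cluster(prev: str, ch: str) -> bool:
    """Simplified 'no break before ch' test (assumption, not full UTS 29)."""
    cp = ord(ch)
    if cp == ZWJ or 0xFE00 <= cp <= 0xFE0F:  # ZWJ and variation selectors
        return True
    if unicodedata.category(ch) in ("Mn", "Me", "Mc"):  # combining marks
        return True
    return ord(prev) == ZWJ  # anything directly after a ZWJ joins the cluster


def base_width(ch: str) -> int:
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def write(line: list, cursor: int, text: str) -> int:
    """Write `text` into `line` (one string per cell); return the new cursor column."""
    last = None  # cell of the cluster being extended; reset on every call here
    for ch in text:
        if last is not None and continues_cluster(line[last][-1], ch):
            line[last] += ch  # same grid cell, cursor does not move ...
            if ord(ch) == VS16 and cursor == last + 1:
                line[cursor] = ""  # ... unless VS16 widens a narrow cluster
                cursor += 1
            continue
        line[cursor] = ch  # old cell contents are overwritten, not concatenated
        last = cursor
        if base_width(ch) == 2:
            line[cursor + 1] = ""  # second cell of a wide cluster
        cursor += base_width(ch)
    return cursor


line = [" "] * 8
col = write(line, 0, "e\u0301")          # 'e' + combining acute: one cell
col = write(line, col, "\u2764\ufe0f")   # heart + VS16: widened to two cells
```

A real terminal would keep the "current cluster" state across writes rather than resetting it per call, and would use a complete UTS 29 implementation for the break test.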
Emoji
{#section-5 .unnumbered}
Emoji symbols are always rendered in square aspect ratio (as proposed by
UTS 51 [ref:UTS-51]{reference-type="ref"
reference="ref:UTS-51"}), implying an East Asian Width of Wide, 2 grid
cells.
{#section-6 .unnumbered}
ZWJ emoji are required to be displayed as a single image with a width of
2 grid cells.
{#section-7 .unnumbered}
The alternate display of ZWJ emoji as a decomposed sequence of
sub-images must not be used as a fallback, as it would break cursor
movement guarantees.
{#section-8 .unnumbered}
If a ZWJ emoji cannot be rendered, the display behavior is undefined -
for example, a Unicode replacement character
[U+FFFD
]{style="background-color: light-gray"} could be displayed
instead.
{#section-9 .unnumbered}
In emoji presentation, the cursor will always move by 2 grid
cells.
{#section-10 .unnumbered}
SGR attributes applied to a grid cell containing an emoji symbol are not
strictly defined, and it is left to the terminal emulator to apply
sensible semantics with regards to emoji symbols.
Variation Selector 16
VS16 promotes the grapheme cluster to emoji presentation, which
forces the grapheme cluster's width to be 2. This may
cause that symbol to reflow to the next line if it is at the right
margin and AutoWrap mode is set.
Variation Selector 15
{#section-11 .unnumbered}
VS15 forces the grapheme cluster to emoji text presentation. This will
NOT change the underlying width but only change the display to
prefer textual non-colored presentation.
{#section-12 .unnumbered}
This matches the behavior of today's web browsers and should thus feel
most intuitive to users.
{#section-13 .unnumbered}
The cursor will move by 2 columns if the symbol has the default
presentation of emoji.
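The width rules for the two variation selectors can be summarized in a few lines. This is a sketch under one stated assumption: East Asian Width (via `unicodedata`) is used as a proxy for "has default emoji presentation", which is not spelled out by this spec. `cluster_width` is a hypothetical helper name.

```python
import unicodedata

VS15, VS16 = "\ufe0e", "\ufe0f"


def cluster_width(cluster: str) -> int:
    """Width in grid cells of one grapheme cluster under mode 2027 (sketch)."""
    base = cluster[0]
    # Assumption: the base codepoint's East Asian Width gives the underlying width.
    width = 2 if unicodedata.east_asian_width(base) in ("W", "F") else 1
    if VS16 in cluster:
        width = 2  # VS16 promotes to emoji presentation, always 2 cells
    # VS15 only changes how the cluster is drawn; the width is left untouched.
    return width


narrow_heart = cluster_width("\u2764")           # text-default symbol
emoji_heart = cluster_width("\u2764" + VS16)     # promoted by VS16
text_face = cluster_width("\U0001F600" + VS15)   # emoji-default symbol with VS15
```

With this model, a text-default symbol such as U+2764 stays 1 cell wide with or without VS15, while an emoji-default symbol such as U+1F600 keeps its 2 cells even when VS15 asks for textual rendering.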
Margins and AutoWrap with Emoji
Emoji written at the right margin with AutoWrap mode disabled may be
rendered in half or not be displayed at all. This behavior is
undefined to ease implementation and adoption of this specification.
References
- [[ref:DECRQM]]{#ref:DECRQM label="ref:DECRQM"}DECRQM,
  https://vt100.net/docs/vt510-rm/DECRQM.html
- [[ref:DECSM]]{#ref:DECSM label="ref:DECSM"}DECSM,
  https://vt100.net/docs/vt510-rm/SM.html
- [[ref:DECRM]]{#ref:DECRM label="ref:DECRM"}DECRM,
  https://vt100.net/docs/vt510-rm/RM.html
- [[ref:UTS-29]]{#ref:UTS-29 label="ref:UTS-29"}UTS 29, Grapheme
  segmentation algorithm,
  https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules
- [[ref:UTS-51]]{#ref:UTS-51 label="ref:UTS-51"}UTS 51, Unicode
  Emoji, https://unicode.org/reports/tr51/#Display, paragraph 2