Hi all,
Following on from the text-sizing work in #8226, I have decided to specify the exact algorithm terminals should use to split Unicode text into cells, and to implement it in kitty. It is based on the Unicode specification's grapheme segmentation rules, but it also specifies terminal-specific behavior that the Unicode spec does not cover. It fixes various long-standing issues such as #3810 (emoji with ZWJ) and #8433 (Korean text).
The specification is here. Feel free to read it and comment. There might well be differing opinions on the parts not covered by the Unicode spec. I am open to suggestions for modification.
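To illustrate why splitting text per code point goes wrong for ZWJ emoji, here is a minimal Python sketch. It is not the algorithm from the specification and not a full implementation of the Unicode grapheme rules: `rough_clusters` and `is_extend` are hypothetical helpers that only attach combining marks, variation selectors, and ZWJ-joined characters to the preceding character. Hangul syllable rules, skin-tone modifiers, regional indicators, and everything terminal-specific are deliberately left out.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def is_extend(ch: str) -> bool:
    # Assumption: approximate "extending" characters by general category.
    # Mn/Me covers combining marks and variation selectors such as U+FE0F,
    # but it is NOT the real Grapheme_Cluster_Break=Extend property.
    return unicodedata.category(ch) in ("Mn", "Me")


def rough_clusters(text: str) -> list[str]:
    """Group code points into approximate user-perceived characters."""
    clusters: list[str] = []
    for ch in text:
        joins_previous = bool(clusters) and (
            is_extend(ch) or ch == ZWJ or clusters[-1].endswith(ZWJ)
        )
        if joins_previous:
            clusters[-1] += ch  # stays in the current cluster
        else:
            clusters.append(ch)  # starts a new cluster
    return clusters


if __name__ == "__main__":
    family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # man ZWJ woman ZWJ girl
    print(len(family), "code points,", len(rough_clusters(family)), "cluster")
```

A terminal that assigns a width to each of the five code points separately will disagree with one that treats the sequence as a single unit, which is the kind of mismatch an exact shared algorithm is meant to remove.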
To kitty users: I would appreciate it if some of you could run the nightly build and report any issues. You may run into problems if you use ZWJ-based emoji in your workflows, because the width kitty assigns to these has changed, and terminal programs may assume a different width than the correct one.
In master, there is also a kitten that can be run easily to test a terminal's compliance with the spec. It uses grapheme test data from the Unicode consortium. Run it as:
kitten __width_test__
Here are results of running it on various terminals.
Terminal name | Number of tests failed |
---|---|
kitty (master) | 0 |
kitty 0.41.1 | 45 |
wezterm 5046fc22 | 179 |
foot 1.21.0 | 186 |
konsole 24.12.3 | 280 |
iTerm2 3.5.13 | 289 |
gnome-term 3.56.0 | 317 |
kitty-master+tmux-3.5 | 347 |
xterm 397 | 371 |
Apple terminal 2.14 | 479 |
Finally, there is ghostty 1.1.3, on which the test kitten failed to run because ghostty returned far more cursor position reports than expected, so something is badly broken there. I did look at its code, since it claims to do grapheme segmentation, and it does not implement the segmentation algorithm correctly anyway.
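For readers unfamiliar with cursor position reports: a program can ask the terminal where the cursor is by sending `ESC [ 6 n`, and the terminal replies with `ESC [ row ; col R`. The Python sketch below shows how a width probe could be built on that. It is an assumption about the general technique, not the kitten's actual code, and `parse_cpr` and `measure_width` are hypothetical names. `measure_width` needs a real Unix terminal on stdin and stdout.

```python
import os
import re
import sys

# Reply format for a cursor position report: ESC [ <row> ; <col> R
CPR = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply: str) -> tuple[int, int]:
    """Extract the 1-based (row, column) from a cursor position report."""
    match = CPR.search(reply)
    if match is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(match.group(1)), int(match.group(2))


def measure_width(text: str) -> int:
    """Print text at column 1 and ask the terminal how far the cursor moved."""
    import termios  # Unix only, imported here so parse_cpr works everywhere
    import tty

    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # read the reply byte by byte, without echo
        sys.stdout.write("\r" + text + "\x1b[6n")
        sys.stdout.flush()
        reply = b""
        while not reply.endswith(b"R"):
            reply += os.read(fd, 1)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
        sys.stdout.write("\r\x1b[K")  # erase the probe text
        sys.stdout.flush()
    _, col = parse_cpr(reply.decode("ascii"))
    return col - 1  # columns are 1-based


if __name__ == "__main__":
    print(measure_width("\U0001F468\u200d\U0001F469\u200d\U0001F467"))
```

A probe like this expects exactly one report per query, so a terminal that sends extra reports would leave stray bytes for the next read and confuse the measurement.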