In satori's current implementation, the CSS word-break behavior (the wordBreak style property) is simulated by combining Intl.Segmenter and linebreak. They correspond as follows:
| CSS | Corresponding Implementation |
| --- | --- |
| word-break: break-all | Intl.segment(content, 'grapheme') |
| word-break: keep-all | Intl.segment(content, 'word') |
| Any others or unset | linebreak |
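As a rough illustration of the two Intl.Segmenter rows above, here is a minimal sketch. The `segment` helper and its `locale` parameter are hypothetical names chosen for this example and are not taken from satori's source; only the standard `Intl.Segmenter` API is assumed.

```javascript
// Hypothetical helper (not satori's actual code): split `content` into
// segments using the built-in Intl.Segmenter at the given granularity.
function segment(content, granularity, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity })
  // Each item yielded by segmenter.segment() has a `segment` string property.
  return Array.from(segmenter.segment(content), (s) => s.segment)
}

// word-break: break-all -> a break is allowed between any two grapheme clusters.
// The family emoji is a ZWJ sequence and stays together as one cluster.
const graphemes = segment('a👨‍👩‍👧b', 'grapheme')

// word-break: keep-all -> break opportunities only at word boundaries.
// Whitespace comes back as its own segment.
const words = segment('hello world', 'word')

console.log(graphemes.length) // 3
console.log(words) // [ 'hello', ' ', 'world' ]
```

Note that Intl.Segmenter only accepts the granularities 'grapheme', 'word', and 'sentence', which is why the default case in the table falls back to a separate library.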
Intl.Segmenter solves most of the problems, but it cannot help with the default word-break behavior of CSS, because it only offers grapheme, word, and sentence granularities and has no line-break mode. We hope that word-break in satori can conform to the latest Unicode standard and test cases by default, so that it simulates the corresponding CSS behavior in browsers as closely as possible.
Writing a complete JavaScript implementation of the Unicode Line Breaking Algorithm (UAX #14) is very laborious. Because Unicode updates its standard and test cases every year, developers must continuously update and maintain any such implementation. linebreak currently supports Unicode 14.0, and many other libraries, such as css-line-break, have also stopped updating. It is difficult even to find a complete JavaScript implementation of UAX #14 on npm or GitHub that fully supports Unicode 16.
However, satori's word segmentation and line breaking, especially the correct handling of newly added emojis, rely heavily on the latest Unicode standard. Is there a more suitable solution that would help us avoid the compatibility issues caused by Unicode standard updates?