Better and sustainable `wordBreak` support #683

Vizards · 2025-06-03T10:57:33Z


In the current implementation of satori, CSS `word-break` behavior (the `wordBreak` style property) is simulated by combining `Intl.Segmenter` and the `linebreak` package. The mapping is as follows:

| CSS | Corresponding implementation |
| --- | --- |
| `word-break: break-all` | `Intl.segment(content, 'grapheme')` |
| `word-break: keep-all` | `Intl.segment(content, 'word')` |
| Any other value, or unset | `linebreak` |
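For illustration, the two `Intl.Segmenter` rows of the table can be sketched as a single function. This is a minimal sketch, not satori's code: the helper name `segmentForWordBreak` is hypothetical, the table's `Intl.segment(content, granularity)` shorthand is assumed to mean `new Intl.Segmenter(locale, { granularity }).segment(content)`, and a runtime with `Intl.Segmenter` (Node.js 16 or later) is assumed. The default row is deliberately left out because it depends on the external `linebreak` package.

```javascript
// Hypothetical helper mirroring the table: break-all -> grapheme, keep-all -> word.
// Assumption: the table's `Intl.segment(...)` shorthand stands for Intl.Segmenter.
function segmentForWordBreak(text, wordBreak, locale = 'en') {
  let granularity;
  if (wordBreak === 'break-all') {
    granularity = 'grapheme'; // every grapheme cluster is a candidate break point
  } else if (wordBreak === 'keep-all') {
    granularity = 'word'; // only word boundaries are candidate break points
  } else {
    // Default / unset: satori uses the `linebreak` package (UAX #14),
    // which this sketch does not reproduce.
    return null;
  }
  const segmenter = new Intl.Segmenter(locale, { granularity });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}
```

With `'break-all'`, an emoji with a skin-tone modifier such as `'\u{1F44D}\u{1F3FD}'` stays a single segment, because it forms one extended grapheme cluster.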

`Intl.Segmenter` handles most cases, but it cannot reproduce CSS's default `word-break` behavior. We would like satori's `word-break` to follow the latest Unicode standard and its test cases by default, so that it matches the behavior of CSS in browsers as closely as possible.
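A small sketch shows why word segmentation is not a substitute for the default behavior. Word boundaries come from UAX #29, while line-break opportunities come from UAX #14, and the two disagree. For example, UAX #14 rule LB21 does not allow a break before a hyphen, yet `Intl.Segmenter` at `'word'` granularity reports a boundary on both sides of each `-`. The helper `wordBoundaries` is hypothetical and only lists where the segmenter places boundaries.

```javascript
// Hypothetical helper: list the interior word boundaries reported by Intl.Segmenter.
// Treating these as line-break opportunities is the assumption being tested here.
function wordBoundaries(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  const offsets = [];
  for (const { index } of segmenter.segment(text)) {
    if (index > 0) offsets.push(index); // skip the start of the string
  }
  return offsets;
}

const sample = 'state-of-the-art';
console.log(wordBoundaries(sample));
// UAX #14 (LB21) forbids a break before a hyphen, so the boundaries at
// offsets 5, 8 and 12 (just before each '-') are not valid line-break opportunities.
```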

Writing a complete JavaScript implementation of the Unicode Line Breaking Algorithm (UAX #14) is laborious, and because Unicode revises the standard and its test cases every year, such an implementation must be continuously updated and maintained.
`linebreak` currently supports Unicode 14.0, and other libraries such as `css-line-break` are no longer being updated either. It is hard to find a complete JavaScript implementation of UAX #14 that fully supports Unicode 16 on npm or GitHub.
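Checking an implementation against each year's test data is at least mechanical. As a rough sketch, assuming the `LineBreakTest.txt` format published with UAX #14 (hex code points separated by `×` for "no break" and `÷` for "break allowed", with comments after `#`), one test line can be decoded into a string plus its expected break offsets. The function name `parseLineBreakTestLine` is hypothetical.

```javascript
// Hypothetical parser for one line of LineBreakTest.txt.
// Returns { text, breaks } where `breaks` are UTF-16 offsets marked with '÷',
// or null for blank/comment-only lines.
function parseLineBreakTestLine(line) {
  const data = line.split('#')[0].trim(); // drop the trailing comment
  if (data === '') return null;
  let text = '';
  const breaks = [];
  for (const token of data.split(/\s+/)) {
    if (token === '÷') {
      breaks.push(text.length); // break allowed at the current offset
    } else if (token !== '×') {
      text += String.fromCodePoint(parseInt(token, 16)); // hex code point
    }
    // '×' (no break) needs no action
  }
  return { text, breaks };
}
```

A candidate line breaker can then be run on `text` and its break offsets compared with `breaks` for every line in the file.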

However, satori's word segmentation and line breaking, especially the correct handling of newly added emoji, depend heavily on the latest Unicode standard. Is there a more suitable approach that would let us avoid the compatibility issues caused by Unicode updates?
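For context, `Intl.Segmenter` uses the ICU data bundled with the JavaScript engine, so its Unicode version moves with the runtime rather than with an npm package. It only offers `'grapheme'`, `'word'` and `'sentence'` granularity, though, not UAX #14 line breaks. The sketch below (the helper `graphemeCount` is hypothetical) prints the bundled Unicode version in Node.js and checks that a ZWJ emoji sequence is one grapheme cluster.

```javascript
// Hypothetical helper: count extended grapheme clusters with the built-in segmenter.
function graphemeCount(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(text)).length;
}

// Node.js only: the Unicode version of the bundled ICU data (guarded for other runtimes).
if (typeof process !== 'undefined' && process.versions && process.versions.unicode) {
  console.log('Unicode version:', process.versions.unicode);
}

// MAN + ZWJ + WOMAN + ZWJ + GIRL: a single family emoji, one grapheme cluster.
console.log(graphemeCount('\u{1F468}\u200D\u{1F469}\u200D\u{1F467}'));
```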
