FirstGraphemeCluster does not need to preserve state across grapheme clusters #58

@delthas

Hi,

The FirstGraphemeCluster function can be used to iteratively extract grapheme clusters from a string (without additional allocations). Its documentation says that a state should be passed in (initially set to -1); the function then returns a new state, which should be passed back in on the next call, so that some state is preserved across calls.

This state contains the current grapheme cluster parser state, and the property of the next codepoint.

It did not make sense to me that decoding a grapheme cluster would depend on earlier state: I expected each grapheme cluster to be fully independent.

To test this, I took the full set of grapheme cluster boundary test cases for Unicode 14.0 (the version supported by the library), and ran a simple test by calling FirstGraphemeClusterInString and comparing the results with the expected results from the spec:

  • When preserving the state across grapheme clusters: everything works (as expected: the library is compliant 😋)
  • When explicitly resetting the state to -1 between calls to FirstGraphemeClusterInString (which should be incorrect if the state mattered): everything still works, all tests pass!!!

This would mean that even when not preserving any state, the actual grapheme clusters that are returned are always the same.

So, from my understanding, there should be no need for any state at all between calls to the function, and the state parameter could be fully deprecated.

Full test case (see the TODO line); try running it in the Go playground (it prints All tests passed): https://gist.github.com/delthas/0965a2c198b3a114fbb6706435786b73
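
For readers who do not want to open the gist, here is a minimal, self-contained sketch of the shape of that comparison. It is not the gist's code and it does not call the library: `toyFirstCluster` is a hypothetical stand-in with a simplified signature that only knows two rules (CR LF stays together, combining marks attach to the preceding code point), and `segment`, `classify` and `equal` are illustrative helpers. The toy uses its state only as a cache of the next code point's class, which is the assumption being tested here.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Classes of code points known to the toy segmenter (hypothetical, not uniseg's).
const (
	classOther = iota
	classCR
	classLF
	classMark
)

func classify(r rune) int {
	switch {
	case r == '\r':
		return classCR
	case r == '\n':
		return classLF
	case unicode.IsMark(r):
		return classMark
	}
	return classOther
}

// toyFirstCluster is NOT the library function. It returns the first cluster,
// the rest of the string, and a state caching the class of the first code
// point of rest (-1 means "nothing cached").
func toyFirstCluster(s string, state int) (cluster, rest string, newState int) {
	if s == "" {
		return "", "", -1
	}
	r, end := utf8.DecodeRuneInString(s)
	class := state // reuse the cached class if the caller preserved it
	if class < 0 {
		class = classify(r) // otherwise recompute it from the input
	}
	newState = -1
	for end < len(s) {
		next, n := utf8.DecodeRuneInString(s[end:])
		nextClass := classify(next)
		// Do not break between CR and LF, or before a mark that follows
		// anything other than CR or LF.
		joins := (class == classCR && nextClass == classLF) ||
			(nextClass == classMark && class != classCR && class != classLF)
		if !joins {
			newState = nextClass
			break
		}
		end += n
		class = nextClass
	}
	return s[:end], s[end:], newState
}

// segment splits s into clusters. With keepState == false the state is reset
// to -1 before every call, which is the experiment described above.
func segment(s string, keepState bool) []string {
	var out []string
	state := -1
	for s != "" {
		var cluster string
		cluster, s, state = toyFirstCluster(s, state)
		out = append(out, cluster)
		if !keepState {
			state = -1
		}
	}
	return out
}

func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	for _, in := range []string{"e\u0301a\r\nb", "\r\n\u0301", "abc"} {
		kept, reset := segment(in, true), segment(in, false)
		fmt.Printf("%q: kept=%q reset=%q same=%v\n", in, kept, reset, equal(kept, reset))
	}
}
```

In this toy, resetting the state can never change the result, because the cached class is always recomputable from the remaining input. Whether the same holds for every rule the real parser implements is exactly what the full test run above is meant to check.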
