Clarify: RegExp scf meaning #3594

Open
Jack-Works opened this issue May 11, 2025 · 0 comments
Jack-Works commented May 11, 2025

The spec uses two different wordings when it refers to Simple Case Folding:

Canonicalize:

If the file CaseFolding.txt of the Unicode Character Database provides a simple or common case folding mapping for ch, ...

scf:

the Simple Case Folding (scf(cp)) definitions in the file CaseFolding.txt of the Unicode Character Database

The Unicode file mentioned above writes:

# The status field is:
# C: common case folding, common mappings shared by both simple and full mappings.
# F: full case folding, mappings that cause strings to grow in length. Multiple characters are separated by spaces.
# S: simple case folding, mappings to single characters where different from F.
# T: special case for uppercase I and dotted uppercase I
#    - For non-Turkic languages, this mapping is normally not used.
#    - For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.
#      Note that the Turkic mappings do not maintain canonical equivalence without additional processing.
#      See the discussions of case mapping in the Unicode Standard for more information.
#
# Usage:
#  A. To do a simple case folding, use the mappings with status C + S.
#  B. To do a full case folding, use the mappings with status C + F.
#
#    The mappings with status T can be used or omitted depending on the desired case-folding
#    behavior. (The default option is to exclude them.)

The wording in Canonicalize is clear: "simple or common" means C + S. I'm not sure whether scf also refers to C + S (as the Usage comment suggests) or only to S (as the status field description suggests).
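
To make the two readings concrete, here is a minimal Python sketch. It assumes a handful of rows written in the CaseFolding.txt format (the real file is much larger), and `build_folding` and `fold` are hypothetical helper names, not anything defined by the spec or by Unicode. It only illustrates how the result changes depending on whether "simple case folding" is read as C + S or as S alone.

```python
# A few sample rows in the CaseFolding.txt format: <code>; <status>; <mapping>; # <name>
# Assumption: this tiny sample stands in for the full Unicode data file.
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
0049; C; 0069; # LATIN CAPITAL LETTER I
0049; T; 0131; # LATIN CAPITAL LETTER I
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def build_folding(text, statuses):
    """Build a folding table from rows whose status is in `statuses` (hypothetical helper)."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code, status, target = [field.strip() for field in line.split(";")[:3]]
        if status in statuses:
            folded = "".join(chr(int(cp, 16)) for cp in target.split())
            mapping[chr(int(code, 16))] = folded
    return mapping


def fold(ch, mapping):
    """Return the folded form of `ch`, or `ch` itself when no row applies."""
    return mapping.get(ch, ch)


c_plus_s = build_folding(SAMPLE, {"C", "S"})  # reading from the Usage comment
s_only = build_folding(SAMPLE, {"S"})         # reading from the status field alone

for ch in ("A", "\u1e9e"):
    print(f"U+{ord(ch):04X}: C+S -> {fold(ch, c_plus_s)!r}, S only -> {fold(ch, s_only)!r}")
```

Under the S-only reading, `A` would not fold to `a` at all, because that row has status C. Under the C + S reading it does, while U+1E9E folds to U+00DF either way.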
