
@guilhermeljs guilhermeljs commented Jan 9, 2026

Fixes #150888

Updates the `to_uppercase` documentation examples to avoid relying on the ß → "SS" mapping and instead use a stable multi-character case-mapping example ('ffi' → "FFI").

Note: the example uses U+FB03 (LATIN SMALL LIGATURE FFI), not the ASCII string "ffi".
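For readers who want to check the mapping described above, here is a minimal sketch. The helper name `uppercase_of` is hypothetical and exists only for this illustration; the ligature is written as a `\u{FB03}` escape so it cannot be confused with the three ASCII letters.

```rust
/// Uppercases a single `char` and collects the result into a `String`.
/// `uppercase_of` is a hypothetical helper used only in this sketch.
fn uppercase_of(c: char) -> String {
    c.to_uppercase().collect()
}

fn main() {
    // U+FB03 LATIN SMALL LIGATURE FFI: one code point, not the string "ffi".
    let ligature = '\u{FB03}';
    let upper = uppercase_of(ligature);

    // The single input character expands to three uppercase characters.
    assert_eq!(upper, "FFI");
    assert_eq!(upper.chars().count(), 3);
    println!("{ligature} -> {upper}");
}
```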

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 9, 2026
Collaborator

rustbot commented Jan 9, 2026

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer


@guilhermeljs guilhermeljs force-pushed the docs-150888-to_uppercase branch from 504d221 to 9156b54 Compare January 9, 2026 22:55
Collaborator

rustbot commented Jan 9, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

///
/// ```
- /// for c in 'ß'.to_uppercase() {
+ /// for c in 'ﬃ'.to_uppercase() {
Member

@Noratrieb Noratrieb Jan 10, 2026

Maybe it would make sense to leave a comment explaining that this is a single codepoint? It might be a bit confusing otherwise, as people are likely to not know that these exist (I didn't)
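One way to see the point about the ligature being a single code point is to compare it with the ASCII spelling. This is only a sketch; `scalar_count` is a hypothetical helper name, not part of the PR.

```rust
/// Counts the Unicode scalar values (`char`s) in a string.
/// `scalar_count` is a hypothetical helper used only in this sketch.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let ligature = "\u{FB03}"; // LATIN SMALL LIGATURE FFI
    let ascii = "ffi"; // three separate ASCII letters

    // The ligature is one `char`; the ASCII spelling is three.
    assert_eq!(scalar_count(ligature), 1);
    assert_eq!(scalar_count(ascii), 3);

    // Both happen to occupy three bytes in UTF-8, so byte length alone
    // does not tell them apart.
    assert_eq!(ligature.len(), 3);
    assert_eq!(ascii.len(), 3);

    // They are nevertheless different strings.
    assert_ne!(ligature, ascii);
}
```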


@guilhermeljs guilhermeljs force-pushed the docs-150888-to_uppercase branch from 5e51443 to c10f0dd Compare January 10, 2026 17:22
/// [Unicode Standard]: https://www.unicode.org/versions/latest/
///
/// # Examples
/// `'ffi'` (U+FB03) is a single Unicode code point (a ligature) that maps to "FFI" in uppercase.
Member

Hh... hmm. Would it be better to move this comment to a Rust comment inside the example code block so that it's directly visible in-context and likely to be copied with it?

Author

@guilhermeljs guilhermeljs Jan 11, 2026

> Hh... hmm. Would it be better to move this comment to a Rust comment inside the example code block so that it's directly visible in-context and likely to be copied with it?

@workingjubilee

If we move the note into a Rust comment inside the first example, wouldn't it seem local to that specific snippet? We have three snippets here that demonstrate the same semantic fact ('ffi' to "FFI"), just through different APIs (iteration vs println! vs assert). Wouldn't putting the explanation in only one of them make the others appear to lack the key context? I also feel that duplicating the same comment in all of them would introduce redundancy. (Maybe I'm overthinking it, and you have a great point about the comment being likely to be copied with the code.)

I think keeping it as an introductory note makes it clear that it applies to the entire Examples section. What do you think?

Member

Ah, my eyes glazed over and failed to fully process the rest of the diff, it seems!

...Yyyyes, I think it's either "once" or "repeated per-instance", and at that point I am less sure of either choice as I think there is still merit of keeping the note close but redundancy is, well, redundancy.

@workingjubilee
Member

I was trying to think of a ligature that would be less "what? that's one codepoint?" to readers, because of being more commonly seen, but unfortunately the best example I can think of, 'æ' (still uncommon but everyone sees it due to its popular usage in various creative and "creative" wordmarks) has its big sister 'Æ'.

Member

workingjubilee commented Jan 12, 2026

( Also that's because it isn't a ligature exactly, except it kinda is, but it also is a Proper Letter, "æsh", retained like þ (thorn) and others. )

@guilhermeljs
Author

> I was trying to think of a ligature that would be less "what? that's one codepoint?" to readers, because of being more commonly seen, but unfortunately the best example I can think of, 'æ' (still uncommon but everyone sees it due to its popular usage in various creative and "creative" wordmarks) has its big sister 'Æ'.

@workingjubilee
I also looked for another ligature and found a good candidate: "ſt" (U+FB05). It maps to "ST" in uppercase, which is the property the docs try to demonstrate: a single Unicode character mapping to multiple characters when uppercased.

What do you think?
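A rough sketch of how the proposed U+FB05 example might look through the three APIs mentioned earlier in the thread (iteration, `println!`, and an assertion). The helper `uppercase_chars` is hypothetical, and this is not the exact text that landed in the docs.

```rust
/// Collects the uppercase expansion of `c` by iterating over it.
/// `uppercase_chars` is a hypothetical helper used only in this sketch.
fn uppercase_chars(c: char) -> Vec<char> {
    let mut out = Vec::new();
    for u in c.to_uppercase() {
        out.push(u);
    }
    out
}

fn main() {
    // U+FB05 LATIN SMALL LIGATURE LONG S T: a single code point.
    let st = '\u{FB05}';

    // 1. Iteration: the iterator yields two characters.
    assert_eq!(uppercase_chars(st), vec!['S', 'T']);

    // 2. Printing: `ToUppercase` implements `Display`.
    println!("{}", st.to_uppercase());

    // 3. Assertion via `to_string()`.
    assert_eq!(st.to_uppercase().to_string(), "ST");
}
```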

@guilhermeljs
Author

Also, it's slightly better than the previous ffi ligature, although in some font renderings it looks like "ft" ("FT").

@workingjubilee
Member

oh, long S. sure!

Author

guilhermeljs commented Jan 12, 2026

> I was trying to think of a ligature that would be less "what? that's one codepoint?" to readers, because of being more commonly seen, but unfortunately the best example I can think of, 'æ' (still uncommon but everyone sees it due to its popular usage in various creative and "creative" wordmarks) has its big sister 'Æ'.

Hmmm.. Would the æ code point work in this context? It doesn't map to multiple characters in uppercase ('Æ' is still only one code point), and I think the previous example "ß" was used to demonstrate exactly that property.
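The distinction raised here can be checked directly. In this sketch, `expands_when_uppercased` is a hypothetical helper that reports whether a character's uppercase form consists of more than one `char`.

```rust
/// Returns true when uppercasing `c` yields more than one `char`.
/// `expands_when_uppercased` is a hypothetical helper used only in this sketch.
fn expands_when_uppercased(c: char) -> bool {
    c.to_uppercase().count() > 1
}

fn main() {
    // U+00E6 (æ) has a one-to-one uppercase form, U+00C6 (Æ).
    assert!(!expands_when_uppercased('\u{E6}'));
    assert_eq!('\u{E6}'.to_uppercase().to_string(), "\u{C6}");

    // U+FB05 (the long s-t ligature) expands to two characters.
    assert!(expands_when_uppercased('\u{FB05}'));
}
```

So 'æ' would not illustrate the one-to-many property the Examples section is meant to show.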

@guilhermeljs
Author

So, after some chatting with @workingjubilee on Zulip, I chose to go with the "ſt" ligature because I feel it's less confusing in font rendering than the "ffi" ligature, which looks exactly like the string "ffi" (here, using Firefox and GitHub, I can't even tell the difference). The introductory explanation note will be kept as well to make things clearer for readers.

@workingjubilee
Member

> Hmmm.. Would the æ code point work in this context? It doesn't map to multiple characters in uppercase ('Æ' is still only one code point), and I think the previous example "ß" was used to demonstrate exactly that property.

for clarity: it would not, and my lamentation was basically "alas, any ligature that is used enough becomes considered a 'proper letter' which then means it is given a majuscule variant even if it didn't have one before". the Eszett is counted here, as it is also a ligature based on long S and Z.

As I think this is the most distinctive variation we're getting:

@bors r=Noratrieb,workingjubilee

@rust-bors rust-bors bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 13, 2026
Contributor

rust-bors bot commented Jan 13, 2026

📌 Commit 6b5a1a5 has been approved by Noratrieb,workingjubilee

It is now in the queue for this repository.


Development

Successfully merging this pull request may close these issues.

Stop using the ß->SS capitalization as an example
