Prevent HasSlug::getUtf8Slug() from guessing encoding by default #2771
Description
By default, the method guessed the encoding of the given string from a list of all possible encodings. This change allows the caller to supply the list of candidate encodings. If no encodings are provided, it will use the default_charset setting. See https://www.php.net/manual/en/function.mb-convert-encoding.php for details.

This is an extremely narrow edge case, but it is possible to encounter an encoding clash, where the given string just so happens to be valid in two different encodings. We currently have only one recorded example; we have encountered this issue a couple of other times in the past but did not record them.
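For illustration, the change in behavior could be sketched roughly as follows. This is a minimal sketch and not the actual diff: the helper name convertToUtf8() and its parameter name are hypothetical, and the real HasSlug::getUtf8Slug() may do more than convert the encoding. It assumes PHP 8 with the mbstring extension.

```php
<?php
// Hypothetical helper, not the real HasSlug::getUtf8Slug() implementation.
// $fromEncodings is an assumed parameter name for the caller-supplied list.
function convertToUtf8(string $value, array|string|null $fromEncodings = null): string
{
    // With null, mb_convert_encoding() does not guess from every known encoding;
    // it falls back to the mbstring internal encoding or default_charset.
    $converted = mb_convert_encoding($value, 'UTF-8', $fromEncodings);

    if ($converted === false) {
        throw new RuntimeException('Could not convert the string to UTF-8.');
    }

    return $converted;
}

// No list supplied: the configured default charset is assumed (UTF-8 by default).
$plain = convertToUtf8('Tusk');

// Caller knows the source encoding and states it explicitly.
$latin1 = convertToUtf8("\xE9", 'ISO-8859-1'); // a single Latin-1 byte for "é"
```

Passing the candidate list explicitly keeps the decision with the caller, who usually knows where the string came from.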
The string in question is "Tusk". It was converted to "畔歳" because mb_convert_encoding() guessed it was encoded in UCS-2LE, an obsolete encoding that was superseded by UTF-16.

Under the hood, mb_convert_encoding() uses mb_detect_encoding() to guess the encoding, and from the documentation, we can see that this isn't always reliable: https://www.php.net/manual/en/function.mb-detect-encoding.php
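The clash can be reproduced in isolation. The sketch below assumes PHP 8 with the mbstring extension, and it forces UCS-2LE as the source encoding to show the reported result; which encoding mbstring actually guesses on its own can depend on the PHP version and the candidate list.

```php
<?php
$input = 'Tusk';

// The four ASCII bytes of "Tusk".
$hex = bin2hex($input); // 5475736b

// Read as UCS-2LE, the bytes pair up little-endian into two 16-bit
// code units: 54 75 -> U+7554 and 73 6B -> U+6B73.
$misread = mb_convert_encoding($input, 'UTF-8', 'UCS-2LE');

// Read as UTF-8, the same bytes are left unchanged.
$correct = mb_convert_encoding($input, 'UTF-8', 'UTF-8');

echo $hex, PHP_EOL;
echo $misread, PHP_EOL;
echo $correct, PHP_EOL;
```

Both interpretations are valid byte sequences, which is why a guess cannot reliably tell them apart without a hint from the caller.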