@zachgarwood
Contributor

Description

Previously, the method guessed the encoding of the given string from the list of all supported encodings. This change lets the caller supply the list of candidate encodings. If no encodings are provided, it uses the `default_charset` setting. See https://www.php.net/manual/en/function.mb-convert-encoding.php.
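To illustrate the shape of the change, here is a minimal sketch. The function name `convertToUtf8` and its parameter name are hypothetical and are not taken from this PR's diff; only `mb_convert_encoding()` and its optional third argument are real.

```php
<?php

/**
 * Hypothetical helper for illustration only (not the method changed in this PR).
 *
 * @param string|string[]|null $fromEncodings Candidate source encodings.
 *        When null, mb_convert_encoding() falls back to PHP's configured
 *        default (the default_charset setting unless overridden).
 */
function convertToUtf8(string $value, array|string|null $fromEncodings = null): string
{
    // The caller's list replaces the old "every supported encoding" guess.
    return mb_convert_encoding($value, 'UTF-8', $fromEncodings);
}

// A caller that knows its data sources can narrow the candidates.
echo convertToUtf8('Tusk', ['UTF-8', 'ISO-8859-1']), PHP_EOL; // Tusk
```

Keeping the parameter optional means existing callers do not need to change, while callers that know where their data comes from can pass a short list.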

This is an extremely narrow edge case, but an encoding clash is possible: a given string can happen to be valid in two different encodings. We currently have only one recorded example. We have encountered this issue a couple of other times in the past, but did not record those cases.

The string in question is "Tusk". Below, we see that the string was converted to "畔歳" because mb_convert_encoding() guessed it was encoded in UCS-2LE, an obsolete encoding that was superseded by UTF-16:

vagrant@homestead:~/website$ php artisan tinker
Psy Shell v0.12.8 (PHP 8.3.23 — cli) by Justin Hileman
> $str = mb_convert_encoding((string)'Tusk', 'UTF-8', mb_list_encodings());
= "畔歳"
> mb_detect_encoding('Tusk', mb_list_encodings());
= "UCS-2LE"
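For anyone wondering why these particular characters appear, the following sketch reads the four ASCII bytes of "Tusk" as two little-endian 16-bit code units, which is how UCS-2LE interprets them. The variable names are only for illustration.

```php
<?php

$bytes = 'Tusk';

// The raw bytes: T=0x54, u=0x75, s=0x73, k=0x6b.
echo bin2hex($bytes), PHP_EOL; // 5475736b

// 'v' unpacks an unsigned 16-bit little-endian integer; 'v2' reads two of them.
$units = unpack('v2', $bytes);

foreach ($units as $unit) {
    // Each 16-bit unit is a code point in UCS-2.
    printf("U+%04X %s\n", $unit, mb_chr($unit, 'UTF-8'));
}
// U+7554 畔
// U+6B73 歳
```

Any ASCII string with an even number of bytes can be read as UCS-2LE this way, so the bytes alone cannot rule that encoding out.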

Under the hood, mb_convert_encoding() uses mb_detect_encoding() to guess the encoding, and from the documentation, we can see that this isn't always reliable:
[Screenshot: excerpt from the mb_detect_encoding() documentation]
https://www.php.net/manual/en/function.mb-detect-encoding.php
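As a rough sketch of how a narrower candidate list avoids the clash, the snippet below runs detection against a short list in strict mode. The list `['UTF-8', 'ISO-8859-1']` is an assumption chosen for this example, not something this PR prescribes.

```php
<?php

$value = 'Tusk';

// Assumed list of encodings the application actually expects to receive.
$expected = ['UTF-8', 'ISO-8859-1'];

// Strict mode (third argument) only accepts a candidate in which the bytes are valid.
$detected = mb_detect_encoding($value, $expected, true);

// Convert only when detection succeeded; otherwise leave the value untouched.
$converted = $detected === false
    ? $value
    : mb_convert_encoding($value, 'UTF-8', $detected);

echo $converted, PHP_EOL; // Tusk
```

Because UCS-2LE is not among the candidates, it can no longer be chosen, whichever of the listed encodings the detector picks.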

@ifox ifox merged commit 6d622a2 into area17:3.x Aug 20, 2025
14 of 16 checks passed