first try fixing utf8 issues #52
This code makes no sense:
On 19/08/16 21:59, Karen Etheridge wrote:
But it is not in utf8. And you can't make it use utf8, or the Module::Install generated will be
Dear @karenetheridge, does it look more understandable this way?
Also, I added some code to check the Perl version.
Indeed, it is incorrect.
The correct fix would be just call utf8 was shipped with 5.6.0 and there's no problem requiring that, given Module::Install's lowest supported perl is also 5.6.0.
btw #37 sounds like a different problem then - it should decode when reading the
@miyagawa, removing the utf8 check makes t/20_authors_with_special_characters.t fail.
Haven't looked at the test but that sounds right. This can't essentially be done without a breaking change...
The only sane way to use unicode in perl is to handle the data consistently as either bytes or characters. Anything else will set everything on fire. And while I generally consider treating strings as characters the more sensible option, the path of least resistance here is probably treating everything as bytes.
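To make the bytes-versus-characters point concrete, here is a minimal sketch of what goes wrong when the two are mixed. It is not Module::Install's code: the sample name and variable names are made up for illustration, and it assumes only the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# UTF-8 octets for "José", as they would arrive from a file
# read without any decoding layer (hypothetical sample data).
my $bytes = "Jos\xC3\xA9";

# Consistent handling: decode once on input, encode once on output.
my $chars     = decode('UTF-8', $bytes);
my $roundtrip = encode('UTF-8', $chars);

# Inconsistent handling: the undecoded octets are mistaken for
# characters and encoded a second time (double encoding, i.e. mojibake).
my $double = encode('UTF-8', $bytes);

printf "characters after decode: %d\n", length($chars);
printf "octets after round trip: %d\n", length($roundtrip);
printf "octets after double encoding: %d\n", length($double);
```

The round trip returns the original octets, while the double-encoded string is longer than the input because each high octet has been expanded again. Treating everything as bytes avoids the problem by never decoding or encoding at all.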
How would this change affect M::I users? I am never sure how the encode/decode stuff will affect the string.
@Leont you say that as your name doesn't have accented characters 😺
We've discussed in CPAN-Meta tickets that it's more important to avoid breaking installations than to corrupt an author or contributor's name. However, different rules can apply in MI since the code is bundled with the distribution, so if breakage occurs, it's all on the author side, and they are in a position to fix their Makefile.PL before shipping. Given this, I think it would be acceptable to die with "invalid character encoding" errors and force the author to fix their code (especially if it's easy to do so, and we have something in the documentation explaining how) when they upgrade to the latest MI. ...especially since we don't really want authors to continue to use MI anyway.. :)
To summarize the current situation:
These mismatches cause mojibake. The first suggested solution was essentially "sometimes we interpret the output as bytes, sometimes we don't", which is a path to madness. The second suggested solution is to treat all files (on input and on output) as UTF-8. That will actually work as long as those conditions hold, but I bet they often don't. The sensible solution is a combination of:
Yes, that was my suggestion. I forgot that the author names are often read from file with
If I read it correctly, for M::I the solution is to keep the code "quiet" as it was?
This should fix #37 and probably other similar issues with characters outside the latin1 range, without risking any breakage elsewhere involving these read/write functions.
OK, will try to see if I understood it.
Something like #55?
This is a first try fixing a utf8 problem from #37.
This is not really a pull request, but a request for comments. This solves the issue, but will only work for perl >= 5.8.1
What happens is that when Perl reads a Makefile.PL that has use utf8 in effect, it stores the string literals as character strings (internally UTF-8) rather than as raw bytes. You then cannot write them out as bytes unless you perform the right conversion.
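The conversion mentioned above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the author name is a made-up example, and it assumes the file is saved as UTF-8 and that the core Encode module is available (it ships with perl 5.8).

```perl
use strict;
use warnings;
use utf8;                      # literals in this file are decoded to characters
use Encode qw(encode decode);

# Hypothetical author name, as it might appear in a Makefile.PL.
my $author = 'José Ünïcode';

# Explicit conversion from characters to UTF-8 octets before writing.
my $bytes = encode('UTF-8', $author);

# length() counts characters for $author and octets for $bytes.
printf "%d characters, %d octets\n", length($author), length($bytes);

# Decoding the octets again gives back the original character string.
my $decoded = decode('UTF-8', $bytes);
print "round trip ok\n" if $decoded eq $author;
```

Writing $bytes to a plain filehandle gives valid UTF-8 on disk. Writing $author directly would instead depend on whatever layers happen to be set on the handle.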
btw, this is a Pull Request Challenge PR.