Skip to content

Convert translation properties to XML #6524

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

leonardehrenfried
Copy link
Member

@leonardehrenfried leonardehrenfried commented Mar 10, 2025

Summary

Before this PR we used .properties files in ISO-8859-1 encoding as this used to the the only encoding that you could use when OTP was started. It has been possible for many years to use UTF-8 as the encoding but IntelliJ has the old encoding pretty hardcoded and doesn't auto-detect it. You can configure Intellij but every developer would have to do it.

This leads to the following unfortunate situation:

Screenshot From 2025-03-10 13-47-10

Java can also read properties files in XML encoding which allows you to specify the encoding in the file itself.

This is what I have done with this PR, so it now looks like this:

Screenshot From 2025-03-10 13-46-48

Merging translation bundles

We had one message bundle called WayProperties and one for "internal" stuff. I combined them into a single one called "translations".

@leonardehrenfried leonardehrenfried requested a review from a team as a code owner March 10, 2025 13:09
@leonardehrenfried leonardehrenfried added the Improvement A functional improvement label Mar 10, 2025
Copy link

codecov bot commented Mar 10, 2025

Codecov Report

Attention: Patch coverage is 88.23529% with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.17%. Comparing base (6fd4d48) to head (426fabe).
Report is 26 commits behind head on dev-2.x.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
.../framework/resources/XmlResourceBundleControl.java 86.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             dev-2.x    #6524   +/-   ##
==========================================
  Coverage      70.17%   70.17%           
- Complexity     18310    18315    +5     
==========================================
  Files           2082     2083    +1     
  Lines          77217    77228   +11     
  Branches        7832     7832           
==========================================
+ Hits           54185    54198   +13     
- Misses         20259    20260    +1     
+ Partials        2773     2770    -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@optionsome optionsome requested a review from abyrd March 11, 2025 10:25
@t2gran t2gran added this to the 2.8 (next release) milestone Mar 12, 2025
@abyrd
Copy link
Member

abyrd commented Mar 14, 2025

The underlying issue here seems to be displaying and editing properties files (not loading or using them), and this seems to be the only motivation for switching to the XML format. As far as I can see, the sole advantage of the new format is that it can self-describe its encoding. In all other respects it is less convenient to read and edit, at least directly in an editor without tooling, as it contains a lot of tag repetition and angle-bracket line noise. Were there other reasons for preferring XML over properties? @leonardehrenfried I think you mentioned editing tools during the meeting. Do people generally use such tools, and do some of them work better with XML, or are we mostly targeting manual editing in an IDE or text editor?

During the development meeting, I mentioned the possibility of using the Unicode byte order mark (BOM, not to be confused with Bill Of Materials in Maven artifacts) to signal to both the IDE and Java that the file was UTF-8 encoded. There was some discussion of whether this was allowed or appropriate, as UTF-8 does not have any byte order ambiguity. I just checked the Unicode spec and it seems to clearly permit starting a UTF-8 stream with a byte order mark, as a way to show to a high degree of certainty that a file is UTF-8 encoded Unicode. https://www.unicode.org/faq/utf_bom.html#BOM

I tried prepending a UTF-8 BOM to some properties files, and unfortunately by default IntelliJ still tries to decode them as ISO Latin-1. IntelliJ apparently doesn't try to guess or detect the encoding and relies on encoding options in the project settings.

On the consuming side, it appears that one can unambiguously assert the encoding of Properties in code by using the Properties.load method that takes a Reader instead of an InputStream. This is what we do in method CustomDocumentation#loadCustomDocumentationFromPropertiesFile for example. But in other places we're relying on ResourceBundle to do the loading. This should not pose a problem since Java 9 (released in September 2017, so over 8 years ago) and JEP 226 originally proposed in 2014, the default encoding for Properties and so for PropertyResourceBundles is UTF-8. So in fact the underlying problem is not with Java, nor with Properties, but with one specific IDE (IntelliJ) that has incorrect defaults and is assuming an encoding that is fully obsolete.

The OpenTripPlanner project already requires all developers to set certain configuration options for code style, and the repo even contains configuration specific to the IDE in question (https://github.com/opentripplanner/OpenTripPlanner/tree/dev-2.x/.idea). Is it actually prohibitively complex to ask developers to change one setting (properties encoding) in the relatively infrequent case they're going to be editing translations?

Screenshot 2025-03-14 at 15 45 20

Interestingly though, the IntelliJ documentation https://www.jetbrains.com/help/idea/properties-files.html claims that IntelliJ uses UTF-8 by default. Is there something in our own project repo that is causing the IDE to select ISO Latin-1?

Taking a step back, note that the only reason we're even confronting encoding problems is because we're using ResourceBundles and specifically PropertyResourceBundles to supply translations. It would also be possible to define the translations in ListResourceBundles (in Java code that is unambiguously UTF-8) or even in Java Maps or in some extremely simple custom format (e.g. each line is a key, followed by a space, followed by the text).

But the first and simplest line of attack here is to determine why IntelliJ is not applying its own default of UTF-8. Maybe there's something in our Maven POM or other configuration files in the OTP repo.

@leonardehrenfried
Copy link
Member Author

leonardehrenfried commented Mar 14, 2025

I really, really don't want to fiddle around with Intellij settings for this very unimportant feature. I want it to Just Work and not spend more time on it.

If the thought of using XML is so undesirable to you, I will close this PR and it will remain as it is. People will have to find their own solutions.

@abyrd
Copy link
Member

abyrd commented Mar 14, 2025

My sense is that an incorrect default in one IDE does not seem like a good reason to migrate project files to a less common format that is in other ways less convenient to work with.

This is simply a bug in IntelliJ, and a bug can be worked around with a single change in the project settings. The bug is tracked here and I'll add a comment there asking them to fix it:
https://youtrack.jetbrains.com/issue/IDEA-301047/IJ-detect-ISO-8859-1-for-properties-files-instead-of-using-UTF-8-as-default-property-encoding-for-Java-9-and-higher

@abyrd
Copy link
Member

abyrd commented Mar 14, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Improvement A functional improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants