Skip to content

[core] i18n: Fresh Start #1074

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 27 commits into
base: master
Choose a base branch
from
Open

[core] i18n: Fresh Start #1074

wants to merge 27 commits into from

Conversation

confused-Techie
Copy link
Member

@confused-Techie confused-Techie commented Aug 9, 2024

This PR builds off of the fantastic work by @meadowsys in #715 to attempt getting Pulsar up and running with translation support.

While some aspects of this PR are inspired or directly borrowed from Meadows work, a lot of what was helpful were the initial research stages done, and being able to implement many of those ideas directly.

But this PR now represents a completely functional translatable Pulsar and community packages implementation. But to dive a little deeper, it's important to mention each individual way of text appearing in Pulsar and how we translate it:

  • UI: This is the easiest, able to be translated with the i18n API.
  • Settings: These are translatable via LocaleLabels
  • Context Menu: Translatable via LocaleLabels
  • Menu: Translatable via LocaleLabels
  • Command Palette: Does not support Translation: I've opted to keep the command-palette untouched, since commands are directly created by the names of the commands, in this current form I fear we couldn't implement translations without breaking changes, so I thought it was better to leave an attempt at that as it's own PR completely.

Getting Started

To implement translations in a community project (or Pulsar for that matter) is to simply add a locales folder in the root of the project. Within this folder should be a collection of files for your translations named like so: package-name.locale.json|cson.

When your project is initialized, just like for menus, this file will be read automatically. In the case of Pulsar the i18n.initialize() function reads Pulsar's locales file automatically.

As this file is read it'll take a look at the locale for each file and find the ones that may apply to the user. Because during load we have no awareness of the completeness of each file, we want to load as many as possible, while still making sure not to do wasted work. So we look at all possible languages the user may load at some point, and include any locales that are on that list.

From here all files loaded are available via i18n.strings (although this should not be accessed directly). This key value store is then used to match all translations. Accessible via a keypath like pulsar.context-menu.core:undo.

Translating Strings

For each string that's being translated we have the full support of the ICU Message Syntax, which allows plurals, replacements and much more, this is all provided by Intl-MessageFormat.

Keep in mind that for items that need additional properties passed to them, such as replacement values, these can only be used when translating via the i18n API, as extra properties cannot be passed via LocaleLabels.

Methods of Translation

There's a few different ways that get a string translated, so lets take a look at all of them.

LocaleLabel

In some cases, it's impossible to access the i18n API to translate a string, such as files in your menus directory. Since these files are cson|json files, they cannot run JS code. To translate these items we use what I'm dubbing a LocaleLabel which is simple a keyPath that correlates to a string accessible to the i18n API, such as one stored in your locales directory, that is surrounded by %.

For example, lets say the contents of ./locales/pulsar.en.cson looked like:

'pulsar': {
  'context-menu': {
    'core:undo': 'Undo'
  }
}

And I wanted this string to appear from ./menus/win32.cson:

'context-menu': 
  'atom-text-editor, .overlayer': [
    {label: '%pulsar.context-menu.core:undo%', command: 'core:undo'}
  ]

The above will successfully translate the label of this context menu item when it appears for the user.

The LocaleLabel method of translation is supported in:

  • Config: Only supported on the title and description key.
  • Application Menu: Only supported on the label key.
  • Context Menu: Only supported on the label key.

API

But for all other cases of translation, when we have access to the i18n API we have more freedom.

Lets say we have ./locales/pulsar.en.cson:

'pulsar': {
  'ui': {
    'myString': 'Hello World'
  }
}

The simplest way to translate this would be:

const str = atom.i18n.t("pulsar.ui.myString");

And if we needed to pass any replacements or extra parameters to Intl-MessageFormat we would do that like:

const str = atom.i18n.t("pulsar.ui.myString", opts);

But lets say we wanted easy access to our namespace, or our package's namespace.

const t = atom.i18n.getT("pulsar");

const str = t.t("ui.myString");

This saves us from having to type the full API dozens or hundreds of times as well as the name of the package.

How does a user control translations?

To control translations you simply have two settings:

  • Primary Locale: This would be the most specific primary locale for the user. Such as en-US
  • Priority List: This is a list from highest to lowest of the most specific locales the user would like to see as an alternative.

These two values are used, along with the hardcoded default fallback of en) to construct a list of languages to display to a user. This list is created in accordance of RFC4647 "Lookup Filtering Fallback Pattern".

What this means is that for every single entry, we continuously fallback to less and less specific locales of that language before moving onto the next option.

For example:

  • Primary: es-MX
  • Priority List: zh-Hant-CN, ja-JP
  • Constant hardcoded fallback: en

Our priority list of locales would be:

[
  'es-MX',
  'es',
  'zh-Hant-CN',
  'zh-Hant',
  'zh',
  'ja-JP',
  'ja',
  'en'
]

How is this priority list used?

When we load data from ./locales we only ever load a locale that is present on that above list. If a user had the same list I typed above but there was ./locales/pulsar.ar.cson it would never be loaded by the system at all, because the user wouldn't encounter that language during this fallback list.

But when we ask for any individual translation of a string, no matter if via the API or a LocaleLabel, we find the string we want to translate within i18n.strings first via it's keypath, then iterate through the fallback list and return the first match we get. This means that partial translation is completely supported for every single string, given that there is always a en locale translation available. This is an important point, anytime there is translation the base translations needed that must be 100% translated should be en, not en-US or en-GB or anything else.


From the above you can see I've given tried to cover every use case and made it as easy as possible to partially translate and start small.

These changes are 100% backwards compatible, and support having a single string translated or the entire application.

@confused-Techie confused-Techie marked this pull request as draft August 9, 2024 03:14
@confused-Techie confused-Techie marked this pull request as ready for review August 10, 2024 02:57
@savetheclocktower
Copy link
Contributor

Is there any urgency to get this into 1.120, or are you cool with letting it sit for a while? I think it looks fine at first glance, but I'd love to have a few weeks to play around with it if you're open to that.

@confused-Techie
Copy link
Member Author

@savetheclocktower I'm not against this one waiting around a bit. Obviously it'd be awesome to get it in sooner rather than later, but a month seems completely reasonable

Copy link
Member

@meadowsys meadowsys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took quick (sleepy) look, looks good to me!

smol sidenote: the work translators have done so far can likely still be used! depending on final format, we may have to "import" manually / write a tool to change the format, but, still usable! that was probably my main worry after I lost motivation on the original one, that translators' work was for nothing ><

Comment on lines +115 to +121
if (Array.isArray(item.submenu)) {
for (let y = 0; y < item.submenu.length; y++) {
if (this.i18n.isAutoTranslateLabel(item.submenu[y].label)) {
item.submenu[y].label = this.i18n.translateLabel(item.submenu[y].label);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this should handle submenus recursively? (or am I mistaken, and submenus aren't possible / this already handles it?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a bad call at all, I'm not sure if there's a limit to how many submenus deep this structure could go, but I'll check to see if there's docs on the topic, since we want to make sure we don't forget anything

primary: {
type: "string",
order: 1,
default: "en-US",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think maybe this default should just be en?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are using RFC4647 Lookup Filtering Fallback Pattern it's best for your locale to be as specific as possible. Here if our primary is en-US by default, that'll automatically cover en-US and en, as that'll be the next available fallback language if en-US is not available. The same way that if someone's primary locale was zh-Hant-CN they'd still fallback quickly to just zh if no other locales were available.

So I'm not saying this should be the default necessarily, but whatever it is, is better off being as specific as possible in terms of locale. But since the majority of Atom was developed in English by Americans I assumed this would be the most logical default, as it's what's essentially already expected by most users.

You'll even notice the beginner locale file I made for Pulsar is ./locales/pulsar.en.cson. But that's still what's loaded with the primary locale of en-US due to the fallback behavior.

@confused-Techie
Copy link
Member Author

@meadowsys Good call on importing translations after the fact.

Since after making this PR I did take a look to realize how much has already been translated, and was worried about that work going to waste.

But you are right, pulsar-edit/i18n-intermediate-sync still exists, and we could just put all of this stuff into a new file, then during the step where file structure is modified we could combine the files and move any translated keys to other keys in the final format.

If (hopefully when) we get this merged, I'll get started over there to work all that out

@savetheclocktower
Copy link
Contributor

OK, this deserves a thorough review, but let me brain-dump some ideas in here:

  • The JSON/CSON file approach feels quite similar to how i18n is implemented in VS Code — a bunch of .json files that define strings, then a %magical.percent-based.syntax% to inject them into other JSON files.
  • Seems like, as long as something is designed to pull from a string key programmatically or with the %magical.percent-based.syntax%, any package can define any key. This makes a lot of sense — language support (even for core menus and such) can be added via community packages rather than always needing to land in core. But it does make me wonder what happens in case multiple packages want to provide the same strings. Maybe there's a unit test for it that I missed. I'd expect that latest to load wins, but it could get confusing to the user without a way for them to reason about it. (Not that I have a better idea.)
  • Other than interesting choices with method names (which I address a bit), I don't have any real quibbles about the API. Still, since this is a core API, I wonder if it makes sense to hide this behind a feature flag to start out, or otherwise mark it as experimental and subject to change. I think modern Tree-sitter benefited from a soft-launch like this and I worry a little bit about boxing ourselves in the way we kinda did with atom.ui. If we soft-launch it, we can theoretically play around with the API and work out if there are any pain points or design decisions we want to revisit.

I've steered clear of all the previous i18n discussions because (a) I'm not the target market; and (b) it felt like a minefield. I keep thinking about command names in particular — not just the problem of the command palette (since ultimately we can filter on whatever labels we choose, even if they're localized), but also keymaps, and the awkwardness of having the command's ID be English text no matter what. My instinct was always that, if I were truly more comfortable with a code editor in my hypothetical non-English native language, I'd be tormented by a “half-translated” Pulsar, and that half of the strings would be in English no matter what because of all the unmaintained community packages out there.

Maybe this is too cynical, though. I know that people ask about i18n on a somewhat regular basis, so maybe it's a way to get more folks engaged and contributing.

Copy link
Contributor

@savetheclocktower savetheclocktower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an approval as such, since I haven't tried it out yet — just some questions and suggestions.

Co-authored-by: Andrew Dupont <[email protected]>
@confused-Techie
Copy link
Member Author

@savetheclocktower Thanks a ton for the review, I'll go ahead and start incorporating that feedback.

…ameterized`

This static method had to be renamed since a non-static method of `shouldIncludeLocale` already existed
@confused-Techie
Copy link
Member Author

@savetheclocktower I've gone ahead and converted all static methods to proper camelCase to match our existing code better.

But I'd love to see what you think on the rest of it. To address some points you've made:

LocaleLabels

Like you mentioned the use of %somekey% was based on how VSCode did it. It seems to work pretty well for them, so I see no reason why it couldn't work just as well for us. And really this key structure is how all text would be translated, just a keypath provided for the API to lookup. So really any package can utilize any key they want.

Key Overrides

You do mention concern that any package could override the key path of another or one in core. But this is actually protected against.
While any package can utilize a key from any package or from core, they cannot override them. Package's keys are locked into their own namespace.
This means if my package's name is confused-Techies-cool-package and I have keys defined as:

'pulsar': {
  'ui': {
    'myString': 'Hello World'
  }
}

The the key myString is only accessible via confused-Techies-cool-package.pulsar.ui.myString.

This does ensure that translations are only ever provided as intended by the creator. But you do make a good point, that key overrides could mean faster translation of core, or other packages. So maybe we do want to allow override-able keys? Or make it configurable? It's hard to say, but since a package would have to start using translation anyway, it seems to me the only keys to get overridden would be core, it made sense to restrict this behavior.

Soft-Launch

I'm also not opposed to a soft-launch myself. But the logistics of it do seem a tad more difficult, so I'd be curious on any ideas on how we could do that.

Since to opt into translation for text would mean using the API and having that fill in the results. A soft launch would mean having both options working at the same time, which seems like it would mean having to check each time if the user has opted in, then returning based on that.

I suppose we could change the signature of t/translate to be something like:

t(keyPath, fallbackString?|opts?, opts?)

Where if a string is the second parameter (this would be the original string) then the translate method itself could check what the user has enabled, and if they haven't opted in then we return this fallback string. If they do have it enabled then we can translate the keypath as we would normally, then opts can optionally be provided as the secondary or tertiary parameter depending on if this function is supporting the opt-in fallback behavior.

Would you think something like this would safely support a more soft-launch opt-in behavior? Then maybe we just restrict community packages from it while we soft-launch?

Just like `translateLabel` if we fail to translate a label we should always return the original keyPath so the user knows something is supposed to be there, and hopefully gets a hint at what was intended. Plus hopefully this makes troubleshooting easier/faster to know there was a failure
@savetheclocktower
Copy link
Contributor

I think a soft launch could work like this:

  • Expose an experimental setting to allow users to select a locale. Default would be en-US (or en), since that's the status quo. Next would be “Use OS setting,” followed by a full list of locales. The setting description should make it abundantly clear that this will have minimal effect during the experimental period.
  • Pick one or two builtin packages that have many opportunities for localizable text. (For example: autocomplete-plus has lots configuration settings and descriptions.) Add translations for some common locales for those packages. (We seem to have plenty of users from France and Germany, so those seem like good initial translation targets, but that's just a gut feeling.)
  • Solicit feedback from a handful of users. I think enough European users idle in Discord that we can just ask for volunteers a few times and get some takers.

This would allow us to technically launch the API while making it clear that nobody should rely on it yet for their own packages, since anything would be subject to change.

I think it's fair to be cautious about this, but I definitely don't want it to become a road block or bureaucratic bottleneck. We do have prior art here in VS Code; and as long as we're not leaving out any major features that are present in VS Code's implementation, we can use their experience as proof that this approach is sound.

I think the difference between a soft launch and a full launch could be as little as one or two releases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants