1. Glossary of Terms
Key terminology used by this specification.
DOM Localization: localization arguments
A set of key-value pairs that serve as arguments to provide the localization context when formatting a message.
Examples in this specification use the keyword l10n-args when annotating DOM elements with localization arguments.
DOM Localization: localization context
A context of localization resources similar to a JavaScript document context, or CSS stylesheet, with the ability to format localized messages.
A context may be declared in HTML and scoped to a document, or may stand alone and be created programmatically via JavaScript.
DOM Localization: localization identifier
An identifier with which to annotate DOM elements and fragments to be localized by a localization context.
Examples in this specification use the keyword l10n-id when annotating DOM elements with a localization identifier.
DOM Localization: localization resource
A file that contains messages that are compatible with MessageFormat 2.0.
Many localization resources may be included in a localization context.
Each supported locale may have its own localized version of a resource.
DOM Localization: message
Localized content that is compatible with MessageFormat 2.0, and that can be looked up by a unique localization identifier.
Messages may contain variables that can be modified at the time of formatting via localization arguments.
Messages belong to a localization resource. A localization resource may contain many messages.
DOM Overlays: text-level elements
A dom-overlays element that operates on text and is always allowed to be inserted into localized content by translators.
DOM Overlays: functional elements
A dom-overlays element for which the developer may specify attributes for translators to add to the element, e.g. img and a.
DOM Overlays: structured elements
An element which is part of a nested structure created by the developer that the translator should be able to modify as appropriate.
MessageFormat 2.0
A new, generic localization system that is being developed for inclusion in Unicode and JavaScript.
MessageFormat Syntax in this document is based on the following proposal and is subject to change.
web stack
The composition of HTML, DOM, CSS, and JavaScript and related technologies.
2. Introduction
Note: Many of the ideas expressed in this proposal are inspired by Project Fluent.
The web stack is commonly used to create applications with both content and user interface. One of the core value propositions of the stack is the open, semantic and pluggable nature of the technology.
For example, CSS provides technology to apply styles and themes to HTML documents, but also enables web browsers and third-party add-ons such as extensions or accessibility tools to adjust the styles and themes at runtime.
In a similar way, DOM localization proposes to introduce a localization component of the web stack. This component would allow HTML documents to be localized in a way that enables web browsers and third-party extensions to augment documents for the benefit of the user’s experience. Such an open system would allow for the construction of localizable web applications that can be introspected for semantic information, be augmented by external code for different forms of presentation (screen readers, VR etc.), and be accessible to a global audience.
3. Relation to MessageFormat 2.0
We propose that the DOM localization system should be built on top of MessageFormat 2.0, providing a unified localization experience for JavaScript applications and HTML documents.
4. Proposed Solution
The proposed solution provides a cohesive developer experience between DOM and JavaScript localization, compatible with CSS, Shadow DOM and other aspects of the web stack.
4.1. Localization Context
We propose to introduce a notion of a localization context, similar to a JavaScript document context or a CSS stylesheet, which would be composed of a list of localization resources declared in the head of the document.
Similar to stylesheets, developers will be able to programmatically construct any number of localization contexts, as well as declaratively define them for HTML documents, shadow DOM trees etc.
The localization context will likely be an abstraction built upon the Intl.MessageFormat object, which will be specified by MessageFormat 2.0.
4.1.1. Document Localization Context
By default, localization resources declared in the head element of a document should be aggregated into a localization context that is scoped to that document.
<html>
  <head>
    <link rel="localization" src="uri/for/resource1.mf" />
    <link rel="localization" src="uri/for/resource2.mf" />
  </head>
</html>
4.1.2. Multiple Document Localization Contexts
It could also be possible to allow for multiple explicitly named localization contexts, each with its own localization resources and scope:
<html>
  <head>
    <l10n-context name="menu">
      <link rel="localization" src="uri/for/resource1.mf" />
      <link rel="localization" src="uri/for/resource2.mf" />
    </l10n-context>
    <l10n-context name="chrome">
      <link rel="localization" src="uri/for/resource1.mf" />
      <link rel="localization" src="uri/for/resource3.mf" />
    </l10n-context>
  </head>
</html>
4.2. Localization Core Attributes
We propose to introduce a set of (potentially namespaced) core attributes to HTML that would allow developers to declaratively or programmatically bind DOM elements and fragments to localization messages.
The messages loaded into the document context(s) would then be used to localize the elements or fragments.
4.2.1. Localization Identifier
Assume a document is constructed with a localization context that contains a localization resource with a message such as:
key1 = Document Header
Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.
A document with such a context can then be declaratively written to localize HTML content by augmenting elements with localization identifiers.
<html>
  <body>
    <h1 l10n-id="key1"></h1>
  </body>
</html>
If a localization identifier’s value matches a message contained within the document’s localization context, then the element to which the identifier belongs will be localized accordingly.
The above example would produce the following HTML after translation.
<html>
  <body>
    <h1 l10n-id="key1">Document Header</h1>
  </body>
</html>
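The lookup described above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: parseResource, SketchContext and localizeElement are hypothetical names, resources are treated as flat "key = value" lines, earlier resources are assumed to take precedence, and elements are modeled as plain objects so that the sketch runs outside a browser.

```javascript
// Hypothetical helper: parse a flat "key = value" resource into a Map.
// Real MessageFormat 2.0 resources are expected to have a richer syntax.
function parseResource(text) {
  const messages = new Map();
  for (const line of text.split("\n")) {
    const eq = line.indexOf("=");
    if (eq === -1) continue; // skip blank or malformed lines
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (key) messages.set(key, value);
  }
  return messages;
}

// Hypothetical context: aggregates resources in declaration order.
// Assumption: the first resource that defines an identifier wins.
class SketchContext {
  constructor(resourceTexts) {
    this.resources = resourceTexts.map(parseResource);
  }

  getMessage(id) {
    for (const resource of this.resources) {
      if (resource.has(id)) return resource.get(id);
    }
    return null; // no matching message
  }
}

// Elements are plain objects here: { l10nId, textContent }.
function localizeElement(ctx, element) {
  const message = ctx.getMessage(element.l10nId);
  // An element whose identifier matches no message is left untouched.
  if (message !== null) element.textContent = message;
  return element;
}

const ctx = new SketchContext(["key1 = Document Header", "key2 = Footer"]);
const h1 = localizeElement(ctx, { l10nId: "key1", textContent: "" });
console.log(h1.textContent); // Document Header
```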
4.2.2. Localization Arguments
The second proposed core attribute is a collection of values, similar to dataset. The localization arguments contain variables passed from the developer to the localization context for use when resolving a message:
greetings_msg = Welcome, {$userName}.
Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.
<html>
  <body>
    <h1 l10n-id="greetings_msg" l10n-args="userName: John"></h1>
  </body>
</html>
The ability to provide a document with a localization context, together with the ability to declaratively annotate elements and fragments within the document, provides the basis of the DOM localization system.
The above example would produce the following HTML after translation.
<html>
  <body>
    <h1 l10n-id="greetings_msg" l10n-args="userName: John">Welcome, John.</h1>
  </body>
</html>
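As a rough illustration of how localization arguments could be resolved, the sketch below parses an l10n-args value and substitutes {$name} placeholders. The attribute syntax is not defined by this proposal, so the comma-separated "name: value" form is an assumption taken from the example above, and parseL10nArgs and formatPattern are hypothetical helper names rather than part of any proposed API.

```javascript
// Assumed attribute syntax: comma-separated "name: value" pairs.
function parseL10nArgs(attrValue) {
  const args = {};
  for (const pair of attrValue.split(",")) {
    const colon = pair.indexOf(":");
    if (colon === -1) continue; // ignore fragments without a name
    const name = pair.slice(0, colon).trim();
    if (name) args[name] = pair.slice(colon + 1).trim();
  }
  return args;
}

// Replace each {$name} placeholder with the matching argument.
// Placeholders without a matching argument are left visible in the output.
function formatPattern(pattern, args) {
  return pattern.replace(/\{\$([A-Za-z_][\w-]*)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(args, name) ? String(args[name]) : match
  );
}

const args = parseL10nArgs("userName: John");
const text = formatPattern("Welcome, {$userName}.", args);
console.log(text); // Welcome, John.
```

Leaving unresolved placeholders visible is one possible choice; a real implementation might instead report an error or fall back to another locale.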
5. DOM API
In order for the system to be accessible programmatically, DOM localization provides an API to operate on the localization context and its elements.
The main paradigm is a declarative annotation of elements, much like class is used in CSS to bind style classes to elements:
let h1 = document.querySelector("h1");
h1.l10n.setAttributes("greetings_msg", { userName: "John" });
This separates the programmatic annotation of the DOM tree with localization bindings from the application of the localization, which can happen asynchronously and be synced to animation and paint frames.
In cases where developers need to operate on localization messages programmatically, they can access the main document’s context to format a message:
let msg = await document.l10n.formatMessage("greetings_msg", {
  userName: "Mary",
});
6. Resource Resolution
In this proposal, the paths provided to the localization contexts are intentionally ambiguous, as the mechanism to resolve localization resources is nontrivial and will require a significant amount of up-front design.
The main aspect of the mechanism is that it has to enable the engine to reason about the locales that the user or app requested, and the locales in which localization resources are available. It must be able to negotiate between these two sets to provide an optimal solution for formatting messages in the given localization context.
A potential way to produce sufficient information may look like this:
<html>
  <head>
    <l10n-meta
      available-locales="de, fr, it"
      default-locale="fr"
      path-schema="/static/l10n/{locale}/{resId}" />
    <link rel="localization" src="path/to/resource1.mf" />
    <link rel="localization" src="path/to/resource2.mf" />
  </head>
</html>
This would result in the engine taking the user-requested locales for the app, negotiating them against the listed available locales, and then fetching the localization resources using the resolved path schema.
An example of a fully resolved path may look like the following:
/static/l10n/de-AT/path/to/resource1.mf
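The proposal leaves the negotiation rules open, so the following is only one plausible strategy, sketched under stated assumptions: a requested locale matches an available one either exactly or by its language subtag, and the default locale is appended as the last fallback. The names negotiateLocales and resolvePath are hypothetical, and the requested locales are made up for illustration.

```javascript
// Hypothetical negotiation: exact match first, then language-only match,
// with the default locale appended as the final fallback.
function negotiateLocales(requested, available, defaultLocale) {
  const result = [];
  const add = (locale) => {
    if (!result.includes(locale)) result.push(locale);
  };
  for (const tag of requested) {
    if (available.includes(tag)) {
      add(tag);
      continue;
    }
    // Intl.Locale exposes the language subtag, e.g. "it" for "it-CH".
    const language = new Intl.Locale(tag).language;
    if (available.includes(language)) add(language);
  }
  add(defaultLocale);
  return result;
}

// Fill the {locale} and {resId} placeholders of a path schema.
function resolvePath(schema, locale, resId) {
  return schema
    .replace("{locale}", () => locale)
    .replace("{resId}", () => resId);
}

const locales = negotiateLocales(["it-CH", "en-US"], ["de", "fr", "it"], "fr");
const path = resolvePath(
  "/static/l10n/{locale}/{resId}",
  locales[0],
  "path/to/resource1.mf"
);
console.log(locales); // [ 'it', 'fr' ]
console.log(path); // /static/l10n/it/path/to/resource1.mf
```

The ordered result doubles as a fallback chain: if a message is missing in the first locale, an engine could retry with the next one.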
7. Custom Contexts
Similar to custom stylesheets, it should be possible to programmatically create custom localization contexts:
let ctx = new LocalizationContext([
  "uri/for/res1.mf",
  "uri/for/res2.mf",
]);
let msg = await ctx.formatMessage("greetings_msg", {
  userName: "Mary",
});
8. Attribute Localization
HTML and Web Components provide elements with translatable attributes such as title and placeholder.
It may be possible to use the MessageFormat 2.0 concept of localization groups to cluster a number of localization messages into a group that will be used to declaratively localize an HTML element and its attributes together:
ok_button = ========
content = Click me
title = Title to show on hover
ariaLabel = Main button
Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.
<html>
  <body>
    <button l10n-id="ok_button"></button>
  </body>
</html>
Such atomic binding between a UI widget and a composed localization unit would be particularly useful for localization of Web Components, where a rich set of attributes could be used to carry localization messages across the boundary from the document to the shadow DOM.
Such binding would also enable locale consistency for the whole UI element, ensuring that the content and attributes of each element are localized into the same locale, be it the primary locale or any fallback.
The above example would produce the following HTML after translation.
<html>
  <body>
    <button l10n-id="ok_button" title="Title to show on hover" aria-label="Main button">Click me</button>
  </body>
</html>
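A minimal sketch of how a message group might be split into element content and attributes follows. It assumes that the content entry maps to the element's text and that camelCase keys such as ariaLabel map to hyphenated attribute names, as the example above suggests; applyMessageGroup and toAttributeName are hypothetical names.

```javascript
// Convert a camelCase key such as "ariaLabel" to an attribute name "aria-label".
function toAttributeName(key) {
  return key.replace(/[A-Z]/g, (ch) => "-" + ch.toLowerCase());
}

// Split an already-parsed message group into text content and attributes.
// Assumption: the "content" entry carries the element's text.
function applyMessageGroup(group) {
  const result = { textContent: null, attributes: {} };
  for (const [key, value] of Object.entries(group)) {
    if (key === "content") {
      result.textContent = value;
    } else {
      result.attributes[toAttributeName(key)] = value;
    }
  }
  return result;
}

const okButton = applyMessageGroup({
  content: "Click me",
  title: "Title to show on hover",
  ariaLabel: "Main button",
});
console.log(okButton.textContent); // Click me
console.log(okButton.attributes["aria-label"]); // Main button
```

Because the whole group is resolved in one step, the content and every attribute come from the same locale, which is the consistency property described above.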
9. Web Components
There are two ways to approach web components localization.
9.1. Using Document Localization Context
In this model, the web component exposes a set of attributes that are localizable and the document provides the translations out of its own localization context:
widget1 = ======
content = Click me
title = Title to show on hover
ariaLabel = Main widget
Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.
<html>
  <head>
    <link rel="localization" src="uri/for/resource1.mf" />
    <link rel="localization" src="uri/for/resource2.mf" />
  </head>
  <body>
    <my-widget l10n-id="widget1" />
  </body>
</html>
The above example would produce the following HTML after translation.
<html>
  <head>
    <link rel="localization" src="uri/for/resource1.mf" />
    <link rel="localization" src="uri/for/resource2.mf" />
  </head>
  <body>
    <my-widget l10n-id="widget1" title="Title to show on hover" aria-label="Main widget">Click me</my-widget>
  </body>
</html>
9.2. Using its own Localization Context
Alternatively, some components may come with their own localization resources and may create their own custom localization contexts to be used for localization of their shadow DOM.
10. L10n Mutations
We propose an extension of a localization context with a number of methods curated for use in the context of DOM fragments.
A DOM localization context would have a concept of roots: elements to which special mutation observers are attached.
Such a localization context would therefore monitor for any changes to the l10n attributes under its roots and react to those mutations by translating the affected elements.
By default, the document’s localization context would cover the whole document, but it should be possible to create custom localization contexts.
let ctx = new DOMLocalizationContext([
  "uri/for/res1.mf",
  "uri/for/res2.mf",
]);
let elem = document.getElementById("menu");

// Initial translation of the fragment
await ctx.translateFragment(elem);
ctx.connectRoot(elem);
From the moment connectRoot is called for the given element, any changes to the element or its children that affect localization would be coalesced into a localization frame, which would be processed right before the layout and animation frame.
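The coalescing behavior can be sketched independently of the DOM. The class below is a hypothetical illustration, not a proposed API: it collects changed elements in a set and translates them in one batch per scheduled frame. The scheduling function is injectable; a browser implementation would presumably hook into frame timing, while this sketch defaults to queueMicrotask so that it runs anywhere.

```javascript
// Hypothetical scheduler that coalesces many mutations into one batch.
class LocalizationFrameScheduler {
  constructor(translate, schedule = (callback) => queueMicrotask(callback)) {
    this.translate = translate; // called once per frame with the batch
    this.schedule = schedule; // stand-in for real frame timing
    this.pending = new Set(); // a Set deduplicates repeated mutations
    this.scheduled = false;
  }

  // Record an element whose l10n attributes changed.
  markDirty(element) {
    this.pending.add(element);
    if (this.scheduled) return; // a frame is already queued
    this.scheduled = true;
    this.schedule(() => this.flush());
  }

  // Translate everything collected since the last frame.
  flush() {
    const batch = [...this.pending];
    this.pending.clear();
    this.scheduled = false;
    if (batch.length > 0) this.translate(batch);
  }
}

// Usage with a manual queue so the frame boundary is explicit.
const frames = [];
const queue = [];
const scheduler = new LocalizationFrameScheduler(
  (batch) => frames.push(batch),
  (callback) => queue.push(callback)
);
scheduler.markDirty("menu-item-1");
scheduler.markDirty("menu-item-2");
scheduler.markDirty("menu-item-1"); // duplicate, coalesced
queue.forEach((callback) => callback());
console.log(frames); // [ [ 'menu-item-1', 'menu-item-2' ] ]
```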
11. DOM Overlays
Note: Many of the core ideas discussed in this section are inspired by Project Fluent, particularly the following two sources:
The DOM Overlays wiki, written by Staś Małolepszy.
The ideas for New Features (rev 3), written by Zibi Braniecki
We propose that it should be possible for translated messages to refer to or contain DOM elements within the message itself.
For example, a localized message in one locale might add b elements in appropriate places to make the text bold.
This is a delicate and nuanced situation that requires heavy scrutiny, since inserting arbitrary HTML can pose many security risks.
We propose that there should exist three categories of overlay elements: text-level elements, functional elements, and structured elements.
Text-level elements are a subset of elements that operate on text and are deemed to be always safe for translators to use at any time.
A complete list of text-level elements is defined by WHATWG.
For example, a developer may create a paragraph element such as the following.
<p l10n-id="key1">Message to be localized.</p>
A translator should be able to provide a translation that contains text-level elements if desired.
key1 = This is <em>my</em> localized message.
Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.
At which point, the final HTML would be equivalent to the following localized content, in which the em element is interpreted correctly as HTML and rendered accordingly.
<p>This is <em>my</em> localized message.</p>
Functional elements are elements such as img and a that may have attributes for which the developer desires a translator to provide localized translations.
For example, a developer may provide an img as a functional element with a src attribute.
<p l10n-id="key1">Hi, <img src="world.png"></p>
A translator should be able to provide an alt attribute for the img element.
key1 = Hello, <img l10n-name="world" alt="world">!
Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.
The above example would produce the following HTML after translation.
<p l10n-id="key1">Hello, <img src="world.png" alt="world">!</p>
This example allows the translator to add the alt attribute unconstrained, but it will likely be important to design syntax such that the translator can only add or modify a localized attribute if it is explicitly requested by the developer. This would ensure that the translator cannot override an attribute that is not meant to be localized, such as src or href, unless the translator has explicit permission to do so.
While text-level elements are allowed at all times, and functional elements have attributes that may be provided or modified by a translator, a structured element imposes a structure that should be part of the localization.
For example, the developer may want to have ul or ol elements be part of the localization, in which the translator can then add li elements. It should also be possible for the developer to nest elements that each have their own unique localization identifiers.
<div l10n-id="key1">
  <ul></ul>
</div>
A translator should then be able to provide a localized message along with li elements.
key1 = This is a localized list:
  <ul>
    <li>Localized item 1</li>
    <li>Localized item 2</li>
    <li>Localized item 3</li>
  </ul>
Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.
The above example would produce the following HTML after translation.
<div l10n-id="key1">
  This is a localized list:
  <ul>
    <li>Localized item 1</li>
    <li>Localized item 2</li>
    <li>Localized item 3</li>
  </ul>
</div>
11.1. Security Considerations
11.1.1. Specify Localizable Attributes Explicitly
With regard to functional elements, we may want to adopt an opt-in-only model in which a developer must explicitly list which attributes should be localizable.
<p l10n-id="key1" l10n-attrs="alt">Hi, <img src="world.png"></p>
In this model, alt is the only attribute that localizers will be able to modify in translation.
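The opt-in model amounts to an allowlist check applied to whatever attributes a translation supplies. The sketch below assumes the l10n-attrs value is a space- or comma-separated list of attribute names, which this proposal does not specify; parseAllowedAttributes and filterTranslatedAttributes are hypothetical names.

```javascript
// Parse the assumed space- or comma-separated l10n-attrs value into a Set.
function parseAllowedAttributes(attrValue) {
  return new Set(
    attrValue
      .split(/[\s,]+/)
      .filter(Boolean)
      .map((name) => name.toLowerCase())
  );
}

// Keep only the translated attributes that the developer opted in to.
function filterTranslatedAttributes(translated, allowed) {
  const accepted = {};
  for (const [name, value] of Object.entries(translated)) {
    if (allowed.has(name.toLowerCase())) accepted[name] = value;
  }
  return accepted;
}

const allowed = parseAllowedAttributes("alt");
const accepted = filterTranslatedAttributes(
  { alt: "world", src: "other.png", onerror: "doSomething()" },
  allowed
);
console.log(accepted); // { alt: 'world' }
```

Anything not listed, including src and event handler attributes, is dropped rather than applied, so a translation cannot change where an image loads from or attach script.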
11.1.2. Disabling Translation
Developers should have a means to enforce that elements should not be translated.
This should be expressed with the translate attribute specified by ITS.
<div translate="no">Always display this sentence.</div>
11.1.3. Retain Element Identity
<div l10n-id="submit_form">
  <button id="submit">Submit</button>
</div>
submit_form = <button>Submit</button>
Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.
Even after multiple retranslations, the identity of the <button> element should be the same.
The element should not be recreated by the localization layer.
11.1.4. Vestigial Content After Retranslation
DOM Overlays should be designed in a way that it is safe to retranslate content without leaving behind a vestige of what was there before. Given that some languages may overlay different HTML elements, we need to ensure that when a language changes and content is retranslated, we are able to keep track of and remove all content that existed due to the previous translation.
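One way to avoid vestigial content is to remember what each translation added and to remove it before the next translation is applied. The sketch below shows the idea for attributes only, under stated assumptions: elements are modeled as plain objects with an attributes Map so that it runs outside a browser, applyTranslation is a hypothetical name, and the German strings are invented sample translations. Child elements inserted by an overlay would need equivalent bookkeeping.

```javascript
// Remember which attributes the previous translation set on each element.
// A WeakMap avoids keeping discarded elements alive.
const translatedAttributes = new WeakMap();

function applyTranslation(element, attributes) {
  const previous = translatedAttributes.get(element) ?? new Set();

  // Remove attributes that the previous translation added
  // but the new translation no longer provides.
  for (const name of previous) {
    if (!Object.prototype.hasOwnProperty.call(attributes, name)) {
      element.attributes.delete(name);
    }
  }

  // Apply the new translation in place; the element itself is never recreated.
  for (const [name, value] of Object.entries(attributes)) {
    element.attributes.set(name, value);
  }
  translatedAttributes.set(element, new Set(Object.keys(attributes)));
}

const button = { attributes: new Map([["id", "submit"]]) };
applyTranslation(button, { title: "Senden", "aria-label": "Formular senden" });
applyTranslation(button, { title: "Submit" });
console.log([...button.attributes.keys()]); // [ 'id', 'title' ]
```

Developer-authored attributes such as id are never tracked, so they survive every retranslation, and the element object keeps its identity throughout.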