DOM Localization

A Collection of Interesting Ideas,

This version:
https://nordzilla.github.io/dom-l10n-draft-spec/
Issue Tracking:
GitHub
Editors:
Erik Nordin (Mozilla)
Zibi Braniecki (Unicode)

Abstract

This draft defines a proposal for a web standard for DOM localization. DOM localization is a system that allows authors and users to attach localization resources to structured documents (e.g., HTML documents). By separating the localization resources of documents from the structure, presentation and content, DOM localization enables Web applications to be localizable.

1. Glossary of Terms

Key terminology used by this specification.

DOM Localization: localization arguments

A set of key-value pairs that serve as arguments to provide the localization context when formatting a message.

Examples in this specification use the keyword l10n-args when annotating DOM elements with localization arguments.

DOM Localization: localization context

A context of localization resources similar to a JavaScript document context, or CSS stylesheet, with the ability to format localized messages.

A context may be declared in HTML and scoped to a document, or may stand alone and be created programmatically via JavaScript.

DOM Localization: localization identifier

An identifier with which to annotate DOM elements and fragments to be localized by a localization context.

Examples in this specification use the keyword l10n-id when annotating DOM elements with a localization identifier.

DOM Localization: localization resource

A file that contains messages that are compatible with MessageFormat 2.0.

Many localization resources may be included in a localization context.

Each supported locale may have its own localized version of a resource.

DOM Localization: message

Localized content that is compatible with MessageFormat 2.0, and that can be looked up by a unique localization identifier.

Messages may contain variables that can be modified at the time of formatting via localization arguments.

Messages belong to a localization resource. A localization resource may contain many messages.

DOM Overlays: text-level elements

A dom-overlays element that operates on text and is always allowed to be inserted into localized content by translators.

e.g. b and em.

DOM Overlays: functional elements

A dom-overlays element for which the developer may specify attributes for translators to add to the element.

e.g. img and a.

DOM Overlays: structured elements

An element which is part of a nested structure created by the developer that the translator should be able to modify as appropriate.

e.g. ul/ol and li.

MessageFormat 2.0

A new, generic localization system that is being developed for inclusion in Unicode and JavaScript.

(Working Group GitHub)

MessageFormat Syntax in this document is based on the following proposal and is subject to change.

Syntax Proposal

web stack

The composition of HTML, DOM, CSS, and JavaScript and related technologies.

2. Introduction

Note: Many of the ideas expressed in this proposal are inspired by Project Fluent.

The web stack is commonly used to create applications with both content and user interface. One of the core value propositions of the stack is the open, semantic and pluggable nature of the technology.

For example, CSS provides technology to apply styles and themes to HTML documents, but also enables web browsers and third-party add-ons such as extensions or accessibility tools to adjust the styles and themes at runtime.

In a similar way, DOM localization proposes to introduce a localization component of the web stack. This component would allow HTML documents to be localized in a way that enables web browsers and third-party extensions to augment documents for the benefit of the user’s experience. Such an open system would allow for the construction of localizable web applications that can be introspected for semantic information, be augmented by external code for different forms of presentation (screen readers, VR etc.), and be accessible to a global audience.

3. Relation to MessageFormat 2.0

We propose that the DOM localization system should be built on top of MessageFormat 2.0, providing a unified localization experience for JavaScript applications and HTML documents.

4. Proposed Solution

The proposed solution provides a cohesive developer experience between DOM and JavaScript localization, compatible with CSS, Shadow DOM and other aspects of the web stack.

4.1. Localization Context

We propose to introduce the notion of a localization context, similar to a JavaScript document context or a CSS stylesheet, which would be composed of a list of localization resources declared in the head of the document.

Similar to stylesheets, developers will be able to programmatically construct any number of localization contexts, as well as declaratively define them for HTML documents, shadow DOM trees etc.

The localization context will likely be an abstraction built upon the Intl.MessageFormat object, which will be specified by MessageFormat 2.0.

4.1.1. Document Localization Context

By default, localization resources declared in the head element of a document should be aggregated into a localization context that is scoped to that document.

<html>
  <head>
    <link rel="localization" src="uri/for/resource1.mf" />
    <link rel="localization" src="uri/for/resource2.mf" />
  </head>
</html>

4.1.2. Multiple Document Localization Contexts

It could also be possible to allow for multiple explicitly named localization contexts, each with its own localization resources and scope.

<html>
  <head>
    <l10n-context name="menu">
      <link rel="localization" src="uri/for/resource1.mf" />
      <link rel="localization" src="uri/for/resource2.mf" />
    </l10n-context>
    <l10n-context name="chrome">
      <link rel="localization" src="uri/for/resource1.mf" />
      <link rel="localization" src="uri/for/resource3.mf" />
    </l10n-context>
  </head>
</html>

4.2. Localization Core Attributes

We propose to introduce a set of (potentially namespaced) core attributes to HTML that would allow developers to declaratively or programmatically bind DOM elements and fragments to localization messages.

The messages loaded into the document context(s) would then be used to localize the elements or fragments.

4.2.1. Localization Identifier

Assume a document is constructed with a localization context that contains a localization resource with a message such as:

key1 = Document Header

Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.

A document with such a context can then be declaratively written to localize HTML content by augmenting elements with localization identifiers.

<html>
  <body>
    <h1 l10n-id="key1"></h1>
  </body>
</html>

If a localization identifier’s value matches a message contained within the document’s localization context, then the element to which the identifier belongs will be localized accordingly.

The above example would produce the following HTML after translation.

<html>
  <body>
    <h1 l10n-id="key1">Document Header</h1>
  </body>
</html>

4.2.2. Localization Arguments

The second proposed core attribute is a collection of values, similar to dataset. The localization arguments contain variables passed from the developer to the localization context for use when resolving a message:

greetings_msg = Welcome, {$userName}.

Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.

<html>
  <body>
    <h1 l10n-id="greetings_msg" l10n-args="userName: John"></h1>
  </body>
</html>

The ability to provide a document with a localization context, in addition to the ability to declaratively annotate elements and fragments within the document, provides a basis of the DOM localization system.

The above example would produce the following HTML after translation.

<html>
  <body>
    <h1 l10n-id="greetings_msg" l10n-args="userName: John">Welcome, John.</h1>
  </body>
</html>
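The resolution step in the example above can be sketched in a few lines. This is only an illustration under stated assumptions: parseL10nArgs and formatPattern are hypothetical helper names, the comma-separated "key: value" grammar for l10n-args is inferred from the single example above, and the {$name} substitution is a simplified stand-in for real MessageFormat 2.0 formatting.

```javascript
// Hypothetical helper: parse an l10n-args value such as "userName: John"
// into a plain object. The comma-separated "key: value" grammar is an
// assumption; the proposal does not define the attribute's syntax.
function parseL10nArgs(value) {
  const args = {};
  for (const pair of value.split(",")) {
    const separator = pair.indexOf(":");
    if (separator === -1) continue; // skip malformed entries
    const key = pair.slice(0, separator).trim();
    if (key) args[key] = pair.slice(separator + 1).trim();
  }
  return args;
}

// Hypothetical helper: substitute {$name} placeholders in a message pattern.
// Unknown variables are left in place so missing arguments stay visible.
function formatPattern(pattern, args) {
  return pattern.replace(/\{\$(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(args, name) ? String(args[name]) : match
  );
}

const greetingArgs = parseL10nArgs("userName: John");
console.log(formatPattern("Welcome, {$userName}.", greetingArgs));
// prints "Welcome, John."
```

Leaving unknown placeholders untouched is one possible design choice; an actual implementation might instead report an error or fall back to another locale.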

5. DOM API

In order for the system to be accessible programmatically, DOM localization provides an API to operate on the localization context and its elements.

The main paradigm is a declarative annotation of elements, much like class is used in CSS to bind style classes to an element:

let h1 = document.querySelector("h1");
h1.l10n.setAttributes("greetings_msg", { userName: "John" });

This separates the programmatic annotation of the DOM tree with localization bindings from the application of the localization, which can happen asynchronously and be synced to animation and paint frames.

In cases where developers need to operate on localization messages programmatically, they can access the main document’s context to format a message:

let msg = await document.l10n.formatMessage("greetings_msg", {
  userName: "Mary",
});

6. Resource Resolution

In this proposal, the paths provided to the localization contexts are intentionally ambiguous, as the mechanism to resolve localization resources is nontrivial and will require a significant amount of up-front design.

The main aspect of the mechanism is that it has to enable the engine to reason about the locales that the user or app requested, and the locales in which localization resources are available. It must be able to negotiate between these two sets to provide an optimal solution for formatting messages in the given localization context.

A potential way to produce sufficient information may look like this:

<html>
  <head>
    <l10n-meta
      available-locales="de, fr, it"
      default-locale="fr"
      path-schema="/static/l10n/{locale}/{resId}"
    />
    <link rel="localization" src="path/to/resource1.mf" />
    <link rel="localization" src="path/to/resource2.mf" />
  </head>
</html>

This would result in the engine taking the user-requested locales for the app, negotiating them against the listed available locales, and then fetching the localization resources using the resolved path schema.

For a user who requests de-AT, which negotiates to the available locale de, a fully resolved path may look like the following:

/static/l10n/de/path/to/resource1.mf
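The negotiation and path resolution described in this section can be sketched as follows. The function names negotiateLocale and resolvePath are hypothetical, and the fallback strategy (dropping trailing subtags until an available locale matches) is just one assumption about how negotiation could work; the proposal deliberately leaves the mechanism open.

```javascript
// Hypothetical negotiation: for each requested locale, try the full tag and
// then progressively shorter prefixes ("de-AT" -> "de") until one is available.
// If nothing matches, fall back to the declared default locale.
function negotiateLocale(requested, available, defaultLocale) {
  const byLowerCase = new Map(available.map((tag) => [tag.toLowerCase(), tag]));
  for (const tag of requested) {
    const subtags = tag.toLowerCase().split("-");
    while (subtags.length > 0) {
      const match = byLowerCase.get(subtags.join("-"));
      if (match !== undefined) return match;
      subtags.pop();
    }
  }
  return defaultLocale;
}

// Fill the {locale} and {resId} placeholders of a path schema.
// Function replacers avoid special handling of "$" in the inserted values.
function resolvePath(schema, locale, resId) {
  return schema
    .replace("{locale}", () => locale)
    .replace("{resId}", () => resId);
}

const locale = negotiateLocale(["de-AT", "en-US"], ["de", "fr", "it"], "fr");
console.log(resolvePath("/static/l10n/{locale}/{resId}", locale, "path/to/resource1.mf"));
// prints "/static/l10n/de/path/to/resource1.mf"
```

Under this particular strategy, a request for de-AT against the available locales de, fr, it resolves to de, and a request that matches nothing resolves to the default locale fr.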

7. Custom Contexts

Similar to custom stylesheets, it should be possible to programmatically create custom localization contexts:

let ctx = new LocalizationContext([
  "uri/for/res1.mf",
  "uri/for/res2.mf",
]);
let msg = await ctx.formatMessage("greetings_msg", {
  userName: "Mary",
});

8. Attribute Localization

HTML and Web Components provide elements with translatable attributes such as title and placeholder.

It may be possible to use the MessageFormat 2.0 concept of localization groups to cluster a number of localization messages into a group that will be used to declaratively localize an HTML element and its attributes together:

ok_button
=========
content = Click me
title = Title to show on hover
ariaLabel = Main button

Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.

<html>
  <body>
    <button l10n-id="ok_button"></button>
  </body>
</html>

Such atomic binding between a UI widget and a composed localization unit would be particularly useful for localization of Web Components, where a rich set of attributes could be used to carry localization messages across the boundary from the document to shadow DOM.

Such binding would also enable locale consistency for the whole UI element, ensuring that the content and attributes of each element are localized into the same locale, be it the primary locale or any fallback.

The above example would produce the following HTML after translation.

<html>
  <body>
    <button l10n-id="ok_button" title="Title to show on hover" aria-label="Main button">Click me</button>
  </body>
</html>
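The atomic application of a message group to an element can be sketched as follows. This is a minimal illustration, not a definitive design: applyMessageGroup and toAttributeName are hypothetical names, the rule that the content key maps to the element's text is taken from the example above, and the camelCase-to-dashed mapping (ariaLabel to aria-label) is an assumption inferred from the example output. A plain object stands in for a DOM element so that the sketch runs outside a browser.

```javascript
// Convert a camelCase message key to a dashed attribute name,
// e.g. "ariaLabel" -> "aria-label". This mapping is an assumption.
function toAttributeName(key) {
  return key.replace(/[A-Z]/g, (letter) => "-" + letter.toLowerCase());
}

// Apply a whole message group to an element-like object in one step, so the
// content and the attributes always come from the same group.
function applyMessageGroup(element, group) {
  for (const [key, value] of Object.entries(group)) {
    if (key === "content") {
      element.textContent = value;
    } else {
      element.setAttribute(toAttributeName(key), value);
    }
  }
}

// Stand-in for a <button> element; a real DOM element exposes the same members.
const button = {
  attributes: {},
  textContent: "",
  setAttribute(name, value) {
    this.attributes[name] = value;
  },
};

applyMessageGroup(button, {
  content: "Click me",
  title: "Title to show on hover",
  ariaLabel: "Main button",
});
console.log(button.textContent, button.attributes);
```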

9. Web Components

There are two ways to approach web components localization.

9.1. Using Document Localization Context

In this model, the web component exposes a set of attributes that are localizable and the document provides the translations out of its own localization context:

widget1
=======
content = Click me
title = Title to show on hover
ariaLabel = Main widget

Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.

<html>
  <head>
    <link rel="localization" src="uri/for/resource1.mf" />
    <link rel="localization" src="uri/for/resource2.mf" />
  </head>
  <body>
    <my-widget l10n-id="widget1"></my-widget>
  </body>
</html>

The above example would produce the following HTML after translation.

<html>
  <head>
    <link rel="localization" src="uri/for/resource1.mf" />
    <link rel="localization" src="uri/for/resource2.mf" />
  </head>
  <body>
    <my-widget l10n-id="widget1" title="Title to show on hover" aria-label="Main widget">Click me</my-widget>
  </body>
</html>

9.2. Using its own Localization Context

Alternatively, some components may come with their own localization resources and may create their own custom localization contexts to be used for localization of their shadow DOM.

10. L10n Mutations

We propose an extension of a localization context with a number of methods curated for use in the context of DOM fragments.

A DOM localization context would have a concept of roots: elements to which special mutation observers are attached.

Such a localization context would therefore monitor for any changes to the l10n attributes under its roots and react to those mutations by translating the affected elements.

By default, the document’s localization context would cover the whole document, but it should be possible to create custom localization contexts.

let ctx = new DOMLocalizationContext([
  "uri/for/res1.mf",
  "uri/for/res2.mf",
]);
let elem = document.getElementById("menu");

// Initial translation of the fragment
await ctx.translateFragment(elem);

ctx.connectRoot(elem);

From the moment connectRoot is called for the given element, any changes to the element or its children that affect localization would be coalesced into a localization frame, which happens right before layout / animation frame.
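The coalescing behavior described above can be sketched as a small queue. This is a hedged illustration only: L10nFrameQueue, markDirty and flush are hypothetical names, and the translate and schedule callbacks are injected so that the sketch runs outside a browser. In a page, schedule might be requestAnimationFrame and markDirty might be called from a mutation observer callback.

```javascript
// Collects elements whose l10n attributes changed and translates them
// together in a single "localization frame".
class L10nFrameQueue {
  constructor(translate, schedule) {
    this.translate = translate; // called once per frame with the batch
    this.schedule = schedule; // defers a callback to the next frame
    this.pending = new Set();
    this.scheduled = false;
  }

  // Record a change; only the first change after a flush schedules a frame.
  markDirty(element) {
    this.pending.add(element);
    if (!this.scheduled) {
      this.scheduled = true;
      this.schedule(() => this.flush());
    }
  }

  // Translate every pending element once, then reset for the next frame.
  flush() {
    const batch = [...this.pending];
    this.pending.clear();
    this.scheduled = false;
    if (batch.length > 0) this.translate(batch);
  }
}

// Simulated environment: frames are collected in an array and run manually.
const batches = [];
const frames = [];
const queue = new L10nFrameQueue(
  (batch) => batches.push(batch),
  (callback) => frames.push(callback)
);

queue.markDirty("menu-item-1");
queue.markDirty("menu-item-2");
queue.markDirty("menu-item-1"); // duplicate change, coalesced by the Set
frames.shift()(); // simulate the localization frame
console.log(batches.length, batches[0]);
```

Three change notifications result in a single scheduled frame and a single translation pass over two elements.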

11. DOM Overlays

Note: Many of the core ideas discussed in this section are inspired by Project Fluent, particularly the following two sources:

We propose that it should be possible for translated messages to refer to or contain DOM elements within the message itself. For example, a localized message in one locale might like to add b elements in appropriate places to make the text bold. This is a delicate and nuanced situation that requires heavy scrutiny, since inserting arbitrary HTML can pose many security risks.

We propose that there should exist three categories of overlay elements:

text-level elements

These are a subset of elements that operate on text and are deemed to be always safe for translators to use at any time.

A complete list of text-level elements is defined by WHATWG.

For example, a developer may create a paragraph element such as the following.

<p l10n-id="key1">Message to be localized.</p>

A translator should be able to provide a translation that contains text-level elements if desired.

key1 = This is <em>my</em> localized message.

Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.

At which point, the final HTML would be equivalent to the following localized content, in which the em element is interpreted correctly as HTML and rendered accordingly.

<p>This is <em>my</em> localized message.</p>


functional elements

Functional elements are elements such as img and a that may have attributes for which the developer desires a translator to provide localized translations.

For example, a developer may provide an img as a functional element with a src attribute.

<p l10n-id="key1">
    Hi, <img src="world.png">
</p>

A translator should be able to provide an alt attribute for the img element.

key1 = Hello, <img l10n-name="world" alt="world">!

Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.

The above example would produce the following HTML after translation.

<p l10n-id="key1">
    Hello, <img src="world.png" alt="world">!
</p>

This example allows the translator to add the alt attribute unconstrained, but it will likely be important to design syntax such that the translator can only add or modify a localized attribute if it is explicitly requested by the developer. This would ensure that the translator cannot override an attribute that is not meant to be localized, such as src or href, unless the translator has explicit permission to do so.


structured elements

While text-level elements are allowed at all times, and functional elements have attributes that may be provided or modified by a translator, a structured element imposes a structure that should be part of the localization.

For example, the developer may want to have ul or ol elements be part of the localization, in which the translator can then add li elements. It should also be possible for the developer to nest elements that each have their own unique localization identifiers.

<div l10n-id="key1">
  <ul>
  </ul>
</div>

A translator should then be able to provide a localized message along with li elements.

key1 = This is a localized list:
  <ul>
    <li>Localized item 1</li>
    <li>Localized item 2</li>
    <li>Localized item 3</li>
  </ul>

Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.

The above example would produce the following HTML after translation.

<div l10n-id="key1">
    This is a localized list:
    <ul>
        <li>Localized item 1</li>
        <li>Localized item 2</li>
        <li>Localized item 3</li>
    </ul>
</div>

11.1. Security Considerations

11.1.1. Specify Localizable Attributes Explicitly

With regard to functional elements, we may want to adopt an opt-in-only model in which a developer must explicitly list which attributes should be localizable.

<p l10n-id="key1" l10n-attrs="alt">
    Hi, <img src="world.png">
</p>

In this model, alt is the only attribute that localizers will be able to modify in translation.
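The opt-in model can be sketched as a simple allowlist filter. This is an illustration under assumptions: filterLocalizableAttributes is a hypothetical name, and treating the l10n-attrs value as a comma-separated list of attribute names is an assumption, since the example above only shows a single name.

```javascript
// Keep only the translated attributes that the developer opted in to.
// `allowedList` mirrors the l10n-attrs value; parsing it as a
// comma-separated list is an assumption.
function filterLocalizableAttributes(translatedAttrs, allowedList) {
  const allowed = new Set(
    allowedList
      .split(",")
      .map((name) => name.trim().toLowerCase())
      .filter(Boolean)
  );
  const result = {};
  for (const [name, value] of Object.entries(translatedAttrs)) {
    if (allowed.has(name.toLowerCase())) result[name] = value;
  }
  return result;
}

// A translation that tries to override src is reduced to the permitted alt.
const safeAttrs = filterLocalizableAttributes(
  { alt: "world", src: "https://example.invalid/other.png" },
  "alt"
);
console.log(safeAttrs);
```

Only alt survives the filter, so a translation cannot redirect the image source.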

11.1.2. Disabling Translation

Developers should have a means to enforce that elements should not be translated.

This should be expressed with the translate attribute specified by ITS.

<div translate="no">
  Always display this sentence.
</div>

11.1.3. Retain Element Identity

<div l10n-id="submit_form">
  <button id="submit">Submit</button>
</div>

submit_form = <button>Submit</button>

Note: This represents a localization resource file using MessageFormat 2.0 syntax that is subject to change.

Even after multiple retranslations, the identity of the <button> element should be the same. The element should not be recreated by the localization layer.

11.1.4. Vestigial Content After Retranslation

DOM Overlays should be designed in a way that it is safe to retranslate content without leaving behind a vestige of what was there before. Given that some languages may overlay different HTML elements, we need to ensure that when a language changes and content is retranslated, we are able to keep track of and remove all content that existed due to the previous translation.
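One way to avoid vestigial attributes is for the localization layer to remember what it applied. The following sketch is only an illustration of that bookkeeping idea, limited to attributes: retranslateAttributes is a hypothetical name, and a plain object stands in for a DOM element so that the sketch runs outside a browser. Attributes authored by the developer, such as id, are never touched because they were not recorded as applied by a translation.

```javascript
// Remember which attributes the localization layer set on each element so a
// later translation can remove the ones it no longer provides.
const appliedByElement = new WeakMap();

function retranslateAttributes(element, newAttrs) {
  const previous = appliedByElement.get(element) || new Set();
  const next = new Set(Object.keys(newAttrs));
  // Remove attributes left over from the previous translation.
  for (const name of previous) {
    if (!next.has(name)) element.removeAttribute(name);
  }
  // Apply the attributes of the new translation and record them.
  for (const name of next) {
    element.setAttribute(name, newAttrs[name]);
  }
  appliedByElement.set(element, next);
}

// Stand-in for an element that already carries a developer-authored id.
const submitButton = {
  attributes: { id: "submit" },
  setAttribute(name, value) {
    this.attributes[name] = value;
  },
  removeAttribute(name) {
    delete this.attributes[name];
  },
};

// First translation sets two attributes; the second provides only one.
retranslateAttributes(submitButton, { title: "Senden", "aria-label": "Formular senden" });
retranslateAttributes(submitButton, { title: "Submit" });
console.log(submitButton.attributes);
```

After the second call, aria-label from the first translation is gone, title is updated, and id is unchanged. The same object is mutated throughout, which is also in line with the requirement to retain element identity.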

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[CSSOM-1]
Daniel Glazman; Emilio Cobos Álvarez. CSS Object Model (CSSOM). 26 August 2021. WD. URL: https://www.w3.org/TR/cssom-1/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119