Update tr35-general.md
macchiati authored Nov 1, 2024
1 parent f1aa656 commit 8c4591e
Showing 1 changed file with 150 additions and 149 deletions.
299 changes: 150 additions & 149 deletions docs/ldml/tr35-general.md
@@ -902,157 +902,158 @@ As with other identifiers in CLDR, the American English spelling is used for uni

> In keeping with U.S. and International practice (see Sec. C.2), this Guide uses the dot on the line as the decimal marker. In addition this Guide utilizes the American spellings “meter,” “liter,” and “deka” rather than “metre,” “litre,” and “deca,” and the name “metric ton” rather than “tonne.”
#### Syntax
<a name="syntax"></a>
#### Unit Syntax

The formal syntax for identifiers is provided below.
The formal syntax for identifiers is provided below, in [EBNF](tr35.md#ebnf).
Some of the constraints reference data from the unitIdComponents in [Unit_Conversion](tr35-info.md#Unit_Conversion).

<!-- HTML: no header -->

<table><tbody>
<tr><td><a name='unit_identifier' href='#unit_identifier'>unit_identifier</a></td><td>:=</td>
<td>core_unit_identifier<br/>
| mixed_unit_identifier<br/>
| long_unit_identifier</td></tr>

<tr><td><a name='core_unit_identifier' href='#core_unit_identifier'>core_unit_identifier</a></td><td>:=</td>
<td>product_unit ("-" per "-" product_unit)*<br/>
| per "-" product_unit ("-" per "-" product_unit)*
<ul><li><em>Examples:</em>
<ul><li>foot-per-second-per-second</li>
<li>per-second</li>
</ul></li>
<li><em>Note:</em> The normalized form will have only one "per"</li>
</ul></td></tr>

<tr><td>per</td><td>:=</td>
<td>"per"
<ul>
<li><em>Constraint:</em> The token 'per' is the single value in &lt;unitIdComponent type="per"&gt;</li>
</ul></td></tr>

<tr><td><a name='product_unit' href='#product_unit'>product_unit</a></td><td>:=</td>
<td>single_unit ("-" single_unit)* ("-" pu_single_unit)*<br/>
| pu_single_unit ("-" pu_single_unit)*
<ul><li><em>Example:</em> foot-pound-force</li>
<li><em>Constraint:</em> No pu_single_unit may precede a single unit</li>
</ul></td></tr>

<tr><td><a name='single_unit' href='#single_unit'>single_unit</a></td><td>:=</td>
<td>dimensionality_prefix? simple_unit | unit_constant
<ul><li><em>Examples: </em>square-kilometer, or 100</li></ul></td></tr>

<tr><td><a name='pu_single_unit' href='#pu_single_unit'>pu_single_unit</a></td><td>:=</td>
<td>"xxx-" single_unit | "x-" single_unit
<ul><li><em>Example:</em> xxx-square-knuts (a Harry Potter unit)</li>
<li><em>Note:</em> "x-" is only for backwards compatibility</li>
<li>See <a href="#Private_Use_Units">Private-Use Units</a></li>
</ul></td></tr>

<tr><td><a name='unit_constant' href='#unit_constant'>unit_constant</a></td><td>:=</td>
<td>[1-9][0-9]* ("e" [1-9][0-9]*)?
<ul><li><em>Examples:</em>
<ul><li>kilowatt-hour-per-100-kilometer</li>
<li>gallon-per-100-mile</li>
<li>per-200-pound</li>
<li>per-12</li>
</ul></li>
<li><em>Constraint:</em> The numeric value of the unit constant must be an integer greater than one.</li>
<li><em>Note:</em> The normal interpretation of <code>e</code> is used, where 2e6 = 2×10⁶.</li>
<li><em>Note:</em> The <code>e</code> notation is optional: per-100-kilometer and per-1e2-kilometer are equivalent unit_identifiers.</li>
<li><em>Note:</em> When constructing identifiers, exponents should be greater than 3 and multiples of 3, even though parsers must accept the wider range.</li>
</ul></td></tr>

<tr><td><a name='dimensionality_prefix' href='#dimensionality_prefix'>dimensionality_prefix</a></td><td>:=</td>
<td>"square-"<p>| "cubic-"<p>| "pow" ([2-9]|1[0-5]) "-"
<ul>
<li><em>Constraint:</em> must be value in: &lt;unitIdComponent type="power"&gt;.</li>
<li><em>Note:</em> "pow2-" and "pow3-" canonicalize to "square-" and "cubic-"</li>
<li><em>Note:</em> These are values in &lt;unitIdComponent type="power"&gt;</li>
</ul></td></tr>

<tr><td><a name='simple_unit' href='#simple_unit'>simple_unit</a></td><td>:=</td>
<td>(prefix_component "-")* (prefixed_unit | base_component) ("-" suffix_component)*<br/>
| currency_unit<br/>
| "em" | "g" | "us" | "hg" | "of"
<ul>
<li><em>Examples:</em> kilometer, meter, cup-metric, fluid-ounce, curr-chf, em</li>
<li><em>Note:</em> Three simple units are currently allowed as legacy usage, for tokens that wouldn’t otherwise be a base_component due to length (eg, "<strong>g</strong>-force").
We will likely deprecate those and add conformant aliases in the future: the "hg" and "of" are already only in deprecated simple_units.</li>
</ul></td></tr>

<tr><td><a name='prefixed_unit' href='#prefixed_unit'>prefixed_unit</a></td><td></td>
<td>prefix base_component<ul><li><em>Example: </em>kilometer</li></ul></td></tr>

<tr><td><a name='prefix' href='#prefix'>prefix</a></td><td></td>
<td>si_prefix | binary_prefix</td></tr>

<tr><td><a name='si_prefix' href='#si_prefix'>si_prefix</a></td><td>:=</td>
<td>"deka" | "hecto" | "kilo", …
<ul><li><em>Constraint:</em> Must be an attribute value of the <code>type</code> in: &lt;unitPrefix type='…' … power10='…'&gt;.
See also <a href="https://www.nist.gov/pml/special-publication-811">NIST special publication 811</a></li></ul></td></tr>

<tr><td><a name='binary_prefix' href='#binary_prefix'>binary_prefix</a></td><td>:=</td>
<td>"kibi", "mebi", …
<ul><li><em>Constraint:</em> Must be an attribute value of the <code>type</code> in: &lt;unitPrefix type='…' … power2='…'&gt;.
See also <a href="https://physics.nist.gov/cuu/Units/binary.html">Prefixes for binary multiples</a></li></ul></td></tr>

<tr><td><a name='prefix_component' href='#prefix_component'>prefix_component</a></td><td>:=</td>
<td>[a-z]{3,∞}
<ul><li><em>Constraint:</em> must be value in: &lt;unitIdComponent type="prefix"&gt;.</li></ul></td></tr>

<tr><td><a name='base_component' href='#base_component'>base_component</a></td><td>:=</td>
<td>[a-z]{3,∞}
<ul><li><em>Constraint:</em> must not be a value in any of the following:<br>
&lt;unitIdComponent type="prefix"&gt;<br>
or &lt;unitIdComponent type="suffix"&gt; <br>
or &lt;unitIdComponent type="power"&gt;<br>
or &lt;unitIdComponent type="and"&gt;<br>
or &lt;unitIdComponent type="per"&gt;.
</li>
<li><em>Constraint:</em> must not have a prefix as an initial segment.</li>
<li><em>Constraint:</em> no two different base_components will share the first 8 letters.
(<b>For more information, see <a href="#Unit_Identifier_Uniqueness">Unit Identifier Uniqueness</a>.)</b>
</li>
</ul>
</td></tr>

<tr><td><a name='suffix_component' href='#suffix_component'>suffix_component</a></td><td>:=</td>
<td>[a-z]{3,∞}
<ul>
<li><em>Constraint:</em> must be value in: &lt;unitIdComponent type="suffix"&gt;</li>
</ul></td></tr>

<tr><td><a name='mixed_unit_identifier' href='#mixed_unit_identifier'></a></td><td>:=</td>
<td>(single_unit | pu_single_unit) ("-" and "-" (single_unit | pu_single_unit ))*
<ul><li><em>Example: foot-and-inch</em></li>
</ul></td></tr>

<tr><td>and</td><td>:=</td>
<td>"and"
<ul>
<li><em>Constraint:</em> The token 'and' is the single value in &lt;unitIdComponent type="and"&gt;</li>
</ul></td></tr>

<tr><td><a name='long_unit_identifier' href='#long_unit_identifier'>long_unit_identifier</a></td><td>:=</td>
<td>grouping "-" core_unit_identifier</td></tr>

<tr><td>grouping</td><td>:=</td>
<td>[a-z]{3,∞}</td></tr>

<tr><td><a name='currency_unit' href='#currency_unit'>currency_unit</a></td><td>:=</td>
<td>"curr-" [a-z]{3}
<ul>
<li><em>Constraint:</em> The first part of the currency_unit is a standard prefix; the second part of the currency unit must be a valid <a href="tr35.md#UnicodeCurrencyIdentifier">Unicode currency identifier</a>.</li>
</ul>
<ul>
<li><em>Examples:</em> <b>curr-eur</b>-per-square-meter, or pound-per-<b>curr-usd</b></li>
<li><em>Note:</em> CLDR does not provide conversions for currencies; this is only intended for formatting.
The locale data for currencies is supplied in the <code>currencies</code> element, not in the <code>units</code> element.</li>
</ul>
</td></tr>

</tbody></table>
<a name='unit_identifier' href='#unit_identifier'>unit_identifier</a>
<br/>:= core_unit_identifier
<br/>   | mixed_unit_identifier
<br/>   | long_unit_identifier

<a name='core_unit_identifier' href='#core_unit_identifier'>core_unit_identifier</a>
<br/>:= product_unit ("-" per "-" product_unit)\*
<br/>   | per "-" product_unit ("-" per "-" product_unit)\*
* *Examples:*
    * foot-per-second-per-second
    * per-second
* *Notes:*
    * The normalized form will have only one "per"
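
As an illustration of the `core_unit_identifier` rule above, here is a minimal Python sketch that splits an identifier at each "per" token. The function name `split_per` is hypothetical and not part of CLDR. It checks only where "per" appears; it does not validate the product units themselves and does not produce the normalized form.

```python
def split_per(identifier: str) -> tuple[str, list[str]]:
    """Split a core unit identifier into its numerator and denominators.

    Returns the product unit before the first "per" (an empty string if the
    identifier starts with "per") and the product units after each "per".
    """
    segments: list[list[str]] = [[]]
    for token in identifier.split("-"):
        if token == "per":
            segments.append([])  # each "per" starts a new denominator
        else:
            segments[-1].append(token)

    numerator = "-".join(segments[0])
    denominators = ["-".join(segment) for segment in segments[1:]]

    # Every "per" must be followed by a product unit, and the identifier
    # must contain at least one product unit overall.
    if "" in denominators or (numerator == "" and not denominators):
        raise ValueError(f"malformed use of 'per' in {identifier!r}")
    return numerator, denominators
```

With the examples above, `split_per("foot-per-second-per-second")` gives `("foot", ["second", "second"])`, and `split_per("per-second")` gives an empty numerator with one denominator.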

per
<br/>:= "per"
* [ wfc: The token 'per' is the single value in \<unitIdComponent type="per"\> ]

<a name='product_unit' href='#product_unit'>product_unit</a>
<br/>:= single_unit ("-" single_unit)\* ("-" pu_single_unit)\*
<br/>   | pu_single_unit ("-" pu_single_unit)\*
* [ wfc: No pu\_single\_unit may precede a single\_unit ]
* *Examples:*
    * foot-pound-force

<a name='single_unit' href='#single_unit'>single_unit</a>
<br/>:= dimensionality_prefix? simple_unit
<br/>   | unit_constant
* *Examples:*
    * square-kilometer
    * 100

<a name='pu_single_unit' href='#pu_single_unit'>pu_single_unit</a>
<br/>:= "xxx-" single_unit
<br/>   | "x-" single_unit
* *Examples:*
    * xxx-square-knuts (a Harry Potter unit)
* *Notes:*
    * "x-" is only for backwards compatibility; it is deprecated and should not be generated
    * See [Private-Use Units](#Private_Use_Units)

<a name='unit_constant' href='#unit_constant'>unit_constant</a>
<br/>:= [1-9][0-9]\* ("e" [1-9][0-9]\*)?
* *Examples:*
    * kilowatt-hour-per-100-kilometer
    * gallon-per-100-mile
    * per-200-pound
    * per-12
* [ wfc: The numeric value of the unit constant must be an integer greater than one. ]
* *Notes:*
    * The normal interpretation of `e` is used, where 2e6 \= 2×10⁶.
    * The `e` notation is optional: per-100-kilometer and per-1e2-kilometer are equivalent unit\_identifiers.
    * When constructing identifiers, exponents should be greater than 3 and multiples of 3, even though parsers must accept the wider range.

<a name='dimensionality_prefix' href='#dimensionality_prefix'>dimensionality_prefix</a>
<br/>:= "square-"
<br/>   | "cubic-"
<br/>   | "pow" ([2-9]|1[0-5]) "-"
* [ wfc: Must be a value in: \<unitIdComponent type="power"\> ]
* *Notes:*
    * "pow2-" and "pow3-" canonicalize to "square-" and "cubic-"
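
The canonicalization described in the note above might look like this in Python. The function name `canonical_dimensionality_prefix` is hypothetical, and the sketch checks only the syntax shown in the rule, not the `unitIdComponent` data.

```python
import re

# Mirrors the production: "pow" ([2-9]|1[0-5]) "-"
_POW_PREFIX = re.compile(r"pow([2-9]|1[0-5])-")

_CANONICAL = {"2": "square-", "3": "cubic-"}


def canonical_dimensionality_prefix(prefix: str) -> str:
    """Return the canonical form of a dimensionality prefix such as "pow2-"."""
    if prefix in ("square-", "cubic-"):
        return prefix
    match = _POW_PREFIX.fullmatch(prefix)
    if match is None:
        raise ValueError(f"not a dimensionality prefix: {prefix!r}")
    # Powers 2 and 3 have named forms; higher powers keep the "powN-" form.
    return _CANONICAL.get(match.group(1), prefix)
```
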

<a name='simple_unit' href='#simple_unit'>simple_unit</a>
<br/>:= (prefix_component "-")\* (prefixed_unit | base_component) ("-" suffix_component)\*
<br/>   | currency_unit
<br/>   | ("em" | "g" | "us" | "hg" | "of")
* *Examples:*
    * kilometer
    * meter
    * cup-metric
    * fluid-ounce
    * curr-chf
    * em
* *Notes:*
    * Five simple units are currently allowed as legacy usage, for tokens that wouldn’t otherwise be a base\_component due to length (eg, "g-force"). Those are likely to be deprecated in the future, with conformant aliases added: "hg" and "of" already occur only in deprecated simple\_units.

<a name='prefixed_unit' href='#prefixed_unit'>prefixed_unit</a>
<br/>:= prefix base_component
* *Examples:*
    * kilometer

<a name='prefix' href='#prefix'>prefix</a>
<br/>:= si_prefix
<br/>   | binary_prefix

<a name='si_prefix' href='#si_prefix'>si_prefix</a>
<br/>:= "deka"
<br/>   | "hecto"
<br/>   | "kilo", …
* [ wfc: Must be an attribute value of the `type` in: \<unitPrefix type='…' … power10='…'\> ]
* *Notes:*
    * See also [NIST special publication 811](https://www.nist.gov/pml/special-publication-811)

<a name='binary_prefix' href='#binary_prefix'>binary_prefix</a>
<br/>:= "kibi", "mebi", …
* [ wfc: Must be an attribute value of the `type` in: \<unitPrefix type='…' … power2='…'\> ]
* *Notes:*
    * See also [Prefixes for binary multiples](https://physics.nist.gov/cuu/Units/binary.html)

<a name='prefix_component' href='#prefix_component'>prefix_component</a>
<br/>:= [a-z]{3,∞}
* [ vc: Must be a value in: \<unitIdComponent type="prefix"\> ]
* *Notes:*
    * The set of prefix components often expands in new releases, so the requirement to be one of these attribute values is a validity constraint, not a well-formedness constraint.

<a name='base_component' href='#base_component'>base_component</a>
<br/>:= [a-z]{3,∞}
* [ wfc: Must not have a prefix as an initial segment. ]
* [ wfc: Must not be a value in \<unitIdComponent type="X"\> for X in \{prefix, suffix, power, and, per} ]
* [ vc: Must be an attribute value of the `source` in: \<convertUnit source='…' …\> or the `type` in \<unitAlias type="…" replacement="…" …\> ]
* *Notes:*
    * The set of base components typically expands in new releases, so the requirement to be one of these attribute values is a validity constraint, not a well-formedness constraint.
    * The base components in unitAlias `type` are deprecated and should be converted to their replacement values.
    * No two different base\_components will share the first 8 letters; see [Unit Identifier Uniqueness](#Unit_Identifier_Uniqueness).
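
The two well-formedness constraints on `base_component` can be sketched in Python as shown below. The function name and both data tables are assumptions made for illustration: they hold only tokens that appear in the examples of this section, whereas a real implementation would load the `unitIdComponent` and `unitPrefix` values from the CLDR data. The validity constraint (membership in `convertUnit` or `unitAlias`) is not checked.

```python
import re

# Illustrative sample only, NOT the CLDR data: a few unitIdComponent values
# taken from examples in this section.
SAMPLE_ID_COMPONENTS = {
    "prefix": {"fluid"},
    "suffix": {"metric", "force"},
    "power": {"square", "cubic"},
    "and": {"and"},
    "per": {"per"},
}

# Illustrative sample of unitPrefix types named in the grammar above.
SAMPLE_UNIT_PREFIXES = ("deka", "hecto", "kilo", "kibi", "mebi")


def is_well_formed_base_component(token: str) -> bool:
    """Check the syntax and the two wfc constraints for a base_component."""
    if re.fullmatch(r"[a-z]{3,}", token) is None:
        return False
    # wfc: must not be a reserved unitIdComponent value.
    if any(token in values for values in SAMPLE_ID_COMPONENTS.values()):
        return False
    # wfc: must not have a prefix as an initial segment.
    return not any(token.startswith(prefix) for prefix in SAMPLE_UNIT_PREFIXES)
```

With this sample data, "meter" and "ounce" pass, while "kilometer" (starts with a prefix), "force" (a suffix component), and "em" (too short) do not.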

<a name='suffix_component' href='#suffix_component'>suffix_component</a>
<br/>:= [a-z]{3,∞}
* [ vc: Must be a value in: \<unitIdComponent type="suffix"\> ]
* *Notes:*
    * The set of suffix components often expands in new releases, so the requirement to be one of these attribute values is a validity constraint, not a well-formedness constraint.

<a name='mixed_unit_identifier' href='#mixed_unit_identifier'>mixed_unit_identifier</a>
<br/>:= (single_unit | pu_single_unit) ("-" and "-" (single_unit | pu_single_unit))\*
* *Examples:*
    * foot-and-inch

and
<br/>:= "and"
* [ wfc: The token 'and' is the single value in \<unitIdComponent type="and"\> ]

<a name='long_unit_identifier' href='#long_unit_identifier'>long_unit_identifier</a>
<br/>:= grouping "-" core_unit_identifier

grouping
<br/>:= [a-z]{3,∞}

<a name='currency_unit' href='#currency_unit'>currency_unit</a>
<br/>:= "curr-" [a-z]{3}
* [ wfc: The first part of the currency\_unit is a standard prefix; the second part of the currency\_unit must be a valid [Unicode currency identifier](tr35.md#UnicodeCurrencyIdentifier) ]
* *Examples:*
    * curr-eur-per-square-meter
    * pound-per-curr-usd
* *Notes:*
    * CLDR does not provide conversions for currencies; this is only intended for formatting.
    * The locale data for currency display names is supplied in the `currencies` element, not in the `units` element.

Note that while the syntax allows for unit_constants in multiple places, the typical use case is only one instance, after a "-per-".
The normalized form of a unit identifier has at most one unit_constant in the numerator and one in the denominator.
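
The limit on unit constants in the normalized form can be checked with a small Python sketch. The function name `count_unit_constants` is hypothetical, and the sketch assumes a `core_unit_identifier` as input: it counts constants before and after the first "per" without validating anything else.

```python
import re

# Mirrors the unit_constant production: [1-9][0-9]* ("e" [1-9][0-9]*)?
_UNIT_CONSTANT = re.compile(r"[1-9][0-9]*(?:e[1-9][0-9]*)?")


def count_unit_constants(identifier: str) -> tuple[int, int]:
    """Count unit constants in the numerator and in the denominator."""
    numerator = denominator = 0
    seen_per = False
    for token in identifier.split("-"):
        if token == "per":
            seen_per = True  # everything after the first "per" is denominator
        elif _UNIT_CONSTANT.fullmatch(token):
            if seen_per:
                denominator += 1
            else:
                numerator += 1
    return numerator, denominator
```

An identifier whose counts exceed `(1, 1)` cannot be in normalized form. The typical case described above, a single constant after "-per-", yields `(0, 1)`.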
@@ -3143,4 +3144,4 @@ The authors, contributors, and publishers have taken care in the preparation of
but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom.
This publication is provided “AS-IS” without charge as a convenience to users.

Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.
Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.
