Implementation experience in the ARIA-AT project has uncovered a large amount of variation in the formatting of the text that screen readers send to text-to-speech engines.
For instance, we've observed text from JAWS such as:
Print Page \u001d Button \u001e
(Where \u001d and \u001e represent the Unicode "group separator" and "record separator", respectively)
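Because these separators are invisible in most terminals and logs, a small debugging helper can make them explicit. This is a hypothetical sketch for illustration only (the function name and escaping scheme are our own, not anything from the specification or the ARIA-AT codebase):

```javascript
// Hypothetical helper: make invisible C0 control characters (such as the
// JAWS group/record separators) visible by escaping them as \uXXXX.
function revealControlCharacters(text) {
  return text.replace(/[\u0000-\u001f]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}
```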
And text from VoiceOver like:
Print Page
button
You are currently on a button. To click this button, press
Control -Option -Space.
(Note the copious amount of empty space, including a trailing space on the third line)
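A sketch of one way a consumer might collapse that whitespace. The splitting rule is an assumption on our part, not behavior defined by VoiceOver or the specification:

```javascript
// Hypothetical sketch: collapse runs of whitespace (including newlines and
// trailing spaces, as in the VoiceOver sample above) into single spaces.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}
```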
To be sure, these examples are not only accurate but also compliant. The specification places no constraints on the way the text is formatted. The relevant language reads:
When the assistive technology would send some text data (a string, without speech-specific markup or annotations) to the Text-To-Speech system, or equivalent for non-speech assistive technology software, run these steps:
However, those examples are not the most intuitive way to express the spoken text. The formatting is important to ARIA-AT, so we've written logic to normalize the text at the application level. Since I expect formatting will also be important to many future consumers of the protocol, this seems like an opportunity for the standard to reduce repeated work.
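To make the kind of normalization under discussion concrete, here is a minimal sketch. It is an illustration of the general approach, not ARIA-AT's actual implementation, and the specific rules (treating control characters as separators, collapsing whitespace) are assumptions:

```javascript
// Minimal sketch of application-level normalization of speech data.
// NOT ARIA-AT's actual code; the rules below are illustrative assumptions.
function normalizeSpeech(text) {
  return text
    // Treat ASCII control characters (e.g. the JAWS group/record
    // separators) as ordinary separators.
    .replace(/[\u0000-\u001f]/g, " ")
    // Collapse runs of whitespace, including newlines, into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}
```

Applied to the two samples above, both the JAWS control characters and the VoiceOver whitespace reduce to plainly readable strings.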
A number of concerns come to mind:
- removing details which have no impact on the vocalized text (e.g. extraneous space, new lines, some punctuation, some capitalization)
- using a data type other than a simple string (e.g. an array of strings, each describing a discrete utterance)
- expressing this in a localizable way (at first blush, Unicode's offerings seem promising)
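To illustrate the second concern, here is one hypothetical shape for an "array of discrete utterances" representation. The splitting rule (breaking on control characters, which include both the JAWS separators and VoiceOver's newlines) is invented for illustration and is not proposed normative behavior:

```javascript
// Hypothetical sketch of the "array of strings" data type: split raw speech
// data into discrete utterances on runs of C0 control characters, which
// cover both the JAWS separators and newline characters.
function toUtterances(text) {
  return text
    .split(/[\u0000-\u001f]+/)
    .map((piece) => piece.trim())
    .filter((piece) => piece.length > 0);
}
```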
Should we expect implementations to eventually improve and "do the right thing" in these regards? Or should we constrain speech data in some way? If so, how?