Multiple acceptors and error models

Both old ideas and new developments suggest a more flexible approach to acceptors and error models. Below is a list of things discussed in the past, plus new ideas inspired by the ongoing machine learning work by @gusmakali on word completion and prediction. Some of the tasks mentioned in #19 are also relevant here.

# Multiple error models

- [ ] (neural model)
- [ ] hand-tuned error model
- [ ] #29
- [x] default/fall-back model (the present one)

The idea is that all of the above could be present in one and the same speller archive, together with a configuration specifying when to apply which model. A very tentative idea: a machine learning error model will either get it right with the top hypothesis or fail completely (as determined by filtering the hypothesis against the lexicon), so use that one as the first step. If it fails, fall back to a hand-tuned error model, which could be written to be on the safe side, i.e. not suggest anything outside a certain set of errors. If that fails too, fall back to the default error model.

Exactly how this should work and interact is very much an open question, but divvunspell should provide the machinery so that linguists can experiment with it to reach an optimal setup for a given language and device type.
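The fallback order described above can be sketched as follows. This is only an illustration of the cascade, not divvunspell code: `cascade_suggest`, `top_only`, the `ErrorModel` type and the three toy models are all hypothetical names, and a Python set stands in for the lexicon acceptor.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical type: an error model maps a misspelled word to
# (candidate, weight) pairs, where a lower weight means a better candidate.
Suggestions = List[Tuple[str, float]]
ErrorModel = Callable[[str], Suggestions]


def top_only(model: ErrorModel) -> ErrorModel:
    """Wrap a model so that only its best hypothesis is considered."""
    return lambda word: sorted(model(word), key=lambda pair: pair[1])[:1]


def cascade_suggest(
    word: str,
    models: List[Tuple[str, ErrorModel]],
    accepts: Callable[[str], bool],
) -> Tuple[Optional[str], Suggestions]:
    """Try each model in order and return the first non-empty result
    that survives filtering against the lexicon."""
    for name, model in models:
        valid = [(cand, w) for cand, w in model(word) if accepts(cand)]
        if valid:
            return name, sorted(valid, key=lambda pair: pair[1])
    return None, []


# Toy stand-ins for the real models (assumptions for illustration only).
LEXICON = {"house", "horse", "mouse"}


def neural(word: str) -> Suggestions:
    # Either knows the answer or produces a non-word.
    return {"huose": [("house", 0.1)]}.get(word, [("xxxx", 0.5)])


def hand_tuned(word: str) -> Suggestions:
    # Only covers a fixed, safe set of errors.
    return {"hourse": [("horse", 1.0)]}.get(word, [])


def default(word: str) -> Suggestions:
    # Same-length lexicon words, weighted by number of differing letters.
    return [
        (w, float(sum(a != b for a, b in zip(w, word))))
        for w in sorted(LEXICON)
        if len(w) == len(word)
    ]


MODELS = [
    ("neural", top_only(neural)),
    ("hand_tuned", hand_tuned),
    ("default", default),
]
```

With these toy models, `cascade_suggest("huose", MODELS, LEXICON.__contains__)` is answered by the neural step, `"hourse"` falls through to the hand-tuned model, and `"mouze"` reaches the default model. In a real setup the order and the per-model conditions would come from the configuration in the speller archive.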

# Multiple acceptors

- [x] default acceptor (the present one)
- [ ] suggestion acceptor
- [ ] #29
- [ ] rejector

And possibly other variants too.

There are at least two ideas here:

- we might want to be more careful with what we suggest, and an easy way to do that is verifying suggestions against a more restricted acceptor, e.g. with no dynamic compounding or derivation (such words would still be accepted, just never suggested). Another way of restricting suggestions is to never suggest anything with a weight higher than a limit X, where X is configurable (this has been discussed several times in the past):
    - [ ] never suggest if weight higher than configurable weight X
- in productive word formation it is easy to overgenerate, e.g. for compounds, but subtracting illegal paths from an FST is hugely inefficient and space-consuming. A much better option is a rejector FST that contains the invalid strings: anything in that FST should always be rejected, except when the user has explicitly added it to a user dictionary.
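A minimal sketch of how the two ideas above could interact, assuming plain predicates in place of real FSTs. The class `SpellerSketch`, its field names and the toy word lists are hypothetical. Letting the user dictionary also override the suggestion filter is an assumption made here, not something the list above specifies.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple

Acceptor = Callable[[str], bool]


@dataclass
class SpellerSketch:
    default_acceptor: Acceptor      # full lexicon, incl. dynamic compounds
    suggestion_acceptor: Acceptor   # restricted lexicon used for suggestions
    rejector: Acceptor              # True for strings that must be rejected
    user_dictionary: Set[str] = field(default_factory=set)
    max_weight: Optional[float] = None  # the configurable limit X

    def is_correct(self, word: str) -> bool:
        # The user dictionary overrides the rejector.
        if word in self.user_dictionary:
            return True
        if self.rejector(word):
            return False
        return self.default_acceptor(word)

    def filter_suggestions(
        self, candidates: List[Tuple[str, float]]
    ) -> List[Tuple[str, float]]:
        kept = []
        for cand, weight in candidates:
            # Never suggest anything heavier than the limit X.
            if self.max_weight is not None and weight > self.max_weight:
                continue
            allowed = cand in self.user_dictionary or (
                not self.rejector(cand) and self.suggestion_acceptor(cand)
            )
            if allowed:
                kept.append((cand, weight))
        return sorted(kept, key=lambda pair: pair[1])


# Toy data (assumptions for illustration only).
BASE = {"sea", "horse", "man"}
SUGGESTABLE = BASE | {"seahorse"}   # no dynamic compounding here
REJECTED = {"horsesea"}


def default_accepts(word: str) -> bool:
    # Accept base words and any two-part compound of base words.
    if word in BASE:
        return True
    return any(
        word[:i] in BASE and word[i:] in BASE for i in range(1, len(word))
    )


speller = SpellerSketch(
    default_acceptor=default_accepts,
    suggestion_acceptor=SUGGESTABLE.__contains__,
    rejector=REJECTED.__contains__,
    max_weight=10.0,
)
CANDIDATES = [
    ("manhorse", 2.0), ("seahorse", 3.0), ("horse", 12.0), ("horsesea", 1.0)
]
```

Here `manhorse` is accepted as a dynamic compound but never suggested, `horse` is dropped from the suggestions because its weight exceeds the limit, and `horsesea` is rejected until the user adds it to `speller.user_dictionary`.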

As part of this work it is probably necessary to rework the zhfst archive format, most likely by making the bhfst format the standard, including the JSON config file used there.


Multiple acceptors and error models #25
