epic: dictionary-based word-breakers 🔬 #12142

Draft: mcdurdin wants to merge 40 commits into base: master


mcdurdin
Member

@mcdurdin mcdurdin commented Aug 9, 2024

No description provided.

jahorton and others added 21 commits August 9, 2024 09:40
Only wordbreaks anything AFTER the last space / ZWNJ.  Doesn't bother with anything before it.
…ordbreakers/dict-breaker-start' into change/common/models/wordbreakers/unit-test-trie-access
…/models/wordbreakers/fuse-dict-unmatched-chars
…ordbreakers/dict-breaker-start' into change/common/models/wordbreakers/unit-test-trie-access
…common/models/wordbreakers/fuse-dict-unmatched-chars
@keymanapp-test-bot (bot) added the user-test-missing label (User tests have not yet been defined for the PR) Aug 9, 2024
@keymanapp-test-bot

keymanapp-test-bot bot commented Aug 9, 2024

User Test Results

Test specification and instructions

ERROR: user tests have not yet been defined

Test Artifacts

@jahorton
Contributor

I got to wondering if there are any "relatively simple" ways to avoid spinning up a WebView to run the model-compiler, should we decide to keep the user-dictionary compilation completely separate from the keyboard.

After a bit of searching, I found this: https://github.com/nodejs-mobile, a library for running Node-oriented JS scripts on mobile devices. That said, it would be a new dependency.

@jahorton
Contributor

Other notable thoughts:

We should probably not associate a language code with user-dictionary data. That is, we collate the data once and use that with any language supporting predictive-text.

My original strategy (as of #11994) was to blend the models into a single, "traversable" model.

  • This would require the standard lexical model for each language to implement the LexiconTraversal interface, though, and that is not strictly required of all custom models.
    • We'd need an alternate strategy to support scenarios where a language-specific custom model lacks this feature.
  • Thinking ahead, we'd want a similar strategy to be in place once we start doing 'learning', which would adjust a model's probability data to better suit the user's actual typing patterns.
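
To make the blending idea above concrete, here is a minimal sketch of two traversable lexicons combined behind one traversable view. Everything in it is an assumption for illustration: `Traversal`, `traversalFrom`, `blend`, and `lookup` are hypothetical names, the interface is a simplified stand-in rather than Keyman's actual LexiconTraversal, and the fixed 0.75 / 0.25 weighting is arbitrary. It also assumes both sides are traversable, which is exactly the case the first bullet says cannot be taken for granted.

```typescript
// Hypothetical, simplified stand-in for a traversable lexicon.
// This is NOT Keyman's actual LexiconTraversal interface.
type WordEntry = { text: string; p: number };

interface Traversal {
  // Words that end exactly at this node, with their probabilities.
  entries: WordEntry[];
  // Characters (UTF-16 code units here, for brevity) that extend the prefix.
  childKeys(): string[];
  // Steps to the child for `char`, or undefined if nothing has that prefix.
  child(char: string): Traversal | undefined;
}

// Builds a traversal over a flat word list (illustrative, not efficient).
function traversalFrom(words: WordEntry[], prefix: string = ""): Traversal {
  const under = words.filter(w => w.text.slice(0, prefix.length) === prefix);
  return {
    entries: under.filter(w => w.text === prefix),
    childKeys() {
      const keys: string[] = [];
      under.forEach(w => {
        if (w.text.length > prefix.length) {
          const c = w.text.charAt(prefix.length);
          if (keys.indexOf(c) < 0) keys.push(c);
        }
      });
      return keys;
    },
    child(char: string) {
      const next = prefix + char;
      const hasAny = under.some(w => w.text.slice(0, next.length) === next);
      return hasAny ? traversalFrom(under, next) : undefined;
    },
  };
}

// Presents two traversals as one. Probabilities from `a` are scaled by
// weightA and those from `b` by (1 - weightA); shared words are summed.
function blend(a: Traversal | undefined, b: Traversal | undefined, weightA: number): Traversal | undefined {
  if (!a && !b) return undefined;
  const scaled = (t: Traversal | undefined, w: number): WordEntry[] =>
    t ? t.entries.map(e => ({ text: e.text, p: e.p * w })) : [];
  const merged: WordEntry[] = [];
  scaled(a, weightA).concat(scaled(b, 1 - weightA)).forEach(e => {
    const hit = merged.filter(m => m.text === e.text)[0];
    if (hit) hit.p += e.p; else merged.push(e);
  });
  return {
    entries: merged,
    childKeys() {
      const keys = a ? a.childKeys().slice() : [];
      (b ? b.childKeys() : []).forEach(k => { if (keys.indexOf(k) < 0) keys.push(k); });
      return keys;
    },
    child(char: string) {
      return blend(a ? a.child(char) : undefined, b ? b.child(char) : undefined, weightA);
    },
  };
}

// Walks the traversal one code unit at a time and returns the word's weight.
function lookup(root: Traversal | undefined, word: string): number {
  let node = root;
  for (let i = 0; i < word.length && node; i++) node = node.child(word.charAt(i));
  const hit = node ? node.entries.filter(e => e.text === word)[0] : undefined;
  return hit ? hit.p : 0;
}

const languageModel = traversalFrom([{ text: "cat", p: 0.6 }, { text: "car", p: 0.4 }]);
const userDictionary = traversalFrom([{ text: "cab", p: 0.5 }, { text: "cat", p: 0.5 }]);
const blended = blend(languageModel, userDictionary, 0.75);
// lookup(blended, "cab") is 0.125; lookup(blended, "cat") is about 0.575.
```

A model that cannot expose a traversal has no place in this scheme, so it would still need the alternate strategy mentioned above.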

@mcdurdin previously suggested instead doing multiple correction-searches and picking the best from among their results after applying relative weighting. This would work, though it would also require support for multiple correction searches that does not yet exist.

  • The searches should likely share a single allotment of total execution time, which would likely require some form of load balancing.
  • We'd likely need the ability to pause and resume whichever search is currently returning 'more likely' paths at the time.
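
A rough sketch of how the multiple-search approach could look, under stated assumptions: each search can be stepped one result at a time (so it can be paused and resumed), results come out in increasing cost order, and the shared budget is counted in steps rather than wall-clock time to keep the example deterministic. `ResumableSearch`, `searchFromSorted`, `runShared`, and the additive `penalty` used for relative weighting are all hypothetical; none of this exists in the codebase as far as this conversation shows.

```typescript
// Lower cost means more likely (think of cost as a negative log-probability).
type Candidate = { text: string; cost: number };

// Hypothetical interface for a correction search that can be paused/resumed.
interface ResumableSearch {
  // Cost of the next result this search would yield, or undefined if done.
  peekCost(): number | undefined;
  // Performs one bounded unit of work and returns the next result, if any.
  step(): Candidate | undefined;
}

type WeightedSearch = { search: ResumableSearch; penalty: number };

// Stand-in search that simply replays a pre-sorted list of results.
function searchFromSorted(results: Candidate[]): ResumableSearch {
  let i = 0;
  return {
    peekCost() {
      const c = results[i];
      return c ? c.cost : undefined;
    },
    step() {
      const c = results[i];
      if (c) i++;
      return c;
    },
  };
}

// Shares one step budget across all searches. Each iteration resumes the
// search whose next result is currently cheapest after its penalty is added;
// the others stay paused.
function runShared(searches: WeightedSearch[], maxSteps: number): Candidate[] {
  const found: Candidate[] = [];
  for (let n = 0; n < maxSteps; n++) {
    let best: WeightedSearch | undefined = undefined;
    let bestCost = Infinity;
    for (let k = 0; k < searches.length; k++) {
      const s = searches[k];
      if (!s) continue;
      const c = s.search.peekCost();
      if (c !== undefined && c + s.penalty < bestCost) {
        bestCost = c + s.penalty;
        best = s;
      }
    }
    if (!best) break; // every search is exhausted
    const result = best.search.step();
    if (result) found.push({ text: result.text, cost: result.cost + best.penalty });
  }
  return found;
}

const sharedResults = runShared(
  [
    { search: searchFromSorted([{ text: "cat", cost: 1.0 }, { text: "car", cost: 2.0 }, { text: "cap", cost: 3.0 }]), penalty: 0 },
    { search: searchFromSorted([{ text: "cab", cost: 0.5 }, { text: "cad", cost: 2.5 }]), penalty: 1.0 },
  ],
  3
);
// sharedResults holds, in order: cat (1.0), cab (1.5), car (2.0)
```

A real implementation would more likely bound elapsed time than step count, and each step of a genuine correction search may do a variable amount of work, which is where the load-balancing concern comes in.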

@mcdurdin
Member Author

  • We do have to solve the issue of data transfer in one manner or another.

Data transfer into the webview could be via local file: or http: request. This opens up a number of extensibility questions for KeymanWeb itself and how we could make the web-based experience consistent with the Keyman Android/iOS app experience.

@mcdurdin mcdurdin modified the milestones: A18S9, A18S19 Aug 27, 2024
jahorton and others added 9 commits August 27, 2024 10:15
…ers/dict-breaker-start

feat(common/models/wordbreakers): begin development of dictionary-based wordbreaking algorithm 🔬
…akers/unit-test-trie-access

change(common/models/wordbreakers): allow wordbreaker tests to access TrieModel implementation 🔬
…ers/fuse-dict-unmatched-chars

feat(common/models/wordbreakers): fuse adjacent unmatched characters when dictionary-breaking 🔬
…-breaker

chore: merge master into dict-breaker 🔬
…-breaker

chore: merge master into dict-breaker 🔬
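
For readers skimming the commit list, the two behaviours named in the commit messages above (breaking only the text after the last space / ZWNJ, and fusing adjacent unmatched characters) can be sketched as follows. This is an illustration only: it uses a greedy longest-match over a plain word list, which is a technique swapped in for brevity and not necessarily what the PR's trie-backed implementation does, and `dictBreak` is a hypothetical name.

```typescript
const ZWNJ = "\u200c";

// Splits the text after the last space or ZWNJ into dictionary words.
// Characters that match no dictionary word are fused into a single run
// instead of being emitted one by one.
function dictBreak(text: string, dictionary: string[]): string[] {
  // Everything up to and including the last space / ZWNJ is ignored.
  const start = Math.max(text.lastIndexOf(" "), text.lastIndexOf(ZWNJ)) + 1;
  const tail = text.slice(start);

  const words: string[] = [];
  let unmatched = "";
  let i = 0;
  while (i < tail.length) {
    // Greedy step: find the longest dictionary word starting at position i.
    let longest = "";
    for (let k = 0; k < dictionary.length; k++) {
      const w = dictionary[k];
      if (w && w.length > longest.length && tail.slice(i, i + w.length) === w) {
        longest = w;
      }
    }
    if (longest) {
      // Flush any pending run of unmatched characters as one token.
      if (unmatched) {
        words.push(unmatched);
        unmatched = "";
      }
      words.push(longest);
      i += longest.length;
    } else {
      // No word starts here; extend the current unmatched run.
      unmatched += tail.charAt(i);
      i++;
    }
  }
  if (unmatched) words.push(unmatched);
  return words;
}

const brokenWords = dictBreak("hello catxydog", ["cat", "do", "dog"]);
// brokenWords is ["cat", "xy", "dog"]; "hello " is skipped entirely.
```

Greedy matching can pick a long word that leaves the remainder unbreakable, so a production breaker would probably weigh alternative segmentations rather than commit to the first longest match.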
@github-actions github-actions bot added common/ and removed common/ labels Oct 11, 2024
@github-actions github-actions bot added common/ and removed common/ labels Oct 25, 2024
@github-actions github-actions bot added common/ and removed common/ labels Nov 10, 2024
3 participants