feat(web): add classes for use in multi-token correction 🚂 #15818
jahorton wants to merge 11 commits into change/web/suggestion-range from
Conversation
User Test Results
Test specification and instructions: User tests are not required
Test Artifacts
Build-bot: skip build:web
Test-bot: skip

…ly for queue

This, plus the upcoming QuotientNodeFinalizer class, will support result forwarding in multi-tokenization contexts during multi-token & multi-tokenization correction.
  totalCost: number
}

export class TokenizationCorrector implements CorrectionSearchable<ReadonlyArray<TokenResult>, TokenizationResultMapping> {
Might be helpful to add a jsdoc comment to the class even though you already have a comment at the top. That way it would show up in vscode and similar tools.
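For instance, something along these lines could work (the wording here is only a sketch based on the PR description, not text from the PR itself):

    /**
     * Searches for corrections across all tokens of a context tokenization at once,
     * rather than one token at a time, combining per-token correction costs with
     * the cost of the tokenization pattern itself.
     */
    export class TokenizationCorrector implements CorrectionSearchable<ReadonlyArray<TokenResult>, TokenizationResultMapping> {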
// - a result for correcting a single Token - "TokenResult"?
// - a result for completing correction for a full Tokenization - "TokenizationResult"?

export type TokenResult = {
I'm surprised that this type is new - I thought I had come across it in previous PRs.
  public readonly tokenization: ContextTokenization;
  private readonly tailCorrectionLength: number

  private readonly _uncorrectables: SearchQuotientNode[];
Why do we have some private fields beginning with underscore and others without?
export class TokenizationCorrector implements CorrectionSearchable<ReadonlyArray<TokenResult>, TokenizationResultMapping> {
  public readonly tokenization: ContextTokenization;
  private readonly tailCorrectionLength: number
nit:
Suggested change:
-  private readonly tailCorrectionLength: number
+  private readonly tailCorrectionLength: number;
    this.tailCorrectionLength = tailCorrectionLength;

    if(tailCorrectionLength < 1) {
      throw new Error(`Length for correction near tail may not be 0.`);
- This will also throw for negative numbers, which isn't reflected in the error message.
- I'm wondering if it would be helpful to output the actual value? (similar to the error two lines below)
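For example, one possible rewording - just a sketch of the suggestion, not code from the PR:

    if(tailCorrectionLength < 1) {
      // Also covers negative values and reports what was actually received.
      throw new Error(`Length for correction near tail must be at least 1; received ${tailCorrectionLength}.`);
    }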
    if(!filterClosure(token)) {
      this._uncorrectables.push(searchModule);
    } else if(index == tailCorrectionLength - 1) {
      this._predictable = searchModule;
Do we have to check that this._predictable does not yet have a value?
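If so, a minimal sketch of such a guard (hypothetical wording, assuming at most one token should ever become the predictable one):

    } else if(index == tailCorrectionLength - 1) {
      // Hypothetical guard: fail loudly if a predictable token was already assigned.
      if(this._predictable) {
        throw new Error(`Only one token may be marked as predictable.`);
      }
      this._predictable = searchModule;
    }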
    // Compute a weighting for each token's search space based the increase in
    // tokenization cost that it represents.
Is there a word missing?
Suggested change:
-  // Compute a weighting for each token's search space based the increase in
-  // tokenization cost that it represents.
+  // Compute a weighting for each token's search space based on the increase in
+  // tokenization cost that it represents.
      }
    });

    this._lockedTokenResults = new Map();
I don't understand the name _lockedTokenResults. Why are the TokenResults locked? What does it mean?
Also, the name is a bit misleading - it sounds like they are the results of locked tokens, i.e. locked tokens that this class will return somewhere. It may be better to rename it to _lockedTokenResultsMap or something similar?
      this._lockedTokenResults.set(correctableToUpdate, tokenResult.mapping);
    }

    const resultCost = tokenResult.type != 'none' ? tokenResult.cost : this._lockedTokenResults.get(correctableToUpdate).totalCost
nit:
Suggested change:
-  const resultCost = tokenResult.type != 'none' ? tokenResult.cost : this._lockedTokenResults.get(correctableToUpdate).totalCost
+  const resultCost = tokenResult.type != 'none' ? tokenResult.cost : this._lockedTokenResults.get(correctableToUpdate).totalCost;
As mentioned in the description of #15814, up until now, we've always searched for corrections for just one token at a time - the current token. To better support whitespace fat-fingering, we'll need the ability to answer the following question: "Which tokenization pattern is the closest to matching the current text after corrections?" In essence, we need the ability to correct word boundaries, or perhaps to correct the engine's tokenization of the context.
One notable way forward is to consider what valid corrections we can find for each token in the tokenized context. This allows us to search for phrase-level corrections not focused on a specific word, prioritizing the most likely combination of tokenization-pattern and correction-cost for one of the pattern's tokens.
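As a rough illustration of that idea - the names and shapes below are hypothetical and not the PR's actual API - each candidate tokenization carries the cost of its word-boundary pattern plus the best correction cost found so far for each of its tokens, and the search expands the cheapest combination first:

    // Hypothetical sketch of the cost-combination idea; not the PR's actual types.
    interface CandidateTokenization {
      tokenizationCost: number;             // cost of this word-boundary pattern itself
      bestTokenCorrectionCosts: number[];   // cheapest correction found so far, per token
    }

    function combinedCost(candidate: CandidateTokenization): number {
      return candidate.tokenizationCost
        + candidate.bestTokenCorrectionCosts.reduce((sum, cost) => sum + cost, 0);
    }

    // A search would expand the candidate with the lowest combined cost first.
    function pickNextCandidate(candidates: CandidateTokenization[]): CandidateTokenization {
      return candidates.reduce((best, c) => combinedCost(c) < combinedCost(best) ? c : best);
    }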
Now, as to the implementation of the new type...
Predictive text generally should not extend words not adjacent to the caret unnecessarily; predictive text exists to predict what is yet to be typed, rather than adjusting what was typed. Corrections validly may adjust what was typed, but should not aim to add to it. To use a simple example, for text such as "the quicl brown fox", correcting "quicl" to "quick" may well be valid, but correcting the same word to "quickly" would not make sense - those two extra characters would appear out of nowhere.

So, when doing correction across a range of context, rather than a single token, we generally will want to correct all words not adjacent to the caret, only allowing prediction for the caret-adjacent token.
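In sketch form (hypothetical names, not the PR's code), that split amounts to allowing prediction only for the caret-adjacent token and restricting every other token to corrections of what was actually typed:

    // Hypothetical illustration of the correct-vs-predict split described above.
    type TokenRole = 'correct-only' | 'correct-and-predict';

    function roleForToken(index: number, tokenCount: number): TokenRole {
      // Only the final (caret-adjacent) token may be extended by prediction;
      // a mid-phrase "quicl" may become "quick", but never "quickly".
      return index === tokenCount - 1 ? 'correct-and-predict' : 'correct-only';
    }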
Build-bot: skip build:web
Test-bot: skip