Skip to content

feat(web): add classes for use in multi-token correction 🚂#15818

Open
jahorton wants to merge 11 commits intochange/web/suggestion-rangefrom
feat/web/multi-token-corrector
Open

feat(web): add classes for use in multi-token correction 🚂#15818
jahorton wants to merge 11 commits intochange/web/suggestion-rangefrom
feat/web/multi-token-corrector

Conversation

@jahorton
Copy link
Copy Markdown
Contributor

@jahorton jahorton commented Apr 2, 2026

As mentioned in the description of #15814, up until now, we've always searched for corrections for just one token at a time - the current token. To better support whitespace fat-fingering, we'll need the ability to answer the following question: "Which tokenization pattern is the closest to matching the current text after corrections?" In essence, we need the ability to correct word boundaries, or perhaps to correct the engine's tokenization of the context.

One notable way forward is to consider what valid corrections we can find for each token in the tokenized context. This allows us to search for phrase-level corrections not focused on a specific word, prioritizing the most likely combination of tokenization-pattern and correction-cost for one of the pattern's tokens.


Now, as to the implementation of the new type...

Predictive text generally should not extend words not adjacent to the caret unnecessarily; predictive text exists to predict what is yet to be typed, rather than adjusting what was typed. Corrections validly may adjust what was typed, but should not aim to add to it. To use a simple example, for text such as the quicl brown fox, correcting quicl to quick may well be valid, but correcting the same word to quickly would not make sense - those two extra characters would appear out of nowhere.

So, when doing correction across a range of context, rather than a single token, we generally will want to correct all words not adjacent to the token, only allowing prediction for the adjacent token.

Build-bot: skip build:web
Test-bot: skip

@keymanapp-test-bot
Copy link
Copy Markdown

keymanapp-test-bot Bot commented Apr 2, 2026

User Test Results

Test specification and instructions

User tests are not required

Test Artifacts

  • Web
    • KeymanWeb Test Home - build : all tests passed (no artifacts on BuildLevel "build")

@keymanapp-test-bot keymanapp-test-bot Bot changed the title feat(web): add classes for use in multi-token correction feat(web): add classes for use in multi-token correction 🚂 Apr 2, 2026
@keymanapp-test-bot keymanapp-test-bot Bot added this to the A19S26 milestone Apr 2, 2026
@jahorton jahorton force-pushed the feat/web/correction-search-abstraction branch from 6cedadc to eaf419f Compare April 9, 2026 21:39
@jahorton jahorton changed the base branch from feat/web/correction-search-abstraction to change/web/suggestion-range April 13, 2026 16:19
@jahorton jahorton force-pushed the feat/web/multi-token-corrector branch from 34e48e8 to b9480ec Compare April 13, 2026 16:37
@keyman-server keyman-server modified the milestones: A19S26, A19S27 Apr 14, 2026
@jahorton jahorton force-pushed the feat/web/multi-token-corrector branch 3 times, most recently from 2ae39dd to cd19326 Compare April 16, 2026 15:33
@jahorton jahorton force-pushed the change/web/suggestion-range branch from 1baecad to be45f31 Compare April 22, 2026 16:08
Build-bot: skip build:web
Test-bot: skip
…ly for queue

This, plus the upcoming QuotientNodeFinalizer class, will support result forwarding in multi-tokenization contexts during multi-token & multi-tokenization correction.
@jahorton jahorton force-pushed the feat/web/multi-token-corrector branch from 4d05f5e to 7e594e0 Compare April 22, 2026 16:27
@jahorton jahorton force-pushed the feat/web/multi-token-corrector branch from 486cbd7 to e81ae0a Compare April 22, 2026 20:02
@jahorton jahorton requested a review from ermshiperete April 24, 2026 14:29
@jahorton jahorton marked this pull request as ready for review April 24, 2026 14:29
@keyman-server keyman-server modified the milestones: A19S27, A19S28 Apr 24, 2026
Copy link
Copy Markdown
Contributor

@ermshiperete ermshiperete left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

totalCost: number
}

export class TokenizationCorrector implements CorrectionSearchable<ReadonlyArray<TokenResult>, TokenizationResultMapping> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be helpful to add a jsdoc comment to the class even though you already have a comment at the top. That way it would show up in vscode and similar tools.

// - a result for correcting a single Token - "TokenResult"?
// - a result for completing correction for a full Tokenization - "TokenizationResult"?

export type TokenResult = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that this type is new - I thought I had come across it in previous PRs.

public readonly tokenization: ContextTokenization;
private readonly tailCorrectionLength: number

private readonly _uncorrectables: SearchQuotientNode[];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have some private fields beginning with underscore and others without?


export class TokenizationCorrector implements CorrectionSearchable<ReadonlyArray<TokenResult>, TokenizationResultMapping> {
public readonly tokenization: ContextTokenization;
private readonly tailCorrectionLength: number
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
private readonly tailCorrectionLength: number
private readonly tailCorrectionLength: number;

this.tailCorrectionLength = tailCorrectionLength;

if(tailCorrectionLength < 1) {
throw new Error(`Length for correction near tail may not be 0.`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This will also throw for negative numbers which isn't reflected in the error message.
  • I'm wondering if it would be helpful to output the actual value? (similar two lines below)

if(!filterClosure(token)) {
this._uncorrectables.push(searchModule);
} else if(index == tailCorrectionLength - 1) {
this._predictable = searchModule;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to check that this._predictable does not yet have a value?

Comment on lines +144 to +145
// Compute a weighting for each token's search space based the increase in
// tokenization cost that it represents.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a word missing?

Suggested change
// Compute a weighting for each token's search space based the increase in
// tokenization cost that it represents.
// Compute a weighting for each token's search space based on the increase in
// tokenization cost that it represents.

}
});

this._lockedTokenResults = new Map();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the name _lockedTokenResults. Why are the TokenResults locked? What does it mean?

Also the name is a bit misleading - sounds like they are the results of locked tokens, i.e. locked tokens that this class will return somewhere. May be better to rename to _lockedTokenResultsMap or something?

this._lockedTokenResults.set(correctableToUpdate, tokenResult.mapping);
}

const resultCost = tokenResult.type != 'none' ? tokenResult.cost : this._lockedTokenResults.get(correctableToUpdate).totalCost
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
const resultCost = tokenResult.type != 'none' ? tokenResult.cost : this._lockedTokenResults.get(correctableToUpdate).totalCost
const resultCost = tokenResult.type != 'none' ? tokenResult.cost : this._lockedTokenResults.get(correctableToUpdate).totalCost;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

3 participants