feat(core): implement multi-strategy replacement pipeline for edit tool #2405

Closed
undici77 wants to merge 1 commit into QwenLM:main from undici77:edit-enhancement

Conversation

@undici77

  • Add exact, flexible, regex, and fuzzy matching strategies
  • Fuzzy matching uses Levenshtein distance with whitespace penalty (up to 10% threshold)
  • Support ai_proposed_content and modified_by_user flags for user-modified edits
  • Handle edge cases: null content from readTextFile, empty old_string for file creation
  • Add comprehensive test matrix for replace_all scenarios (0, 1, multiple occurrences)
  • Improve error handling with READ_CONTENT_FAILURE and FILE_WRITE_FAILURE types

This enhances the edit tool's robustness when matching and replacing text, especially for cases with whitespace differences or minor content variations.

TLDR

Using qwen-code with smaller models like Qwen3-Coder-30B, the edit tool fails frequently with EDIT_NO_OCCURRENCE_FOUND even when the intent is clear — the model produces old_string with slightly wrong indentation, normalised whitespace, or a minor typo and the single exact-match strategy gives up immediately. gemini-edit.ts in the same codebase already solves this with a four-strategy cascade. This PR ports that cascade into edit.ts, fixes two broken tests exposed by the change, and tightens the tool prompt to remove redundancy and weak language.

Dive Deeper

1. Multi-strategy replacement pipeline

The current edit.ts calls countOccurrences + safeLiteralReplace — one shot, exact match only. Common failure patterns with smaller models:

  • old_string copied with one extra leading space or wrong indentation → no match
  • Tab normalised to spaces → no match
  • Trailing comma omitted in a multi-line object → no match
  • Single quotes where the file has double quotes → no match

Each failure costs a full round-trip: error → re-read file → retry. Ported from gemini-edit.ts:

| Priority | Strategy | Triggers when |
|---|---|---|
| 1 | exact | verbatim match after CRLF normalisation |
| 2 | flexible | lines match after stripping per-line whitespace; indentation rebased from first matched line |
| 3 | regex | tokens match with `\s*` between them; handles intra-line spacing variation |
| 4 | fuzzy | weighted Levenshtein distance ≤ 10% of search length |
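
To make the cascade order concrete, here is a minimal TypeScript sketch of the first three rows of the table. It is not the code from edit.ts or gemini-edit.ts: every name (`calculateReplacementSketch`, `tryExact`, `tryFlexible`, `tryRegex`), the delimiter set used for tokenising, and the way the replacement's indentation is rebased are assumptions made for illustration.

```typescript
// Illustrative sketch only: all names below are hypothetical, not the PR's API.
type Strategy = 'exact' | 'flexible' | 'regex';

interface ReplacementResult {
  newContent: string;
  occurrences: number;
  strategy: Strategy;
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function leadingWhitespace(line: string): string {
  const match = /^[ \t]*/.exec(line);
  return match ? match[0] : '';
}

// Priority 1: verbatim substring match.
function tryExact(content: string, oldString: string, newString: string): ReplacementResult | null {
  const occurrences = content.split(oldString).length - 1;
  if (occurrences === 0) return null;
  // split/join sidesteps the "$&"-style patterns that String.prototype.replace interprets.
  return { newContent: content.split(oldString).join(newString), occurrences, strategy: 'exact' };
}

// Priority 2: compare trimmed lines, then re-indent the replacement
// with the indentation of the first matched line (assumed rebasing rule).
function tryFlexible(content: string, oldString: string, newString: string): ReplacementResult | null {
  const fileLines = content.split('\n');
  const searchLines = oldString.split('\n').map((line) => line.trim());
  const replaceLines = newString.split('\n');
  const baseIndent = leadingWhitespace(replaceLines[0]);
  let occurrences = 0;
  let i = 0;
  while (i + searchLines.length <= fileLines.length) {
    const candidate = fileLines.slice(i, i + searchLines.length);
    if (candidate.every((line, k) => line.trim() === searchLines[k])) {
      const indent = leadingWhitespace(candidate[0]);
      const rebased = replaceLines.map(
        (line) => indent + (line.startsWith(baseIndent) ? line.slice(baseIndent.length) : line.trimStart()),
      );
      fileLines.splice(i, searchLines.length, ...rebased);
      occurrences++;
      i += rebased.length; // skip past the text just inserted
    } else {
      i++;
    }
  }
  return occurrences === 0 ? null : { newContent: fileLines.join('\n'), occurrences, strategy: 'flexible' };
}

// Priority 3: split the search text into tokens and allow any whitespace between them.
function tryRegex(content: string, oldString: string, newString: string): ReplacementResult | null {
  const tokens = oldString
    .replace(/([()[\]{}:<>=,;])/g, ' $1 ') // assumed delimiter set
    .split(/\s+/)
    .filter((token) => token.length > 0);
  if (tokens.length === 0) return null;
  const pattern = new RegExp(tokens.map(escapeRegExp).join('\\s*'), 'g');
  const occurrences = (content.match(pattern) ?? []).length;
  if (occurrences === 0) return null;
  // A replacer function keeps "$" in newString literal.
  return { newContent: content.replace(pattern, () => newString), occurrences, strategy: 'regex' };
}

// Try each strategy in priority order and stop at the first one that matches.
function calculateReplacementSketch(content: string, oldString: string, newString: string): ReplacementResult | null {
  if (oldString === '') return null; // file creation is handled elsewhere
  const normalise = (text: string): string => text.replace(/\r\n/g, '\n');
  for (const strategy of [tryExact, tryFlexible, tryRegex]) {
    const result = strategy(normalise(content), normalise(oldString), normalise(newString));
    if (result) return result;
  }
  return null;
}
```

Ordering the strategies from strictest to loosest means a looser match is only considered once every stricter one has found nothing, so an `old_string` that is already correct never takes the permissive path.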

Adaptations to edit.ts conventions: replace_all (not allow_multiple), BOM/encoding preservation (not CRLF tracking), StandardFileSystemService read/write API. Fuzzy matches append "Applied fuzzy match at line N." to llmContent so the model knows a looser match was used.
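
The fuzzy fallback and its note can be sketched as follows. This is a simplified illustration, not the PR's implementation: it uses plain, unweighted Levenshtein distance (the whitespace weighting mentioned in the description is omitted), slides a window of whole lines over the file, and the names `tryFuzzy`, `levenshtein`, and `FUZZY_THRESHOLD` are hypothetical. Only the 10% threshold and the wording of the note are taken from the description above.

```typescript
// Illustrative sketch only: hypothetical names, unweighted edit distance.
const FUZZY_THRESHOLD = 0.1; // distance must be <= 10% of the search length

// Classic two-row dynamic-programming Levenshtein distance.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

interface FuzzyResult {
  newContent: string;
  line: number; // 1-based line where the match starts
  note: string; // text to append to the tool's response
}

function tryFuzzy(content: string, oldString: string, newString: string): FuzzyResult | null {
  if (oldString.length === 0) return null;
  const fileLines = content.split('\n');
  const windowSize = oldString.split('\n').length;
  let best: { index: number; distance: number } | null = null;

  // Compare every window of `windowSize` lines against the search text.
  for (let i = 0; i + windowSize <= fileLines.length; i++) {
    const candidate = fileLines.slice(i, i + windowSize).join('\n');
    const distance = levenshtein(candidate, oldString);
    if (best === null || distance < best.distance) {
      best = { index: i, distance };
    }
  }

  if (best === null || best.distance / oldString.length > FUZZY_THRESHOLD) return null;

  fileLines.splice(best.index, windowSize, ...newString.split('\n'));
  const line = best.index + 1;
  return { newContent: fileLines.join('\n'), line, note: `Applied fuzzy match at line ${line}.` };
}
```

For example, a search string of `const total = computeTotl(items);` differs by one character from a file line reading `const total = computeTotal(items);`, which is well inside the 10% budget for a 33-character search, so the sketch would accept it and report the line number in the note.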

2. Test fixes and new coverage (56 → 93 tests)

Two existing tests were broken or fragile:

  • FILE_WRITE_FAILURE used chmod 444 — silently passes when the runner is root (common in CI). Replaced with vi.spyOn(service, 'writeTextFile').mockRejectedValueOnce(...).
  • Double-space mismatch test expected EDIT_NO_OCCURRENCE_FOUND — the flexible strategy now succeeds, assertion updated.

Two infrastructure fixes needed for the new pipeline:

  • logEditStrategy added to the loggers.js mock (was missing, would throw on any strategy call).
  • getFileSystemService changed from a plain arrow function to vi.fn() to allow per-test service overriding.

New suites added: calculateReplacement (27 tests, all 4 strategies), replace_all parametrised matrix (7 cases), getModifyContext (all 4 methods), toolLocations, shouldConfirmExecute (ProceedAlways).
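
As a rough illustration of what a replace_all matrix covers (0, 1, and multiple occurrences, each with the flag on and off), here is a framework-free TypeScript sketch. It does not reproduce the PR's seven cases or its test code: `applyEditSketch`, `runMatrix`, and the `HYPOTHETICAL_MULTIPLE_OCCURRENCES` label are invented for this example, and only `EDIT_NO_OCCURRENCE_FOUND` is a name taken from the description above.

```typescript
// Illustrative sketch only: names and the second error label are hypothetical.
type EditOutcome =
  | { ok: true; newContent: string }
  | { ok: false; error: string };

function applyEditSketch(
  content: string,
  oldString: string,
  newString: string,
  replaceAll: boolean,
): EditOutcome {
  const occurrences = oldString === '' ? 0 : content.split(oldString).length - 1;
  if (occurrences === 0) return { ok: false, error: 'EDIT_NO_OCCURRENCE_FOUND' };
  // Assumed rule: without replace_all, the search text must be unique.
  if (occurrences > 1 && !replaceAll) {
    return { ok: false, error: 'HYPOTHETICAL_MULTIPLE_OCCURRENCES' };
  }
  return { ok: true, newContent: content.split(oldString).join(newString) };
}

interface MatrixCase {
  content: string;
  replaceAll: boolean;
  expected: string; // either the new content or an error label
}

// Every case replaces 'a' with 'b'.
const replaceAllMatrix: MatrixCase[] = [
  { content: 'x y z', replaceAll: false, expected: 'EDIT_NO_OCCURRENCE_FOUND' },
  { content: 'x y z', replaceAll: true, expected: 'EDIT_NO_OCCURRENCE_FOUND' },
  { content: 'a y z', replaceAll: false, expected: 'b y z' },
  { content: 'a y z', replaceAll: true, expected: 'b y z' },
  { content: 'a y a', replaceAll: false, expected: 'HYPOTHETICAL_MULTIPLE_OCCURRENCES' },
  { content: 'a y a', replaceAll: true, expected: 'b y b' },
];

// Returns a description of each failing case; an empty array means all passed.
function runMatrix(): string[] {
  const failures: string[] = [];
  for (const testCase of replaceAllMatrix) {
    const outcome = applyEditSketch(testCase.content, 'a', 'b', testCase.replaceAll);
    const actual = outcome.ok ? outcome.newContent : outcome.error;
    if (actual !== testCase.expected) {
      failures.push(`${JSON.stringify(testCase)} produced ${actual}`);
    }
  }
  return failures;
}
```

Keeping the cases in a data table makes it easy to see which combinations of occurrence count and flag are covered, and adding a case is a one-line change.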

3. Tool prompt tightening (~210 → ~95 tokens)

The old_string uniqueness rule was stated in three separate places. "preferably unescaped" implies escaping is sometimes acceptable — it never is. replace_all description only explained true, never what false enforces. Collapsed into four numbered rules with imperative language and a single statement of consequences.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
@tanzhenxin
Collaborator

@undici77 Thanks for your contribution!

As for the fuzzy match feature for the edit tool, we had a similar implementation before, but it caused some trouble, so we rolled it back, though I cannot recall the details now.

The vast majority of our install base uses top-tier models like qwen3.5-plus, which have not had much trouble with indentation lately. (If you encounter problems with mixed CJK and English characters, that is a known issue; see PR #2300 for more detail.)

So we have decided not to merge this PR, which targets small models on local inference endpoints.

Thanks again for your understanding!

@tanzhenxin tanzhenxin self-assigned this Mar 18, 2026
@tanzhenxin tanzhenxin added status/blocked Blocked by external dependency status/on-hold Temporarily paused and removed status/blocked Blocked by external dependency status/on-hold Temporarily paused labels Mar 18, 2026
@undici77 undici77 closed this Mar 27, 2026
@undici77 undici77 deleted the edit-enhancement branch March 27, 2026 18:11
Labels

status/blocked Blocked by external dependency
