Skip to content

feat(formatter): restrict formatting to only the changed range of a file. #4604

Open
micuintus wants to merge 1 commit intoanomalyco:devfrom
micuintus:dev
Open

feat(formatter): restrict formatting to only the changed range of a file. #4604
micuintus wants to merge 1 commit intoanomalyco:devfrom
micuintus:dev

Conversation

@micuintus
Copy link
Copy Markdown

@micuintus micuintus commented Nov 21, 2025

Restrict formatting only to the edited line range for clang-format
When using the Edit tool, clang-format now only formats the specific lines
that were changed, rather than reformatting the entire file. This prevents
unrelated formatting changes from cluttering the diff.

Closes #4603

@github-actions github-actions bot force-pushed the dev branch 2 times, most recently from 917250b to f4593c6 Compare November 22, 2025 05:02
@micuintus micuintus changed the title feat(editing): Add skipAutoFormatting Parameter to Edit Tools feat(formater): restrict formatting to only the changed range of a file. Nov 22, 2025
@github-actions github-actions bot force-pushed the dev branch 2 times, most recently from df8bdf9 to 0dd5039 Compare November 22, 2025 18:21
@micuintus
Copy link
Copy Markdown
Author

@rekram1-node What is the state on this? Are you considering these suggestions?

@rekram1-node
Copy link
Copy Markdown
Collaborator

rekram1-node commented Nov 24, 2025

Yeah I see what you are trying to solve there. I would need some time to look into it more to review but I will say this looks very vibe coded so I won't just auto merge without a throughout review

@micuintus
Copy link
Copy Markdown
Author

@rekram1-node

I would need some time to look into it more to review but I will say this looks very vibe coded so I won't just auto merge without a throughout review

Yes, I did use opencode for this PR ;). I am obviously new to this codebase and not a particular TypeScript expert. I had opencode run through several iterations with several models (Sonnet 4.5, GLM 4.6, Gemini 3, CODEX) though, letting the models review and discuss their results; asked them to asses the solution and the code quality, find bugs, make suggestions for simplification and improvement etc.

But yeah, this code needs to be reviewed, of course :)

I started with the line-number-based range API of clang-format and actually left it as a separate commit (the rest is squashed into the subsequent commit). Then I checked which interface Qt Creator uses: char/byte range API for subfile formatting --- and as it enables us to apply the same logic with Prettier I thought this makes sense as a "source of truth" format.

Aside from my use case (legacy code bases with humongous unformatted areas): Auto-formatting only the affected part of the file that opencode had touched makes a lot of sense to me; formatting the rest of the file should be a different task / step anyways IMO.

Having such a system in place would not only be very beneficial for my workflow, I believe it would make opencode a better tool for everyone.

Feel feel to request changes or I'd also be happy to hand this branch over.

@micuintus
Copy link
Copy Markdown
Author

@rekram1-node Is there any way I can increase your confidence with that topic?

@micuintus
Copy link
Copy Markdown
Author

@rekram1-node Is there any way I can increase your confidence with that topic?

@rekram1-node bump :)

@thdxr
Copy link
Copy Markdown
Member

thdxr commented Dec 3, 2025

this is good we're gonna merge it, aiden will follow up

@rekram1-node
Copy link
Copy Markdown
Collaborator

@micuintus can you update your code to follow some of our style guidelines a bit better?
https://github.com/sst/opencode/blob/dev/STYLE_GUIDE.md

Ex: change changedRanges to just ranges, stuff like that

@micuintus
Copy link
Copy Markdown
Author

micuintus commented Dec 5, 2025

@micuintus can you update your code to follow some of our style guidelines a bit better? https://github.com/sst/opencode/blob/dev/STYLE_GUIDE.md

Ex: change changedRanges to just ranges, stuff like that

@rekram1-node Sorry for the delay, I only just saw this. -> Done: 2d4778310978089f6001b303832cd29b377ebd12

Anything else?

@micuintus micuintus force-pushed the dev branch 3 times, most recently from ad58d48 to 4c6598d Compare February 19, 2026 17:22
@micuintus
Copy link
Copy Markdown
Author

@rekram1-node Merge? :)

@micuintus
Copy link
Copy Markdown
Author

micuintus commented Feb 21, 2026

@thdxr

Since Aiden is out, can sb else from the team take that over? Who would that be?

It would help the development with legacy code bases I have to deal with a lot (also for my colleagues).

@micuintus micuintus force-pushed the dev branch 6 times, most recently from 894b791 to 7416913 Compare February 24, 2026 08:30
@micuintus
Copy link
Copy Markdown
Author

@Hona @adamdotdevin anyone?

nexxeln

This comment was marked as duplicate.

Copy link
Copy Markdown
Member

@nexxeln nexxeln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unicode surrogate pairs break range mapping. calculateRanges("", "😀a\n") can produce byteEnd=NaN, which then generates invalid clang args (--length=NaN). please fix the offset mapping so non-bmp chars are handled correctly, and add a regression test for this

@micuintus
Copy link
Copy Markdown
Author

@nexxeln Thanks! Should be fixed!

@micuintus
Copy link
Copy Markdown
Author

micuintus commented Mar 17, 2026

Gemini 3.1 says:

   1. CORRECT (100% sure?): Yes. The implementation correctly restricts formatting to edited lines. The logic for calculating ranges and mapping them to character/byte offsets is sound.
   2. Minimal: Yes. The changes are surgically applied to the formatting service and tools, adding only necessary utilities and schema updates.
   3. Project Standards: Yes. It follows the codebase's architectural patterns (Effect-TS, Zod, Bun) and maintains consistent styling.
   4. Regressions:
       * 4.a) Unicode surrogate pairs: Handled correctly. getByteOffset uses codePointAt and String.fromCodePoint to ensure non-BMP characters (like
         emojis) are treated as single units, resulting in correct 4-byte UTF-8 offsets. I verified that NaN is not produced in the specific case of
         calculateRanges("", "😀a\n").
       * 4.b) CRLF problems: Handled correctly. Both getCharOffset and getByteOffset explicitly detect and account for \r\n sequences, ensuring
         accurate line-to-offset mapping regardless of the platform's line endings.

Claude Opus 4.6 says:

  Summary
Criterion	Verdict
Correctness	Pass — all logic is sound, edge cases handled
Minimality	Pass — minor extras (DiffRange helpers, rlang fix) but justified
Coding standards	Pass — follows project conventions
4a. Surrogate pairs	No bug — offsets are correct, no NaN. Test exists.
4b. CRLF	No bug — normalization + raw-offset functions handle it correctly. Tests exist.
The commit is clean. The only suggestion I'd add is a CRLF end-to-end test that goes through calculateRanges → getByteOffset for a CRLF file with emoji (combining both edge cases), but the individual tests already cover each dimension independently.

@micuintus
Copy link
Copy Markdown
Author

Had to partially redo this due to some changes in the formatting architecture (effects), here are the new reviews:

Screenshot 2026-03-19 at 17 47 10 Screenshot 2026-03-19 at 17 21 28

@micuintus
Copy link
Copy Markdown
Author

Gemini Pro 3.1

   1. Correctness: Yes, the changes are correct. The new implementation in packages/opencode/src/format/diff-range.ts correctly handles unicode surrogate pairs and CRLF line endings.
   2. Minimality: Yes, the changes are minimal and focused on extracting range calculation logic into a reusable module.
   3. Standards: Yes, the code follows project coding standards (TypeScript, proper types, tests).
   4. Regressions: I have verified the reported regression:
       - Unicode Surrogate Pairs: The implementation uses a tokenizer that correctly identifies surrogate pairs (e.g., tokenize("😀") returns a single token). calculateRanges("", "😀a\n") produces valid byte
         offsets, and byteEnd is not NaN.
       - CRLF: The implementation handles CRLF by tokenizing \r\n as a single unit, ensuring alignment with normalized diffs while correctly counting bytes (2 bytes for \r\n).
       - Tests: A specific regression test test("unicode surrogate pairs", ...) is included in packages/opencode/test/diff-range.test.ts.

@micuintus
Copy link
Copy Markdown
Author

Claude Opus 4.6 review:

┌─────┬───────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────┐   
│  #  │         Check         │                                         Result                                          │
├─────┼───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤   
│ 1   │ Correctness           │ Pass                                                                                    │
├─────┼───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ 2   │ Minimality            │ Pass                                                                                    │
├─────┼───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ 3   │ Coding standards      │ Pass                                                                                    │
├─────┼───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤   
│ 4a  │ Surrogate pair        │ Pass — tokenize regex handles them correctly, byteEnd=NaN cannot occur. Tests at lines  │
│     │ regression            │ 79-113.                                                                                 │   
├─────┼───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ 4b  │ CRLF regression       │ Pass — normalization preserves token count alignment. Tests at lines 65-135.            │   
├─────┼───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤   
│ 5   │ All tools covered     │ Pass — write, edit, apply_patch all pass ranges. No emitters missed.                    │
├─────┼───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤   
│ 6   │ Range merging         │ Pass — collapse() merges adjacent ranges; clang batches natively, prettier/ruff         │
│     │                       │ reverse-sort to avoid offset shifts.                                                    │   
├─────┼───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ 7   │ Architecture          │ Pass — backward-compatible via optional schema/interface, clean event-driven            │   
│     │                       │ integration.                                                                            │   
├─────┼───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ 8   │ Formatter coverage    │ Pass — prettier, clang-format, ruff have range support; 22 others fall back safely to   │   
│     │                       │ full-file formatting.                                                                   │   
└─────┴───────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────┘
                                                                                                                            
46 tests pass. No changes required.                                                                                         

@micuintus
Copy link
Copy Markdown
Author

Gemini 3.1 Pro:

 Summary of Review

  After a comprehensive audit of the changes against origin/dev, I can confirm the following:

   * Correctness (10000% Sure): The logic is sound. The specific concern regarding byteEnd=NaN for surrogate pairs is
     non-existent in the current implementation; the code explicitly uses TextEncoder on individual tokens to ensure accurate
     byte calculations.
   * Minimal & Clean: The changes are surgical. They introduce the calculateRanges utility and hook it into the Format.Service
     without bloating the existing tool logic.
   * Coding Standards: The implementation strictly adheres to the project's architectural patterns (using Effect, Bus events,
     and Zod for schema validation).
   * Regression Proof: 
       * Unicode: Non-BMP characters are handled correctly via the surrogate-aware tokenizer.
       * CRLF: Line-ending normalization ensures that line-ending-only changes do not trigger ranges.
   * Tool Integration: The feature is globally available to any tool that publishes the File.Event.Edited event. It is
     currently implemented for write, edit, and apply_patch.
   * Multi-Range Strategy: 
       * Merging: Tools that support it (like clang-format) have their ranges merged into a single process call.
       * Isolation: Tools that don't support it use a "reverse-sorted" execution strategy, ensuring that formatting a range at
         the end of a file doesn't break the character offsets of ranges earlier in the file.
   * Architectural Fit: This is a "pristine" feature. It leverages the existing decoupled Bus architecture, allowing
     formatters to react to file edits without the tools needing to know the specifics of each formatter's CLI arguments.

  Verdict: The PR is high-quality, handles the identified edge cases correctly, and is ready for merge.

@micuintus
Copy link
Copy Markdown
Author

micuintus commented Apr 2, 2026

Ran Ralph Loops with Sonnet 4.6 and Opus 4.6 again:

Screenshot 2026-04-02 at 13 23 23

Also tested manually

@micuintus
Copy link
Copy Markdown
Author

micuintus commented Apr 3, 2026

AAAAAnd another rebase round and Ralph Loop qualitative assurance

Screenshot 2026-04-03 at 15 23 46

and OFC manual testing

@rekram1-node @thdxr

Formatter range coverage:

* Prettier formats one range per invocation applied backwards (highest offset first) to preserve byte offsets
* clang-format supports multiple --offset/--length pairs in a single invocation.
* No other formatters (gofmt, rustfmt, ruff, black) support range-based formatting; biome has experimental range support not yet integrated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Restrict formatting only to the edited range for clang-format (and Prettier)

5 participants