Skip to content

fix(attachment): hash full file content instead of range content and consolidate FileInfo#2578

Merged
tusharmath merged 2 commits intomainfrom
attachement-hash-bug
Mar 16, 2026
Merged

fix(attachment): hash full file content instead of range content and consolidate FileInfo#2578
tusharmath merged 2 commits intomainfrom
attachement-hash-bug

Conversation

@tusharmath
Copy link
Copy Markdown
Collaborator

@tusharmath tusharmath commented Mar 16, 2026

Summary

Fix attachment content hashing to always use full file content instead of the requested range, and consolidate file metadata fields into a FileInfo struct for cleaner APIs.

Context

The external-change detector compares stored content hashes against hashes of subsequent file reads. Previously, the hash was computed from only the range of lines requested (e.g., lines 2-4 of a 10-line file), meaning a full-file read and a ranged read of the same unmodified file would produce different hashes. This caused the change detector to incorrectly flag files as externally modified when they hadn't been touched.

Additionally, ReadOutput and AttachmentContent had four separate flat fields (start_line, end_line, total_lines, content_hash) that were always set and used together, creating unnecessary boilerplate at every construction and access site.

Changes

  • Bug fix: range_read_utf8 in forge_fs now hashes the full file content (all lines) rather than only the requested range, ensuring consistent hashes across full and partial reads of the same file
  • Refactor: Introduced FileInfo::new(start_line, end_line, total_lines, content_hash) constructor and consolidated the four flat metadata fields into a single info: FileInfo field on ReadOutput and AttachmentContent
  • Updated all construction and access sites across forge_app, forge_services, and forge_domain to use the new FileInfo struct
  • Added sha2 dependency to forge_fs for computing full-file content hashes in the read layer
  • Updated documentation on FileReaderInfra::range_read_utf8 to clarify the hashing contract

Key Implementation Details

The hash is now computed over the complete file content (joined with \n) before slicing to the requested range. This means callers that store a hash from a ranged read can correctly compare it with a hash from a subsequent full-file read to detect external modifications — both will match as long as the file hasn't changed.

Use Cases

  • Prevents false "file externally modified" warnings when the agent previously read only a range of a file and later re-reads it fully
  • Any tool or service comparing content_hash values from different read operations on the same file now gets consistent results

Testing

cargo insta test --accept -p forge_fs
cargo insta test --accept -p forge_services
cargo insta test --accept -p forge_app

The test test_full_vs_ranged_file_hashes_are_equal (and related attachment tests) explicitly verifies that a full-file read and a ranged read of the same file produce identical content_hash values.

Links

  • Fixes the hash inconsistency reported in the attachement-hash-bug branch

@github-actions github-actions Bot added the type: fix Iterations on existing features or infrastructure. label Mar 16, 2026
@tusharmath tusharmath force-pushed the attachement-hash-bug branch from d09b545 to c9ac261 Compare March 16, 2026 12:24
@tusharmath tusharmath changed the title fix(attachment): hash full file content instead of range content fix(attachment): hash full file content instead of range content and consolidate FileInfo Mar 16, 2026
@tusharmath tusharmath merged commit 29db91a into main Mar 16, 2026
10 checks passed
@tusharmath tusharmath deleted the attachement-hash-bug branch March 16, 2026 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type: fix Iterations on existing features or infrastructure.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant