feat: add checksum verification library (SHA-256, SHA-1, MD5) #324

Open

mvanhorn wants to merge 5 commits into SurgeDM:main from mvanhorn:feat/checksum-verification

Conversation

mvanhorn (Contributor) commented on Apr 5, 2026

Summary

Add a checksum verification library that computes file hashes and compares them against expected values. Also parses HTTP Digest response headers (RFC 3230) for server-provided checksums.

Why this matters

aria2 has --checksum=sha-256=HASH. wget verifies checksums. When downloading Linux ISOs or software releases, users expect integrity verification. Surge currently downloads files with no integrity check.

Changes

  • Add VerifyChecksum(filepath, algorithm, expected) supporting MD5, SHA-1, SHA-256
  • Add ParseDigestHeader(header) to extract checksums from RFC 3230 HTTP Digest headers (base64 and hex)
  • Both functions are in internal/processing/checksum.go
  • This is the core library. Wiring into the download lifecycle and CLI --checksum flag will follow in a separate PR.
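
For orientation, here is a rough usage sketch of the two entry points listed above. The call shapes follow the PR description and the flowchart further down; the exact return types, the ChecksumResult shape, and the import path are assumptions rather than the confirmed API.

```go
package main

import (
	"fmt"
	"log"

	// Import path is a guess based on the file location named in this PR.
	"github.com/SurgeDM/surge/internal/processing"
)

func main() {
	// A server-provided digest header (SHA-256 of the empty string, standard base64).
	header := "sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="

	// ParseDigestHeader is assumed to return (algorithm, hexHash, error).
	algo, expected, err := processing.ParseDigestHeader(header)
	if err != nil {
		log.Fatal(err)
	}

	// VerifyChecksum(filepath, algorithm, expected) per the changes list above;
	// the result is printed without assuming its field names.
	result, err := processing.VerifyChecksum("/tmp/release.tar.gz", algo, expected)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", result)
}
```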

Testing

go test ./... -count=1   # All 18 packages pass

This contribution was developed with AI assistance (Codex + Claude Code).

Greptile Summary

This PR adds internal/processing/checksum.go with VerifyChecksum (MD5/SHA-1/SHA-256 file hashing) and ParseDigestHeader (RFC 3230 Digest header parsing), along with a comprehensive test suite. Previously raised concerns around unpadded base64 support, parameter shadowing, algorithm normalization inconsistency, and missing MD5/SHA-1 tests have all been addressed in this revision. The remaining findings are all P2: a dead error branch in the hex fast-path, an incomplete doc comment, and two untested code paths (URL-safe base64, SHA-1) in ParseDigestHeader.

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 style/cleanup items that don't affect correctness.

All previously flagged blocking issues have been resolved. The three remaining comments are P2: dead code that doesn't affect behavior, an incomplete doc string, and two untested code paths. None of these cause wrong results or silent failures in the changed code.

No files require special attention beyond the minor cleanup notes.

Important Files Changed

  • internal/processing/checksum.go: New checksum library implementing VerifyChecksum and ParseDigestHeader; previously raised issues (unpadded base64, parameter shadowing, normalization) are addressed, but the hex fast-path contains a dead error branch and the doc comment omits alias forms.
  • internal/processing/checksum_test.go: Good test coverage for happy paths and error cases; URL-safe base64 and SHA-1 paths in ParseDigestHeader are not exercised, leaving two supported code paths untested.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ParseDigestHeader] --> B{SplitN on '='}
    B -- "< 2 parts" --> C[return '', '', nil]
    B -- "2 parts" --> D{Normalize algo}
    D -- "unsupported" --> C
    D -- "sha-256/sha-1/md5" --> E[Compute expectedBytes & expectedHexLen]
    E --> F{len == expectedHexLen?}
    F -- "yes" --> G{hex.DecodeString ok?}
    G -- "yes" --> H[return algo, hex value]
    G -- "no" --> I[Try base64 variants]
    F -- "no" --> I
    I --> J{Any variant decodes ok?}
    J -- "yes" --> K{correct length?}
    K -- "yes" --> H
    K -- "no" --> Q[return error: length mismatch]
    J -- "no" --> O[return '', '', nil]

    R[VerifyChecksum] --> S{empty args?}
    S -- "yes" --> T[return error]
    S -- "no" --> U{switch algo}
    U -- "unsupported" --> T
    U -- "md5/sha1/sha256" --> V[os.Open file]
    V -- "error" --> T
    V -- "ok" --> W[io.Copy to hash]
    W --> X[hex.EncodeToString]
    X --> Y[return ChecksumResult]
```
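
A minimal, self-contained sketch of the hashing pattern the lower half of this flowchart describes (switch on algorithm, stream the file through the hash, hex-encode). It only illustrates the standard-library approach; it is not the PR's actual code, and the comparison step and return shape are assumptions.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"io"
	"os"
	"strings"
)

// verifySketch mirrors the flowchart's VerifyChecksum path: pick a hash by
// algorithm name, stream the file through it, then compare hex digests.
func verifySketch(path, algorithm, expected string) (string, error) {
	var h hash.Hash
	switch strings.ToLower(algorithm) {
	case "md5":
		h = md5.New()
	case "sha1", "sha-1":
		h = sha1.New()
	case "sha256", "sha-256":
		h = sha256.New()
	default:
		return "", fmt.Errorf("unsupported algorithm %q", algorithm)
	}

	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	// Stream in chunks so large downloads are never held in memory at once.
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}

	actual := hex.EncodeToString(h.Sum(nil))
	if !strings.EqualFold(actual, expected) {
		return actual, fmt.Errorf("checksum mismatch: got %s, want %s", actual, expected)
	}
	return actual, nil
}

func main() {
	sum, err := verifySketch("/etc/hostname", "sha256", "")
	fmt.Println(sum, err) // mismatch expected with an empty "expected" value
}
```
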
Prompt To Fix All With AI
This is a comment left during a code review.
Path: internal/processing/checksum.go
Line: 101-107

Comment:
**Unreachable length guard in hex fast-path**

Inside the `if len(value) == expectedHexLen` branch, a successful `hex.DecodeString` on a valid hex string of length `2*N` always produces exactly `N` bytes. Because entry requires `len(value) == expectedHexLen` (= `expectedBytes * 2`), the condition `len(decoded) != expectedBytes` is always `false` after a clean decode and the `fmt.Errorf` at line 104 can never execute. The dead error path may mislead future maintainers into thinking there is a case where hex decoding can succeed but produce wrong-length output.

```suggestion
	if len(value) == expectedHexLen {
		if _, err := hex.DecodeString(value); err == nil {
			return algo, strings.ToLower(value), nil
		}
	}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: internal/processing/checksum.go
Line: 24-26

Comment:
**Doc comment omits accepted alias forms**

The comment says `algorithm should be one of: md5, sha1, sha256`, but the switch also accepts the hyphenated aliases `sha-1` and `sha-256`. Callers reading only the doc comment won't know the aliases are valid.

```suggestion
// VerifyChecksum computes the hash of a file and compares it to the expected value.
// algorithm should be one of: md5, sha1 (or sha-1), sha256 (or sha-256).
// expected should be a hex-encoded hash string.
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: internal/processing/checksum_test.go
Line: 93-127

Comment:
**URL-safe base64 and SHA-1 paths in `ParseDigestHeader` are untested**

The implementation tries four base64 variants (`StdEncoding`, `URLEncoding`, `RawStdEncoding`, `RawURLEncoding`), but no test exercises the URL-safe path (characters `-` and `_`). Likewise, `ParseDigestHeader` accepts `sha-1` but has no test for it — a future breakage in that branch would go undetected. Per the team's edge-case testing rule, each supported code path should have at least one passing test.

Consider adding:

```go
// Assumes the test file imports crypto/sha1, encoding/base64, encoding/hex,
// and that a mustParseDigestHeader helper (parse + assert no error) exists.
func TestParseDigestHeader_SHA256URLSafeBase64(t *testing.T) {
    // SHA-256 of empty string in URL-safe base64 (no padding)
    algo, hash := mustParseDigestHeader(t, "sha-256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU")
    assert.Equal(t, "sha256", algo)
    assert.Equal(t, "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", hash)
}

func TestParseDigestHeader_SHA1Base64(t *testing.T) {
    h := sha1.Sum([]byte(""))
    encoded := base64.StdEncoding.EncodeToString(h[:])
    algo, hash := mustParseDigestHeader(t, "sha-1="+encoded)
    assert.Equal(t, "sha1", algo)
    assert.Equal(t, hex.EncodeToString(h[:]), hash)
}
```

**Rule Used:** What: All code changes must include tests for edge... ([source](https://app.greptile.com/review/custom-context?memory=2b22782d-3452-4d55-b059-e631b2540ce8))

How can I resolve this? If you propose a fix, please make it concise.

Reviews (5): Last reviewed commit: "fix: validate hash length and support un..."

Context used:

  • Rule used - What: All code changes must include tests for edge... (source)

Add VerifyChecksum() to compute file hashes and compare against expected
values, and ParseDigestHeader() to extract checksums from HTTP Digest
response headers (RFC 3230). Supports hex and base64 encoded hashes.

This is the core library for download integrity verification. Wiring
into the download lifecycle and CLI flags will follow in a separate PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mvanhorn (Contributor, Author) commented on Apr 6, 2026

Fixed the hex fallback - it now validates hash length before accepting, preventing wrong-length hashes from silently passing through. Also added support for unpadded base64 (RawStdEncoding/RawURLEncoding) since some servers omit padding in Digest headers. Pushed in 0ec5370.
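
For reference, the fallback behavior 0ec5370 describes amounts to trying the padded and unpadded base64 alphabets in order. A standalone sketch of that idea follows; it is illustrative only, and the helper name is hypothetical rather than the PR's code.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// tryBase64 is a hypothetical helper: it attempts the four base64 variants
// (standard/URL-safe, padded/unpadded) and returns the first clean decode.
func tryBase64(value string) ([]byte, bool) {
	encodings := []*base64.Encoding{
		base64.StdEncoding,    // padded, '+' and '/'
		base64.URLEncoding,    // padded, '-' and '_'
		base64.RawStdEncoding, // unpadded, '+' and '/'
		base64.RawURLEncoding, // unpadded, '-' and '_'
	}
	for _, enc := range encodings {
		if decoded, err := enc.DecodeString(value); err == nil {
			return decoded, true
		}
	}
	return nil, false
}

func main() {
	// The same SHA-256 digest with and without padding; both should decode to 32 bytes.
	for _, v := range []string{
		"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
		"47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU",
	} {
		decoded, ok := tryBase64(v)
		fmt.Println(len(decoded), ok) // 32 true for both
	}
}
```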

mvanhorn and others added 2 commits April 9, 2026 19:06
Remove dead code in ParseDigestHeader (hex fallback already handled
by the earlier hex check). Strengthen base64 test assertion with exact
expected hash. Add MD5 and SHA-1 happy-path tests with algorithm
normalization verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mvanhorn (Contributor, Author) commented:

Addressed the greptile findings in 62ac7bd - removed the redundant hex fallback in ParseDigestHeader, strengthened the base64 test assertion with the exact expected hash, and added MD5/SHA-1 happy-path tests. The unpadded base64 and algorithm normalization findings were already handled in the existing implementation.

Address greptile P1 findings:
- Validate decoded hash byte length matches expected algorithm size
  to prevent wrong-length hashes from being silently accepted
- Add base64.RawStdEncoding and RawURLEncoding fallbacks for services
  that return unpadded base64 digests
- Return error from ParseDigestHeader on length mismatches
- Add test for unpadded base64 and wrong-length hex detection
- Fix deferred file close to handle error

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mvanhorn (Contributor, Author) commented:

Went through the greptile findings:

  • P1 "redundant hex fallback": Already validated — line 101 checks len(value) == expectedHexLen before decoding, and line 103-104 verifies decoded byte length matches expectedBytes.
  • P1 "missing unpadded base64": Already supported — lines 110-122 try four encodings including base64.RawStdEncoding and base64.RawURLEncoding (the unpadded variants).
  • P1 "unnormalized algorithm": Already normalized — lines 39-44 map sha-1 to sha1 and sha-256 to sha256, and line 32 lowercases the input.
  • P1 "multi-value RFC 3230": Partially valid — SplitN(header, "=", 2) doesn't handle comma-separated multi-digest headers. In practice most servers send a single algorithm. The function returns a single (algorithm, hash) pair so multi-value support would need a signature change. Happy to add it if you'd prefer.
  • P2 test assertions: The test already verifies round-trip correctness (compute hash of known file, compare). NotEmpty checks supplement exact-match assertions elsewhere in the suite.
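
On the multi-value point in the fourth bullet: RFC 3230 allows several instance-digests in one header, e.g. `Digest: md5=1B2M2Y8AsgTpgAmY7PhCfg==,sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=` (both values here are digests of the empty string). A sketch of pre-splitting before the existing single-value parse, purely illustrative and not a proposed signature change:

```go
package main

import (
	"fmt"
	"strings"
)

// splitDigests is a hypothetical pre-processing step: it breaks a
// comma-separated Digest header into single "algo=value" entries that the
// existing ParseDigestHeader could then handle one at a time.
func splitDigests(header string) []string {
	var out []string
	for _, part := range strings.Split(header, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	h := "md5=1B2M2Y8AsgTpgAmY7PhCfg==, sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
	for _, d := range splitDigests(h) {
		fmt.Println(d)
	}
	// Output:
	// md5=1B2M2Y8AsgTpgAmY7PhCfg==
	// sha-256=47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
}
```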

