Skip to content

Conversation

@sammcj
Copy link
Owner

@sammcj sammcj commented Nov 26, 2025

  • feat: optionally provide -a to use Anthropic's token counter API (free, but requires key)
  • feat: apply minor calculation correction to offline token counter based on comparing with anthropic API (can be disabled with a flag)
  • chore: bump deps and go version

This pull request introduces several improvements and updates across the codebase, focusing on enhanced symlink handling, dependency upgrades, improved token counting documentation, and workflow/linting enhancements. The most significant changes are grouped and summarised below.

Filesystem and Symlink Handling:

  • Added robust handling to skip symlinked files and directories unless an explicit followSymlinks flag is set, improving safety and predictability when traversing directories and processing files (filesystem/filesystem.go). [1] [2] [3] [4] [5]
  • Updated the default exclude patterns to filter out more files and directories, including .claude/*.json, .mcp.json, CONTRIBUTORS.md, etc. (filesystem/defaultExcludes.go).

Dependency and Tooling Upgrades:

  • Upgraded Go version to 1.25.4 and updated multiple dependencies to their latest versions in go.mod and workflow files for improved stability, performance, and compatibility. [1] [2] [3]
  • Updated linter configuration (.golangci.yml) and integrated golangci-lint and code modernization checks into the workflow and Makefile for better code quality enforcement. [1] [2] [3]

Documentation and Token Counting:

  • Expanded the README with detailed documentation on token counting, including the new offline tokeniser correction factor, Anthropic API integration, and relevant flags. [1] [2] [3]
  • Updated flag documentation to reflect new token counting options and clarified usage.

Build and Compression Improvements:

  • Added -w -s linker flags in build scripts and Makefile to reduce binary size. [1] [2]
  • Refactored the generic compressor to simplify error handling and function signatures for HTML and other language-specific compression routines (internal/compressor/genericCompressor.go). [1] [2] [3] [4] [5] [6]

Miscellaneous:

  • Minor README cleanups, such as removing the outdated acknowledgements section.

These changes collectively improve code safety, maintainability, documentation clarity, and build efficiency.

@socket-security
Copy link

socket-security bot commented Nov 26, 2025

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces token counting via Anthropic's API, updates dependencies including Go version, enhances symlink handling, and improves build configuration. The changes span token counting infrastructure, filesystem safety features, code compression refactoring, and CI/CD improvements.

Key changes:

  • Adds optional Anthropic API integration for token counting with offline fallback and correction multiplier
  • Implements robust symlink detection and skipping (unless explicitly enabled via flag)
  • Updates Go version and multiple dependencies to latest versions

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
token/token.go Adds Anthropic API support, correction multiplier (1.18x), and batch token counting with fallback logic
token/anthropic.go New file implementing Anthropic API client for token counting with parallel batch processing
main.go Adds -a/--anthropic and --no-correction flags; updates all token counting calls; adds followSymlinks flag
template/template.go Updates type from interface{} to any (modern Go idiom)
internal/compressor/genericCompressor.go Simplifies language-specific compression functions to return strings directly instead of (string, error)
filesystem/filesystem.go Adds symlink detection using os.Lstat and skips symlinks by default; adds followSymlinks parameter
filesystem/defaultExcludes.go Adds new default exclude patterns (.claude/*.json, .mcp.json, CONTRIBUTORS.md)
go.mod Updates Go version to 1.25.4 and bumps multiple dependencies
go.sum Reflects updated dependency versions
Makefile Adds -w -s linker flags; integrates golangci-lint and modernize tool into lint target
.golangci.yml New linter configuration file with unparam linter and exclusion rules
.github/workflows/build-release-publish.yml Updates Go version; adds linting step to CI workflow
README.md Adds comprehensive token counting documentation; updates flag descriptions
Comments suppressed due to low confidence (1)

internal/compressor/genericCompressor.go:257

  • The removal of the break statement means that if multiple matching children are found, directChildDecl will be set to the last matching child instead of the first. If the intent is to find the first match (which is typical), the break should be restored:
case "function_declaration", "class_declaration", "generator_function_declaration", "arrow_function", "function_expression":
    directChildDecl = child
    break  // Stop at first match
default:
    // Not a type we are looking for
}

If the intent is to find the last match, this should be documented.

							var directChildDecl *sitter.Node
							for i := 0; i < int(jsNode.NamedChildCount()); i++ {
								child := jsNode.NamedChild(i)
								// Broaden condition to find anonymous function/class declarations as well
								switch child.Type() {
								case "function_declaration", "class_declaration", "generator_function_declaration", "arrow_function", "function_expression":
									// These are types that can be default exported and might have bodies to strip.
									directChildDecl = child
								default:
									// Not a type we are looking for as a direct declaration in `export default ...`
								}
							}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 21 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

@sammcj sammcj requested a review from Copilot November 26, 2025 22:26
@sammcj sammcj merged commit 6960355 into main Nov 26, 2025
10 checks passed
@sammcj sammcj deleted the update branch November 26, 2025 22:28
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 4 comments.


You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

Comment on lines +110 to +132
// Process in batches
for i := 0; i < len(contents); i += batchSize {
end := min(i+batchSize, len(contents))

// Process this batch concurrently
for j := i; j < end; j++ {
wg.Add(1)
go func(index int, content string) {
defer wg.Done()
count, err := CountTokensAPI(content)
if err != nil {
errors[index] = err
results[index] = 0
} else {
results[index] = count
}
}(j, contents[j])
}

// Wait for this batch to complete before starting the next
wg.Wait()
}

Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The batch processing logic waits for each batch to complete before starting the next batch (line 130). If the goal is to limit concurrent requests, consider using a worker pool pattern or semaphore instead. The current implementation processes batches sequentially, which means if you have 20 items with batchSize=4, it will take 5 sequential rounds. A better approach would be to maintain a constant level of concurrency (4 goroutines) throughout the entire operation.

Suggested change
// Process in batches
for i := 0; i < len(contents); i += batchSize {
end := min(i+batchSize, len(contents))
// Process this batch concurrently
for j := i; j < end; j++ {
wg.Add(1)
go func(index int, content string) {
defer wg.Done()
count, err := CountTokensAPI(content)
if err != nil {
errors[index] = err
results[index] = 0
} else {
results[index] = count
}
}(j, contents[j])
}
// Wait for this batch to complete before starting the next
wg.Wait()
}
// Use a buffered channel as a semaphore to limit concurrency
semaphore := make(chan struct{}, batchSize)
for i, content := range contents {
wg.Add(1)
semaphore <- struct{}{} // acquire a slot
go func(index int, content string) {
defer wg.Done()
defer func() { <-semaphore }() // release the slot
count, err := CountTokensAPI(content)
if err != nil {
errors[index] = err
results[index] = 0
} else {
results[index] = count
}
}(i, content)
}
wg.Wait()

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,30 @@
version: "2"
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The golangci-lint configuration version "2" doesn't align with the golangci-lint version format. The version field in .golangci.yml typically isn't used to specify the golangci-lint tool version. If this is intended as a configuration schema version, it should be documented. Otherwise, consider removing this line as it may cause confusion or compatibility issues.

Suggested change
version: "2"

Copilot uses AI. Check for mistakes.

// httpClient is a reusable HTTP client with connection pooling and timeout
var httpClient = &http.Client{
Timeout: 30 * time.Second,
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] A 30-second timeout may be excessive for token counting API calls, which should typically be fast. Consider reducing this to 10-15 seconds to fail faster on network issues, or make it configurable. This becomes especially important when processing multiple files in batch mode, where each timeout adds significant delay.

Suggested change
Timeout: 30 * time.Second,
Timeout: 10 * time.Second,

Copilot uses AI. Check for mistakes.
Comment on lines +50 to +96
func CountTokensAPI(content string) (int, error) {
apiKey, err := getAnthropicAPIKey()
if err != nil {
return 0, err
}

requestBody := AnthropicTokenCountRequest{
Model: AnthropicModel,
Messages: []Message{
{
Role: "user",
Content: content,
},
},
}

jsonData, err := json.Marshal(requestBody)
if err != nil {
return 0, fmt.Errorf("failed to marshal request: %w", err)
}

req, err := http.NewRequest("POST", "https://api.anthropic.com/v1/messages/count_tokens", bytes.NewBuffer(jsonData))
if err != nil {
return 0, fmt.Errorf("failed to create request: %w", err)
}

req.Header.Set("Content-Type", "application/json")
req.Header.Set("x-api-key", apiKey)
req.Header.Set("anthropic-version", "2023-06-01")

resp, err := httpClient.Do(req)
if err != nil {
return 0, fmt.Errorf("failed to make request: %w", err)
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
return 0, fmt.Errorf("API request failed with status %d: %s", resp.StatusCode, string(body))
}

var response AnthropicTokenCountResponse
if err := json.NewDecoder(resp.Body).Decode(&response); err != nil {
return 0, fmt.Errorf("failed to decode response: %w", err)
}

return response.InputTokens, nil
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CountTokensAPI function should accept a context.Context parameter to enable request cancellation and timeout control. This is especially important for API calls that could hang or need to be cancelled. Consider:

func CountTokensAPI(ctx context.Context, content string) (int, error) {
    // ...
    req, err := http.NewRequestWithContext(ctx, "POST", "https://api.anthropic.com/v1/messages/count_tokens", bytes.NewBuffer(jsonData))
    // ...
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant