Skip to content

chore(api): simplify sandbox removing logic#2016

Draft
jakubno wants to merge 1 commit intomainfrom
chore/simplify-kill-logic
Draft

chore(api): simplify sandbox removing logic#2016
jakubno wants to merge 1 commit intomainfrom
chore/simplify-kill-logic

Conversation

@jakubno
Copy link
Member

@jakubno jakubno commented Feb 27, 2026

Note

Medium Risk
Changes the core sandbox lifecycle path (RemoveSandbox/StartRemoving) used by kill/pause/eviction/snapshot flows, so regressions could cause incorrect NotFound/transition handling or leaked state if edge cases are missed.

Overview
Simplifies sandbox removal by making RemoveSandbox operate on teamID/sandboxID (instead of requiring a pre-fetched sandbox) and having StartRemoving return the sandbox while enforcing team ownership and consistent not-found/transition behavior; handlers and eviction/snapshot flows now call the unified removal path directly, reducing extra lookups and consolidating error handling/metrics updates.

Written by Cursor Bugbot for commit 9de0030. This will update automatically on new commits. Configure here.

// Should timeout
require.Error(t, err2)
require.ErrorIs(t, err2, context.DeadlineExceeded)
assert.NotNil(t, callback)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert.NotNil(t, callback) is incorrect and will fail. When waitForTransition returns context.DeadlineExceeded, handleExistingTransition returns nil for the callback:

err := s.waitForTransition(ctx, teamID, sbx.SandboxID, transactionID)
if err != nil {
    return sbx, false, nil, fmt.Errorf("failed waiting for transition: %w", err)
}

The callback will always be nil on the error path, so this assertion should be assert.Nil(t, callback).

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

err = startTransitionScript.Run(ctx, s.redisClient, []string{key, transitionKey, resultKey}, newData, transitionID, ttlSeconds, resultTtlSeconds).Err()
if err != nil {
return false, nil, fmt.Errorf("failed to update sandbox state: %w", err)
return sbx, false, nil, fmt.Errorf("failed to update sandbox state: %w", err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redis returns mutated state on error, masking kill failures

Medium Severity

In Redis StartRemoving, sbx.State is mutated to the target state on line 100 before json.Marshal and startTransitionScript.Run execute. If either fails, the returned sandbox.Sandbox has State already set to the new state (e.g., StateKilling), even though Redis was never updated. In RemoveSandbox, the kill error handler checks sbx.State and matches StateKilling, returning nil instead of an error — making callers believe the kill succeeded. The memory storage avoids this by snapshotting the sandbox via sbx.Data() before mutation.

Additional Locations (1)

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants