feat: expand vision fallback to swipeOn, pinchOn, and dragAndDrop #1380

Merged
kaeawc merged 1 commit into main from work/1361-feat--expand-vision-fallback-to-support- on Feb 24, 2026

@kaeawc (Owner) commented Feb 24, 2026

Closes #1361

Summary

  • Extract a shared getVisionEnrichedError utility (src/vision/applyVisionFallback.ts) to consolidate the vision fallback logic that was previously inlined only in TapOnElement
  • Add a VisionAnalyzer interface to VisionTypes.ts to allow dependency injection and keep tests fast and non-flaky
  • Wire vision fallback into the error paths of SwipeOn, PinchOn, and DragAndDrop, in addition to the existing TapOnElement support
  • Refactor TapOnElement to use the shared utility instead of the previous inline implementation
  • Add a FakeVisionAnalyzer test fake and vision fallback tests for all four action classes
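
For readers without the diff open, the shared utility described in the summary might look roughly like the sketch below. The PR text does not show the source, so everything beyond the names `VisionAnalyzer` and `getVisionEnrichedError` is an assumption: the `analyze` method, the `VisionConfig` and `VisionResult` fields, and the parameter list are hypothetical and may not match `src/vision/applyVisionFallback.ts`.

```typescript
// Hypothetical shapes; the real definitions in VisionTypes.ts may differ.
interface VisionConfig {
  enabled: boolean;
}

interface VisionResult {
  found: boolean;
  suggestion?: string;
}

// Injectable analyzer, so action classes do not depend on a concrete AI client.
interface VisionAnalyzer {
  analyze(screenshotPath: string, target: string): Promise<VisionResult>;
}

// Returns the base message, enriched with vision guidance when it is available.
async function getVisionEnrichedError(
  baseMessage: string,
  target: string,
  config: VisionConfig,
  analyzer: VisionAnalyzer,
  screenshotPath: string
): Promise<string> {
  if (!config.enabled) {
    return baseMessage;
  }
  try {
    const result = await analyzer.analyze(screenshotPath, target);
    if (result.suggestion) {
      return `${baseMessage}\nVision suggestion: ${result.suggestion}`;
    }
    return baseMessage;
  } catch {
    // A failing vision call should never mask the original error.
    return baseMessage;
  }
}
```

Keeping the analyzer behind an interface is what would let each of the four action classes share one code path while tests substitute a fake.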

Test plan

  • bun run build passes with no errors
  • bun test test/vision/ test/features/action/ passes (333 pass, 0 fail)
  • New test files cover vision-enabled and vision-disabled paths for SwipeOn, PinchOn, DragAndDrop, and TapOnElement
  • applyVisionFallback.test.ts covers the shared utility directly
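
A minimal sketch of how a test fake can exercise both the vision-enabled and vision-disabled paths without a screenshot or an AI call. Only the name `FakeVisionAnalyzer` comes from this PR; its constructor, the `calls` array, the `analyze` signature, and the `describeFailure` helper are hypothetical stand-ins for illustration.

```typescript
// Hypothetical shapes; the real VisionTypes.ts definitions may differ.
interface VisionResult {
  found: boolean;
  suggestion?: string;
}

interface VisionAnalyzer {
  analyze(screenshotPath: string, target: string): Promise<VisionResult>;
}

// Test fake: returns a canned result and records every requested target.
class FakeVisionAnalyzer implements VisionAnalyzer {
  readonly calls: string[] = [];
  private readonly canned: VisionResult;

  constructor(canned: VisionResult) {
    this.canned = canned;
  }

  async analyze(_screenshotPath: string, target: string): Promise<VisionResult> {
    this.calls.push(target);
    return this.canned;
  }
}

// Hypothetical stand-in for an action's error path.
async function describeFailure(
  base: string,
  target: string,
  visionEnabled: boolean,
  analyzer: VisionAnalyzer
): Promise<string> {
  if (!visionEnabled) {
    return base; // vision-disabled path: the analyzer is never touched
  }
  const result = await analyzer.analyze("screenshot.png", target);
  return result.suggestion ? `${base}\nVision suggestion: ${result.suggestion}` : base;
}
```

A test can then assert on `fake.calls.length` to confirm that the disabled path makes no analyzer call at all, which is the property that keeps such tests fast and deterministic.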

🤖 Generated with Claude Code

Extract shared `getVisionEnrichedError` utility and add `VisionAnalyzer`
interface for testability, then wire vision fallback into the error paths
of SwipeOn, PinchOn, and DragAndDrop. Refactor TapOnElement to use the
shared utility as well. Add vision fallback tests for all four actions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@kaeawc enabled auto-merge (squash) February 24, 2026 04:33
@github-actions

MCP Benchmarks

Overall Status: ✅ PASSED
Status by Benchmark: Context Thresholds: ✅ | Tool Call Throughput: ✅ | Startup: ✅ | NPM Unpacked Size: ✅

Context Thresholds

| Category | Actual | Threshold | Usage | Status |
|---|---|---|---|---|
| Tools | 10,506 | 14,000 | 75% | |
| Resources | 440 | 1,000 | 44% | |
| Resource Templates | 1,563 | 2,000 | 78% | |
| Total | 12,509 | 17,000 | 74% | |

Overall Status: ✅ PASSED

Generated at 2026-02-24T04:36:13.885Z

Tool Call Throughput

Sample Size: 50 iterations per tool
Total Duration: 9.20s
Average Throughput: 38.03 ops/second

Fast Operations (<100ms)

| Tool | P50 | P95 | Mean | Success | Status |
|---|---|---|---|---|---|
| listDevices | 0.0ms | 0.0ms | 0.0ms | 100% | |
| pressButton | 0.1ms | 0.2ms | 0.1ms | 100% | |

Medium Operations (100ms-1s)

| Tool | P50 | P95 | Mean | Success | Status |
|---|---|---|---|---|---|
| observe | 22.5ms | 35.2ms | 24.4ms | 100% | |
| tapOn | 0.1ms | 0.1ms | 0.1ms | 100% | |
| inputText | 57.5ms | 68.8ms | 59.9ms | 100% | |

Slow Operations (1s+)

| Tool | P50 | P95 | Mean | Success | Status |
|---|---|---|---|---|---|
| launchApp | 16.0ms | 22.1ms | 16.7ms | 100% | |
| installApp | 0.0ms | 0.1ms | 0.0ms | 100% | |

Summary: 7/7 tools passed
Overall Status: ✅ PASSED

Generated at 2026-02-24T04:35:12.649Z

Startup Performance

MCP Server (stdio)

| Mode | Ready | First Tool Call | Heap Used |
|---|---|---|---|
| cold | 519.0ms | n/a | 35.2MB |
| warm | 531.0ms | n/a | 35.2MB |

Device discovery: skipped (not run)

Daemon

| Mode | Spawn | Ready | Responsive | Heap Used |
|---|---|---|---|---|
| cold | 3.0ms | 546.0ms | 717.0ms | 38.0MB |
| warm | 2.0ms | 535.0ms | 717.0ms | 38.1MB |

Overall Status: ✅ PASSED

Generated at 2026-02-24T04:37:14Z

NPM Unpacked Size

| Metric | Actual | Threshold | Usage | Status |
|---|---|---|---|---|
| Unpacked Size | 12.5MB | 30.0MB | 42% | |

Package: @kaeawc/auto-mobile@0.0.13
Tarball Size: 2.6MB

Overall Status: ✅ PASSED

Generated at 2026-02-24T04:35:22.907Z

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd6290d176

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const baseErrorMessage = error instanceof Error ? error.message : String(error);
let errorMessage = `Failed to perform swipeOn: ${baseErrorMessage}`;

if (this.visionConfig.enabled && (normalizedOptions.lookFor || normalizedOptions.container)) {

[P2] Restrict vision fallback to element-not-found failures

This catch block now invokes the vision fallback for any exception whenever `lookFor` or `container` is set, but several of the thrown paths are infrastructure or runtime failures rather than selector misses. For example, `ScrollUntilVisible.execute` in `src/features/action/swipeon/ScrollUntilVisible.ts` throws "Failed to get initial observation for scrolling until visible." In those cases we now take an unnecessary screenshot and AI call (added cost and latency) and may replace the real root-cause message with generic "Element not found" guidance, which makes production debugging harder.
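
One way to narrow the trigger, sketched under assumptions: selector misses are thrown as a dedicated error type, and the catch block checks for that type before taking a screenshot. `ElementNotFoundError` and `shouldUseVisionFallback` are hypothetical names for illustration; the project may already classify failures differently.

```typescript
// Hypothetical error type marking a selector miss.
class ElementNotFoundError extends Error {
  readonly selector: string;

  constructor(selector: string) {
    super(`Element not found: ${selector}`);
    this.name = "ElementNotFoundError";
    this.selector = selector;
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Only selector misses qualify for the screenshot + AI call.
function shouldUseVisionFallback(
  error: unknown,
  visionEnabled: boolean,
  hasSelector: boolean
): boolean {
  return visionEnabled && hasSelector && error instanceof ElementNotFoundError;
}
```

With a guard like this, an infrastructure failure such as the initial-observation error would propagate with its original message intact.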

Comment on lines +184 to +185
const isSourceError = baseErrorMessage.toLowerCase().includes("source");
const failedTarget = isSourceError ? options.source : options.target;

[P2] Use explicit error tags when choosing source vs target

Inferring the failed side with `baseErrorMessage.toLowerCase().includes("source")` is ambiguous: it misclassifies target failures whenever the target selector text or ID itself contains the word "source" (e.g. `dragAndDrop target not found with text 'Source account'`). That sends vision analysis to the wrong element and returns misleading remediation steps.
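
The difference between the substring check and an explicit tag can be sketched as follows. `DragElementNotFoundError`, `DragSide`, and `failedSide` are hypothetical names, not part of this PR; the message format mirrors the example in the comment above.

```typescript
type DragSide = "source" | "target";

// Hypothetical tagged error: the failed side travels as data, not as prose.
class DragElementNotFoundError extends Error {
  readonly side: DragSide;

  constructor(side: DragSide, description: string) {
    super(`dragAndDrop ${side} not found with ${description}`);
    this.name = "DragElementNotFoundError";
    this.side = side;
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// The substring heuristic quoted in the review, kept for comparison.
function inferSideFromMessage(message: string): DragSide {
  return message.toLowerCase().includes("source") ? "source" : "target";
}

// Tag-based lookup; returns undefined for unrelated errors.
function failedSide(error: unknown): DragSide | undefined {
  return error instanceof DragElementNotFoundError ? error.side : undefined;
}

const example = new DragElementNotFoundError("target", "text 'Source account'");
console.log(inferSideFromMessage(example.message)); // "source" (misclassified)
console.log(failedSide(example)); // "target"
```

Because the tag is set where the error is thrown, the selector's own text can no longer influence which element is sent to vision analysis.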

@kaeawc kaeawc merged commit 8a35b73 into main Feb 24, 2026
52 checks passed
@kaeawc kaeawc deleted the work/1361-feat--expand-vision-fallback-to-support- branch February 24, 2026 04:49

Development

Successfully merging this pull request may close these issues.

feat: expand vision fallback to support element search in swipeOn, pinchOn, dragAndDrop
