fix(e2e): multi-tenant widget isolation + portfolio nudge recovery #2790
serrrfirat merged 2 commits into staging
…olio nudge recovery
Widget customization: three tests expected multi-tenant behavior (CSS/widget/CSP
isolation) but ran against the single-tenant default server. Add a session-scoped
`multi_tenant_gateway_server` fixture with AGENT_MULTI_TENANT=true and its own
libSQL database, and rewire the three failing tests to use it.
Portfolio: the mock LLM's nudge response ("I found the information you
requested.") swallowed portfolio context when the engine sent a tool-intent
nudge. Add context-aware nudge recovery in match_response() that checks prior
user messages for portfolio/wallet keywords before falling through to the
generic nudge pattern. Also add word boundaries to the hello|hi|hey canned
pattern to prevent "hi" from matching inside "this".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
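To illustrate the word-boundary fix described in the commit message, here is a minimal sketch. The exact canned pattern in the mock LLM is not shown on this page, so the `\b(?:...)\b` form and the `is_greeting` helper below are assumptions made for illustration only.

```python
import re

# Unbounded alternation: "hi" can match inside longer words such as "this".
unbounded = re.compile(r"hello|hi|hey", re.IGNORECASE)

# Assumed shape of the fix: wrap the alternation in word boundaries so that
# only standalone greetings match.
bounded = re.compile(r"\b(?:hello|hi|hey)\b", re.IGNORECASE)


def is_greeting(text, pattern=bounded):
    """Return True if the text contains a greeting according to `pattern`."""
    return pattern.search(text) is not None
```

With the unbounded pattern, a prompt such as "Scan this wallet" is treated as a greeting because "this" contains "hi". The bounded pattern rejects it while still accepting "Hi there".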
Code Review
This pull request enhances the E2E testing infrastructure by refining the mock LLM's response matching and introducing a dedicated multi-tenant gateway server for widget customization tests. Specifically, it adds word boundaries to greeting patterns and implements a 'nudge recovery' mechanism to preserve portfolio context when the LLM fails to call a tool. The test suite is updated to use a specialized multi-tenant fixture, so the isolation tests run against a server that has multi-tenancy enabled. Feedback focuses on optimizing the nudge recovery logic by avoiding redundant regex compilation and improving context lookup efficiency by iterating through messages in reverse.

    _nudge_re = re.compile(
        r"You said you would perform an action|You expressed intent",
        re.IGNORECASE,
    )
    if _nudge_re.search(content):
        for msg in messages:
            if msg.get("role") == "user":
                msg_text = _message_text(msg)
                if re.search(r"portfolio|defi|rebalance|yield.*positions", msg_text, re.IGNORECASE):
                    return (
                        "I'll analyze your DeFi portfolio. The portfolio skill is active and I can scan "
                        "your wallet addresses across chains to discover positions, check yields, and "
                        "suggest rebalancing opportunities."
                    )
                if re.search(r"0x[a-fA-F0-9]{40}", msg_text, re.IGNORECASE):
                    return (
                        "I found your wallet address. Let me scan your portfolio across all supported "
                        "chains to discover DeFi positions and classify them against known protocols."
                    )

The regex _nudge_re is compiled inside the match_response function, which is called frequently during E2E tests. This is inefficient as it recompiles the regex on every call. Additionally, the loop iterates through messages from the beginning, whereas it's generally more efficient and robust to search for context starting from the most recent message. To improve performance, avoid redundant computations inside frequently called functions or loops. Consider using re.search with the string pattern directly (which Python caches internally) and iterating through reversed(messages).

    if re.search(r"You said you would perform an action|You expressed intent", content, re.IGNORECASE):
        for msg in reversed(messages):
            if msg.get("role") == "user":
                msg_text = _message_text(msg)
                if re.search(r"portfolio|defi|rebalance|yield.*positions", msg_text, re.IGNORECASE):
                    return (
                        "I'll analyze your DeFi portfolio. The portfolio skill is active and I can scan "
                        "your wallet addresses across chains to discover positions, check yields, and "
                        "suggest rebalancing opportunities."
                    )
                if re.search(r"0x[a-fA-F0-9]{40}", msg_text, re.IGNORECASE):
                    return (
                        "I found your wallet address. Let me scan your portfolio across all supported "
                        "chains to discover DeFi positions and classify them against known protocols."
                    )

References
- To improve performance, avoid redundant computations inside loops or frequently called functions. For example, pre-calculate values or rely on internal caching instead of repeated expensive operations.
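As a rough illustration of the review feedback, the sketch below hoists the patterns to module level and scans user messages from newest to oldest. It is not the merged implementation: `recover_nudge`, the `"portfolio"`/`"wallet"` return labels, and the `message_text` helper (a hypothetical stand-in for the project's `_message_text`) are all assumptions, and the message shape is assumed to be a list of dicts with `role` and `content` keys.

```python
import re

# Compiled once at import time instead of on every match_response() call.
NUDGE_RE = re.compile(
    r"You said you would perform an action|You expressed intent",
    re.IGNORECASE,
)
PORTFOLIO_RE = re.compile(r"portfolio|defi|rebalance|yield.*positions", re.IGNORECASE)
WALLET_RE = re.compile(r"0x[a-fA-F0-9]{40}")


def message_text(msg):
    """Hypothetical stand-in for _message_text: flatten content to a string."""
    content = msg.get("content", "")
    if isinstance(content, str):
        return content
    # Assumed alternative shape: a list of {"text": ...} parts.
    return " ".join(p.get("text", "") for p in content if isinstance(p, dict))


def recover_nudge(content, messages):
    """Return a label for the context to restore, or None if no recovery applies."""
    if not NUDGE_RE.search(content):
        return None
    # Newest user message first, so recent context takes priority.
    for msg in reversed(messages):
        if msg.get("role") != "user":
            continue
        text = message_text(msg)
        if PORTFOLIO_RE.search(text):
            return "portfolio"
        if WALLET_RE.search(text):
            return "wallet"
    return None
```

A caller could map the returned label to the canned portfolio or wallet reply and fall through to the generic nudge response when the result is `None`.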
Forward cargo-llvm-cov env vars in multi_tenant_gateway_server fixture so code coverage from the 3 rewired widget tests is captured in CI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
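The environment handling for such a fixture might look roughly like the sketch below. Only `AGENT_MULTI_TENANT=true` comes from this PR. The helper name `build_multi_tenant_env`, the `E2E_LIBSQL_PATH` variable, the choice to start from a near-empty environment, and the list of coverage variable prefixes are all hypothetical and would need to be checked against the real fixture.

```python
import os

# Assumption: coverage-related variables can be recognised by these prefixes.
COVERAGE_PREFIXES = ("CARGO_LLVM_COV", "LLVM_PROFILE_FILE")


def build_multi_tenant_env(base_env, db_path):
    """Build the environment for a multi-tenant gateway subprocess."""
    # Start from a minimal environment so the server does not inherit
    # unrelated settings from the test runner (an assumption of this sketch).
    env = {"PATH": base_env.get("PATH", "")}
    env["AGENT_MULTI_TENANT"] = "true"
    # Hypothetical variable name: points the server at its own libSQL file.
    env["E2E_LIBSQL_PATH"] = str(db_path)
    # Forward coverage variables so instrumented binaries keep writing profiles.
    for key, value in base_env.items():
        if key.startswith(COVERAGE_PREFIXES):
            env[key] = value
    return env
```

A session-scoped fixture could call `build_multi_tenant_env(os.environ, tmp_db_path)` and pass the result as the `env` argument when spawning the server process.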
Summary
- … `ironclaw_server`. Added a session-scoped `multi_tenant_gateway_server` fixture with `AGENT_MULTI_TENANT=true` and its own libSQL database, and rewired the three failing tests to use it.
- … `match_response()` returned the generic "I found the information you requested." instead of a portfolio-relevant reply. Added context-aware nudge recovery that checks prior user messages for portfolio/wallet keywords. Also fixed word boundaries on the `hello|hi|hey` canned pattern to prevent "hi" from matching inside "this".

Companion to #2744, which fixes the other 8 real E2E failures on staging.
Test plan
- `pytest scenarios/test_widget_customization.py scenarios/test_portfolio.py -v`: 9 passed, 5 skipped, 0 failed

🤖 Generated with Claude Code