fix(e2e): multi-tenant widget isolation + portfolio nudge recovery by serrrfirat · Pull Request #2790 · nearai/ironclaw

serrrfirat · 2026-04-21T15:18:18Z

Summary

Widget customization (3 tests): Tests expected multi-tenant behavior (CSS/widget/CSP isolation) but ran against the single-tenant default ironclaw_server. Added a session-scoped multi_tenant_gateway_server fixture with AGENT_MULTI_TENANT=true and its own libSQL database, and rewired the three failing tests to use it.
Portfolio chat (2 tests): The mock LLM's nudge response swallowed portfolio context — when the engine sent a tool-intent nudge ("You said you would perform an action..."), match_response() returned the generic "I found the information you requested." instead of a portfolio-relevant reply. Added context-aware nudge recovery that checks prior user messages for portfolio/wallet keywords. Also fixed word boundaries on the hello|hi|hey canned pattern to prevent "hi" from matching inside "this".

Companion to #2744 which fixes the other 8 real E2E failures on staging.

Test plan

pytest scenarios/test_widget_customization.py scenarios/test_portfolio.py -v — 9 passed, 5 skipped, 0 failed
Full suite verified: these were the remaining 5 real failures after fix(tests): close staging test backlog — full suite green #2744's coverage (6 routine_event_batch failures were flaky and pass on re-run)

🤖 Generated with Claude Code

…olio nudge recovery Widget customization: three tests expected multi-tenant behavior (CSS/widget/CSP isolation) but ran against the single-tenant default server. Add a session-scoped `multi_tenant_gateway_server` fixture with AGENT_MULTI_TENANT=true and its own libSQL database, and rewire the three failing tests to use it. Portfolio: the mock LLM's nudge response ("I found the information you requested.") swallowed portfolio context when the engine sent a tool-intent nudge. Add context-aware nudge recovery in match_response() that checks prior user messages for portfolio/wallet keywords before falling through to the generic nudge pattern. Also add word boundaries to the hello|hi|hey canned pattern to prevent "hi" from matching inside "this". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request enhances the E2E testing infrastructure by refining the mock LLM's response matching and introducing a dedicated multi-tenant gateway server for widget customization tests. Specifically, it adds word boundaries to greeting patterns and implements a 'nudge recovery' mechanism to preserve portfolio context when the LLM fails to call a tool. The test suite is updated to use a specialized multi-tenant fixture, ensuring better isolation for isolation-specific test cases. Feedback focuses on optimizing the nudge recovery logic by avoiding redundant regex compilation and improving context lookup efficiency by iterating through messages in reverse.

gemini-code-assist · 2026-04-21T15:20:15Z

+    _nudge_re = re.compile(
+        r"You said you would perform an action|You expressed intent",
+        re.IGNORECASE,
+    )
+    if _nudge_re.search(content):
+        for msg in messages:
+            if msg.get("role") == "user":
+                msg_text = _message_text(msg)
+                if re.search(r"portfolio|defi|rebalance|yield.*positions", msg_text, re.IGNORECASE):
+                    return (
+                        "I'll analyze your DeFi portfolio. The portfolio skill is active and I can scan "
+                        "your wallet addresses across chains to discover positions, check yields, and "
+                        "suggest rebalancing opportunities."
+                    )
+                if re.search(r"0x[a-fA-F0-9]{40}", msg_text, re.IGNORECASE):
+                    return (
+                        "I found your wallet address. Let me scan your portfolio across all supported "
+                        "chains to discover DeFi positions and classify them against known protocols."
+                    )


The regex _nudge_re is compiled inside the match_response function, which is called frequently during E2E tests. This is inefficient as it recompiles the regex on every call. Additionally, the loop iterates through messages from the beginning, whereas it's generally more efficient and robust to search for context starting from the most recent message. To improve performance, avoid redundant computations inside frequently called functions or loops. Consider using re.search with the string pattern directly (which Python caches internally) and iterating through reversed(messages).

Suggested change

_nudge_re = re.compile(

r"You said you would perform an action|You expressed intent",

re.IGNORECASE,

)

if _nudge_re.search(content):

for msg in messages:

if msg.get("role") == "user":

msg_text = _message_text(msg)

if re.search(r"portfolio|defi|rebalance|yield.*positions", msg_text, re.IGNORECASE):

return (

"I'll analyze your DeFi portfolio. The portfolio skill is active and I can scan "

"your wallet addresses across chains to discover positions, check yields, and "

"suggest rebalancing opportunities."

)

if re.search(r"0x[a-fA-F0-9]{40}", msg_text, re.IGNORECASE):

return (

"I found your wallet address. Let me scan your portfolio across all supported "

"chains to discover DeFi positions and classify them against known protocols."

)

if re.search(r"You said you would perform an action|You expressed intent", content, re.IGNORECASE):

for msg in reversed(messages):

if msg.get("role") == "user":

msg_text = _message_text(msg)

if re.search(r"portfolio|defi|rebalance|yield.*positions", msg_text, re.IGNORECASE):

return (

"I'll analyze your DeFi portfolio. The portfolio skill is active and I can scan "

"your wallet addresses across chains to discover positions, check yields, and "

"suggest rebalancing opportunities."

)

if re.search(r"0x[a-fA-F0-9]{40}", msg_text, re.IGNORECASE):

return (

"I found your wallet address. Let me scan your portfolio across all supported "

"chains to discover DeFi positions and classify them against known protocols."

)

References

To improve performance, avoid redundant computations inside loops or frequently called functions. For example, pre-calculate values or rely on internal caching instead of repeated expensive operations.

Forward cargo-llvm-cov env vars in multi_tenant_gateway_server fixture so code coverage from the 3 rewired widget tests is captured in CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions Bot added size: XS < 10 changed lines (excluding docs) risk: low Changes to docs, tests, or low-risk modules contributor: core 20+ merged PRs labels Apr 21, 2026

gemini-code-assist Bot reviewed Apr 21, 2026

View reviewed changes

serrrfirat requested review from ilblackdragon April 21, 2026 15:32

fix: address review findings (iteration 1)

fd82d57

Forward cargo-llvm-cov env vars in multi_tenant_gateway_server fixture so code coverage from the 3 rewired widget tests is captured in CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

serrrfirat merged commit e29429d into staging Apr 21, 2026
17 checks passed

serrrfirat deleted the fix/e2e-widget-portfolio-tests branch April 21, 2026 19:30

github-actions Bot mentioned this pull request Apr 21, 2026

chore: promote staging to staging-promote/4dea5dd5-24736995931 (2026-04-21 19:40 UTC) #2811

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(e2e): multi-tenant widget isolation + portfolio nudge recovery#2790

fix(e2e): multi-tenant widget isolation + portfolio nudge recovery#2790
serrrfirat merged 2 commits intostagingfrom
fix/e2e-widget-portfolio-tests

serrrfirat commented Apr 21, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

serrrfirat commented Apr 21, 2026

Summary

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant