Skip to content

Chat: expand mixed-domain ambiguity strict coverage#796

Merged
PrzemyslawKlys merged 3 commits intomasterfrom
feat/chat-mixed-domain-ambiguity-scenario-expansion-20260225
Feb 25, 2026
Merged

Chat: expand mixed-domain ambiguity strict coverage#796
PrzemyslawKlys merged 3 commits intomasterfrom
feat/chat-mixed-domain-ambiguity-scenario-expansion-20260225

Conversation

@PrzemyslawKlys
Copy link
Member

Summary

  • add a second strict/live mixed-domain ambiguity scenario (dns-first-switchback) to cover DNS-first routing and AD switch-back behavior
  • tighten catalog strictness tests to validate all mixed-domain-ambiguity-*-10-turn.json scenarios, not just a single file
  • extend the scenario catalog quality gate script with mixed-domain route-split checks and a minimum scenario count requirement

Validation

  • pwsh .github/scripts/test-chat-scenario-catalog-quality.ps1
  • dotnet test IntelligenceX.Chat/IntelligenceX.Chat.Tests/IntelligenceX.Chat.Tests.csproj -c Release --filter "FullyQualifiedName~HostScenarioCatalogStrictnessTests"
  • dotnet build IntelligenceX.sln -c Release
  • dotnet test IntelligenceX.sln -c Release
  • dotnet ./IntelligenceX.Tests/bin/Release/net8.0/IntelligenceX.Tests.dll
  • dotnet ./IntelligenceX.Tests/bin/Release/net10.0/IntelligenceX.Tests.dll

@intelligencex-review
Copy link

intelligencex-review bot commented Feb 25, 2026

IntelligenceX Review

Reviewing PR #796: Chat: expand mixed-domain ambiguity strict coverage
Reviewed commit: 16a05ee

Merge blockers: items in Todo List ✅ and Critical Issues ⚠️ sections (if present). Other Issues 🧯 are suggestions.

Auto-resolve (AI): 0 resolved, 3 kept (candidates 3, assessments 3, attempts 0, evidence_rejected 0, resolve_failed 0, sweep_resolved 0, missing_ids 0, duplicate_ids 0).

Review context truncated: 2 patches trimmed to 4000 chars. Impact: review covers only included diff context; issues outside it may be missed. Increase review.maxFiles/review.maxPatchChars for wider coverage.

Summary 📝

PR direction is solid (adds meaningful strict coverage and a new mixed-domain scenario), but there is one merge-blocking correctness issue in the PowerShell quality gate: the mixed-domain minimum-count rule appears to be applied globally and can break filtered runs.

Todo List ✅

  • Scope the mixed-domain minimum scenario count enforcement to default/full catalog runs (or to filters that explicitly include mixed-domain-ambiguity-*) instead of all filtered executions.
    Rationale: unconditional enforcement makes -Filter unusable for targeted validations and causes false failures.

Critical Issues ⚠️

  • .github/scripts/test-chat-scenario-catalog-quality.ps1: The new count gate for mixed-domain-ambiguity-* scenarios is computed even when a custom -Filter is supplied, which can fail valid subset runs (e.g., -Filter "ad-*-10-turn.json").
    Rationale: this is a correctness regression in script behavior and breaks existing focused workflows.

Other Issues 🧯

  • mixed-domain-ambiguity-dns-first-switchback-10-turn.json: scenario constraints look coherent and maintainable; no correctness concerns from the shown diff.
  • ad-reboot-local-10-turn.json: adding min_distinct_tool_input_values.machine_name: 2 improves determinism and reduces false-positive passes; good hardening change.
  • HostScenarioCatalogStrictnessTests.cs: diff not shown here, so no additional findings can be validated from provided evidence.

Other Reviews 🧩

  • actionable: Prior stale thread about scoping mixed-scenario count to selected files is valid and matches the blocking issue above.

Tests / Coverage 🧪

  • Validation list is strong and includes both targeted and full-suite paths.
  • Please add/confirm a script test case covering filtered execution behavior (-Filter "ad-*-10-turn.json") to prevent regression of -Filter semantics.

Next Steps 🚀

  1. Gate the mixed-domain minimum-count assertion behind default/full-run conditions.
  2. Add/adjust a test for filtered script runs to lock expected behavior.
  3. Re-run the listed validation commands and merge once green.

Static Analysis Policy 🧭

  • Config mode: respect
  • Packs: All Essentials (50)
  • Rules: 108 enabled
  • Rule list display: up to 10 items per section
  • Enabled rules preview: CA2000 (Dispose objects before losing scope), CA1062 (Validate arguments of public methods), SA1600 (Elements should be documented), CA1016 (Mark assemblies with assembly version), CA1018 (Mark attributes with AttributeUsageAttribute), CA1041 (Provide ObsoleteAttribute message), CA1047 (Do not declare protected member in sealed type), CA1050 (Declare types in namespaces), CA1061 (Do not hide base class methods), CA1067 (Override Object.Equals(object) when implementing IEquatable<T>) (truncated)
  • Gate rule IDs: IXTOOL001 (Write-capable tools should use canonical schema helpers)
  • Result files: 2 input patterns, 0 matched, 0 parsed, 0 failed
  • Status: unavailable
  • Rule outcomes: unavailable (no analysis result files matched configured inputs)

Static Analysis 🔎

  • Findings: unavailable (no analysis result files matched configured inputs)

IntelligenceX thread triage

Assessed commit: 16a05ee

Diff range: PR base → head (7d22282..16a05ee)

Needs attention:

  • PRRT_kwDORAe5k85wjleD: The provided current diff context for this thread still only shows introduction of the counter variable, without a visible minimum-threshold failure check tied to $failures.
  • PRRT_kwDORAe5k85wjmkL: Although new variables indicate intent to scope behavior to selected files, the visible patch is truncated and does not clearly show the mixed-domain minimum check being conditionally gated by filter/default-selection logic.
  • PRRT_kwDORAe5k85wjnCE: The shown line-level context is inside per-turn token extraction and does not include the minimum-count enforcement logic, so there is no clear evidence that unconditional enforcement was fixed.

Model & Usage 🤖

  • Model: gpt-5.3-codex
  • Length: medium
  • Mode: hybrid
  • Reasoning: not configured
  • Usage: 5h limit: 100% remaining | weekly limit: 65% remaining | code review weekly limit: 100% remaining | credits: 0

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12b35b37ac

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@PrzemyslawKlys
Copy link
Member Author

Addressed the latest IntelligenceX/Codex blocker items in this PR:

  • scoped mixed-domain minimum-count gate to selected/default scenario set in .github/scripts/test-chat-scenario-catalog-quality.ps1
  • validated both full and filtered invocations locally
  • resolved the related review threads

Current merge blocker is now infra-only:

Once the external dependency/workflow blocker is cleared, this PR should be ready to merge.

@PrzemyslawKlys PrzemyslawKlys merged commit 1f01fe3 into master Feb 25, 2026
7 of 8 checks passed
@PrzemyslawKlys PrzemyslawKlys deleted the feat/chat-mixed-domain-ambiguity-scenario-expansion-20260225 branch February 25, 2026 07:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant