Skip to content

Eliminate double LLM call for non-tool messages (#672)#727

Merged
Chris0Jeky merged 5 commits intomainfrom
fix/672-eliminate-double-llm-call
Apr 3, 2026
Merged

Eliminate double LLM call for non-tool messages (#672)#727
Chris0Jeky merged 5 commits intomainfrom
fix/672-eliminate-double-llm-call

Conversation

@Chris0Jeky
Copy link
Copy Markdown
Owner

Summary

  • When the tool-calling orchestrator returns a text response without invoking any tools, reuse that content directly instead of discarding it and making a redundant CompleteAsync call
  • Run the local LlmIntentClassifier on the user message to preserve actionability detection and proposal creation for messages like "create card X"
  • Eliminates redundant second LLM call for conversational (non-tool) messages on live providers (OpenAI/Gemini), halving latency and cost

Closes #672

Test plan

  • Non-tool chat messages only trigger one LLM provider call (verified via Moq Times.Never on CompleteAsync)
  • Tool-calling messages still work correctly with multi-turn loop
  • Degraded orchestrator with null content correctly falls back to CompleteAsync
  • Non-board-scoped sessions still use single-turn CompleteAsync (orchestrator not invoked)
  • Reused orchestrator response still triggers proposal creation when RequestProposal is set
  • Response content quality is unchanged (intent classifier + instruction extractor run locally)
  • Mock provider behavior unchanged
  • All existing chat, orchestrator, and integration tests pass (1765 total, 0 failures)

When the tool-calling orchestrator returns a text response without
invoking any tools, reuse that content directly instead of discarding
it and making a redundant CompleteAsync call. This eliminates the
double LLM call for non-tool messages on live providers, halving
latency and cost for conversational replies in board-scoped sessions.

The proposal creation path is preserved: the reused response flows
through the same shouldAttemptProposal logic as before.
When the orchestrator's no-tool text response is reused, run the local
LlmIntentClassifier on the user message to determine actionability and
extract instructions. This preserves the proposal creation path for
messages like "create card X" without making a second LLM call.
Five new tests verify:
- Non-tool responses only make one LLM call (CompleteAsync never called)
- Tool-calling responses still work correctly
- Degraded orchestrator with null content falls back to CompleteAsync
- Non-board-scoped sessions still use single-turn CompleteAsync
- Reused response still triggers proposal creation when RequestProposal set
Copilot AI review requested due to automatic review settings April 3, 2026 00:00
@Chris0Jeky
Copy link
Copy Markdown
Owner Author

Adversarial Self-Review

Response shape coverage

All orchestrator response shapes are handled:

  • Text only (no tools): Reused via reusableNoToolResponse with local intent classification
  • Tools invoked then text: Unchanged path (usedToolCalling = true)
  • Degraded with content, no tools: Falls through to single-turn CompleteAsync (correct -- orchestrator failure should use fresh single-turn call)
  • Degraded with null content: Falls through to single-turn CompleteAsync (correct)
  • Non-board-scoped session: Orchestrator never invoked; single-turn path used directly

Proposal creation path

Preserved. The local LlmIntentClassifier.Classify runs on the user message to determine IsActionable, and NaturalLanguageInstructionExtractor.Extract provides structured instructions. This is the same fallback all providers (Mock, OpenAI, Gemini) use. Integration test CreateSession_And_SendActionableMessage_ShouldReturnProposalReference passes, confirming end-to-end proposal creation still works.

Known acceptable difference

The reused text content is shaped by the tool-calling system prompt (which describes available tools), while the previously-used CompleteAsync used the board context system prompt. On live providers this means slightly different phrasing for conversational replies. This is acceptable -- the content quality is comparable, and the LLM's answer to "Hello" is reasonable regardless of which system prompt shaped it.

SignalR tool status events

Unaffected. ToolStatusEvent is only emitted inside the orchestrator's tool execution loop, which only runs when tools are actually invoked. The no-tool reuse path correctly skips SignalR events.

Quota tracking

Usage is recorded once per orchestrator call, in the correct branch (either the tool-calling branch or the reuse branch). No double-counting.

No actionable findings

All edge cases are covered. No further changes needed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates ChatService to avoid an unnecessary second LLM provider call when the tool-calling orchestrator returns a plain-text response without invoking any tools, while preserving proposal/actionability detection by running the local intent classifier on the user message.

Changes:

  • Reuse orchestrator text responses (no-tool-call) instead of discarding them and calling CompleteAsync again.
  • Run LlmIntentClassifier + NaturalLanguageInstructionExtractor locally for reused orchestrator responses to keep proposal creation behavior.
  • Add/extend unit tests to ensure CompleteAsync is not called in the reused-response path and that fallback behavior remains correct.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
backend/src/Taskdeck.Application/Services/ChatService.cs Reuses orchestrator content for no-tool responses and runs local intent classification to preserve proposal creation behavior.
backend/tests/Taskdeck.Application.Tests/Services/ChatServiceTests.cs Adds coverage for single-call behavior (no-tool, tool-calls, degraded fallback, non-board sessions, request-proposal).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +239 to +243
else if (!toolResult.IsDegraded && toolResult.Content != null)
{
// The LLM responded with plain text (no tool calls). Reuse
// this content instead of making a redundant CompleteAsync
// call (#672). The response still flows through proposal
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reuse branch treats any non-null orchestrator Content as reusable. ToolCallingChatOrchestrator can emit an empty string when the provider returns IsComplete=true but Content=null, which would now be persisted as a blank assistant message instead of falling back to CompleteAsync. Consider requiring !string.IsNullOrWhiteSpace(toolResult.Content) before reusing; otherwise fall back to single-turn (and/or treat whitespace as degraded).

Copilot uses AI. Check for mistakes.
@@ -1,8 +1,10 @@
using FluentAssertions;
using Microsoft.Extensions.Logging;
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using Microsoft.Extensions.Logging; appears unused in this test file (the logger type is referenced via fully-qualified name). If the solution treats warnings as errors or runs analyzers, this can break builds; please remove the unused using or use the imported namespace consistently.

Suggested change
using Microsoft.Extensions.Logging;

Copilot uses AI. Check for mistakes.
Comment on lines +1125 to +1126
/// Helper to build a ChatService with a tool-calling orchestrator injected.
/// Uses a mock orchestrator that returns the provided ToolCallingResult.
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The XML doc on BuildServiceWithOrchestrator says it "uses a mock orchestrator that returns the provided ToolCallingResult", but the helper actually accepts a real ToolCallingChatOrchestrator instance and just injects it into ChatService. Please update/remove this doc to avoid misleading future test edits.

Suggested change
/// Helper to build a ChatService with a tool-calling orchestrator injected.
/// Uses a mock orchestrator that returns the provided ToolCallingResult.
/// Helper to build a ChatService with the provided tool-calling orchestrator injected.

Copilot uses AI. Check for mistakes.
callSequence++;
if (callSequence == 1)
{
var args = System.Text.Json.JsonDocument.Parse("{}").RootElement;
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JsonDocument.Parse("{}").RootElement creates a JsonDocument that should be disposed. Even in tests, this can leak unmanaged memory and may trigger analyzers. Prefer using var doc = JsonDocument.Parse("{}"); var args = doc.RootElement; (or use JsonSerializer.Deserialize<JsonElement>("{}")).

Suggested change
var args = System.Text.Json.JsonDocument.Parse("{}").RootElement;
var args = System.Text.Json.JsonSerializer.Deserialize<System.Text.Json.JsonElement>("{}");

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a36cfabd80

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +250 to +254
var (classifiedActionable, classifiedIntent) =
LlmIntentClassifier.Classify(dto.Content);
List<string>? classifiedInstructions = null;
if (classifiedActionable)
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep LLM instruction extraction for reused no-tool replies

This branch now synthesizes LlmCompletionResult from LlmIntentClassifier + NaturalLanguageInstructionExtractor instead of calling CompleteAsync, which removes the structured instruction-extraction path used by live providers. For board-scoped messages where tool-calling returns plain text (no tool calls), intents like column.reorder are marked actionable but have no extractor support, so Instructions is empty and chat falls back to parsing the raw user sentence with ParseInstructionAsync (strict syntax), causing proposal creation to fail for natural-language requests that previously depended on provider-extracted instructions.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request eliminates redundant LLM calls in ChatService by reusing the plain-text response from the tool-calling orchestrator when no tools are invoked. It introduces logic to perform local intent classification and instruction extraction on these reused responses to maintain proposal creation functionality. Comprehensive unit tests have been added to verify the elimination of double calls across various scenarios, including board-scoped and non-board-scoped sessions, as well as fallback behavior for degraded orchestrator results. I have no feedback to provide as no review comments were submitted.

@Chris0Jeky
Copy link
Copy Markdown
Owner Author

Adversarial Review (Round 2)

Severity: Medium -- Empty-string content reuse creates blank assistant messages

Location: ChatService.cs line 239

else if (!toolResult.IsDegraded && toolResult.Content != null)

The orchestrator coalesces null content to "" on line 125 of ToolCallingChatOrchestrator.cs:

Content: llmResult.Content ?? "",

When the LLM provider returns IsComplete: true with Content: null, the orchestrator produces Content: "". An empty string passes the != null check, so ChatService reuses it and persists a blank assistant message instead of falling back to CompleteAsync.

Fix: Change the guard to !string.IsNullOrWhiteSpace(toolResult.Content).


Severity: Low -- Indentation inconsistency

Lines 288-407 inside the if (!usedToolCalling) block are under-indented by 4 spaces compared to the enclosing scope. The code compiles and runs correctly but violates the project's 4-space indent conventions and makes the control flow harder to follow.

Fix: Re-indent the block contents to the correct nesting level.


Severity: Low -- Redundant fully-qualified name in tests

Lines 1147 and 1250 of ChatServiceTests.cs use Microsoft.Extensions.Logging.ILogger<...> despite having using Microsoft.Extensions.Logging; on line 2. Either drop the using or use the short form ILogger<...>.

Fix: Use the short form ILogger<ToolCallingChatOrchestrator> since the using is already imported.


Severity: Info -- Copilot P2 comment on LLM instruction extraction (not a regression)

Copilot flagged that the reuse path uses the static LlmIntentClassifier instead of the provider's LLM-powered structured instruction extraction. This is intentional and correct -- the whole point of #672 is to eliminate the second LLM call. The static classifier is the same fallback used by all providers (OpenAI, Gemini, Mock) when structured extraction fails. On live providers, this means a minority of natural-language requests that would have been parsed via the LLM's structured response will now go through the static extractor instead, but this trade-off is acceptable and was already acknowledged in the self-review.


Severity: Info -- JsonDocument disposal in test

Line 1224 creates a JsonDocument.Parse("{}") without disposing it. While this is a test and the leak is trivial, using JsonSerializer.Deserialize<JsonElement>("{}") is cleaner. Low priority.


Summary

One actionable bug (empty string reuse), two code quality fixes (indentation, redundant FQN). Addressing now.

The orchestrator coalesces null Content to "" when IsComplete is true.
The previous != null check would reuse this empty string as a blank
assistant message. Use IsNullOrWhiteSpace instead so empty/whitespace
content falls through to CompleteAsync.

Also fix indentation of the !usedToolCalling block to match the
project's 4-space indent conventions.
- Add test verifying empty orchestrator content falls back to
  CompleteAsync instead of persisting a blank assistant message.
- Fix misleading XML doc on BuildServiceWithOrchestrator helper.
- Replace fully-qualified ILogger<> with short form (using already
  imported).
- Replace undisposed JsonDocument.Parse with JsonSerializer.Deserialize.
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@Chris0Jeky Chris0Jeky merged commit 1fb979e into main Apr 3, 2026
23 checks passed
@github-project-automation github-project-automation Bot moved this from Pending to Done in Taskdeck Execution Apr 3, 2026
@Chris0Jeky Chris0Jeky deleted the fix/672-eliminate-double-llm-call branch April 3, 2026 04:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

LLM-06 follow-up: Eliminate double LLM call for non-tool messages

2 participants