Fix complex tool result mapping when using Microsoft.Extensions.AI#177
Fix complex tool result mapping when using Microsoft.Extensions.AI#177PederHP wants to merge 1 commit intotghamm:mainfrom
Conversation
|
@stephentoub Could you give this a quick sanity check? (I don't have permissions to assign/request reviewers) |
| else if (frc.Result is JsonElement jsonElement && jsonElement.ValueKind == JsonValueKind.Object) | ||
| { | ||
| // Check if it's a content array structure | ||
| if (jsonElement.TryGetProperty("content", out JsonElement contentArray) && | ||
| contentArray.ValueKind == JsonValueKind.Array) | ||
| { | ||
| foreach (JsonElement item in contentArray.EnumerateArray()) | ||
| { | ||
| if (item.TryGetProperty("type", out JsonElement typeElement)) | ||
| { | ||
| string itemType = typeElement.GetString(); | ||
|
|
||
| if (itemType == "text" && item.TryGetProperty("text", out JsonElement textElement)) | ||
| { | ||
| toolResultContent.Add(new TextContent() { Text = textElement.GetString() ?? string.Empty }); | ||
| } | ||
| else if (itemType == "image" && | ||
| item.TryGetProperty("data", out JsonElement dataElement) && | ||
| item.TryGetProperty("mimeType", out JsonElement mimeTypeElement)) | ||
| { | ||
| toolResultContent.Add(new ImageContent() | ||
| { | ||
| Source = new() | ||
| { | ||
| Data = dataElement.GetString() ?? string.Empty, | ||
| MediaType = mimeTypeElement.GetString() ?? "image/png" | ||
| } | ||
| }); | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
I'm not understanding this. What schema are you parsing here? If this is MCP, after modelcontextprotocol/csharp-sdk#941 shouldn't this be looking for the various AIContent types instead of parsing JSON?
There was a problem hiding this comment.
Yeah, that's a good point. I was basing this on the JSON I found in the untyped object? Result in FunctionResultContent when testing. But you're right it should look for AIContent types, as this library shouldn't care about where the structured tool result content originated.
I think it's somewhat messy that a Result is untyped, as for multi-modal tool results passing the JSON string and the actual API types are not the same (vision doesn't work with the former). So this means that consumers of MEAI function results need to manually handle this mapping.
I suppose as a cleaner way would be to have a way to always get IList<AIContent> ResultContent or similar from FunctionResultContent with MEAI handling the mapping. I'll change the translation here to use proper types instead of this manual JSON parsing, but I think this would be a nice improvement in MEAI to not have to do.
There was a problem hiding this comment.
But wait... "type" and "image" indicate this is not AIContent. I'll try to trace through this and see how the conversions happen, because this does look like the raw MCP tool result content types at second glance.
There was a problem hiding this comment.
Oh... it's because the change you linked to isn't in MCP 0.4.0-preview3, which is what I was testing this with. I'll try with main, then it'll probably be AIContent.
Not sure if this should be backwards compatible with old MCP SDKs. Makes more sense to use AIContent types than this somewhat hacky manual parsing.
There was a problem hiding this comment.
@tghamm Do you prefer this to be compatible with 0.4.0-preview3 and earlier MCP C# SDKs or only later ones (when it will be AIContent). If the former I'll keep the handling of the native MCP format when I add the AIContent support to this PR.
| toolResultContent.Add(new TextContent() { Text = textElement.GetString() ?? string.Empty }); | ||
| } | ||
| else if (itemType == "image" && | ||
| item.TryGetProperty("data", out JsonElement dataElement) && | ||
| item.TryGetProperty("mimeType", out JsonElement mimeTypeElement)) | ||
| { | ||
| toolResultContent.Add(new ImageContent() |
There was a problem hiding this comment.
Do all models support multi-modal tool results?
There was a problem hiding this comment.
Sadly, no, but all Anthropic models do. Last I checked neither OpenAI nor Gemini models had support for image tool results.
When tools return multiple content parts they end up in the
object? Resultproperty of the MEAI tool result content. This wasn't properly mapped back to Anthropic SDK types, causing image tool results to turn into base64 data - which means they're no longer treated as images/vision content by the model and API. It also makes a difference in terms of text content whether it's correctly mapped to API types (additional tokens and the structured format can sometimes confuse the model).I tested this with the
getTinyImageToolfrom theserver-everythingMCP server. It returns [text,image,text] content parts, so it's a good test case to see if the payload is preserved across mapping between MEAI and Anthropic.SDK types.I noticed that IsError is always lost. I will try to fix that in a subsequent PR.