Description
Summary
The workflow.run Activity (OpenTelemetry span) is created but never stopped/disposed in the .NET in-process workflow runner, so it is never exported to any telemetry backend. The Python implementation correctly emits this span.
Affected Files
- dotnet/src/Microsoft.Agents.AI.Workflows/Execution/LockstepRunEventStream.cs (line 53)
- dotnet/src/Microsoft.Agents.AI.Workflows/Execution/StreamingRunEventStream.cs (line 63): same pattern, likely affected
Root Cause
The using Activity? activity is declared inside an async IAsyncEnumerable<T> method (TakeEventStreamAsync). When the method exits via yield break (lines 90, 98, 115 in LockstepRunEventStream), the Activity's Dispose() is never invoked — the async iterator cleanup chain does not properly reach the using disposal.
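To help isolate the pattern outside the workflow runner, here is a minimal sketch of a `using Activity?` inside an async iterator. `Demo.Iterator` and `EventsAsync` are hypothetical stand-ins, not names from the repository. It contrasts two consumers: `await foreach`, which disposes the enumerator even on an early `break`, and a manual enumeration that is abandoned without `DisposeAsync()`. In C#, an iterator's `using`/`finally` blocks run only when iteration completes or the enumerator is disposed. Whether either path matches what happens in LockstepRunEventStream is an assumption to verify, not something this report establishes.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical source name, used only for this demo.
var source = new ActivitySource("Demo.Iterator");

using var listener = new ActivityListener
{
    ShouldListenTo = _ => true,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStarted = a => Console.WriteLine($"[START] {a.OperationName} SpanId={a.SpanId}"),
    ActivityStopped = a => Console.WriteLine($"[STOP]  {a.OperationName} SpanId={a.SpanId}"),
};
ActivitySource.AddActivityListener(listener);

// Hypothetical stand-in for TakeEventStreamAsync: the Activity lives in a using
// declaration, so it is stopped only when the iterator's cleanup code runs.
async IAsyncEnumerable<int> EventsAsync()
{
    using Activity? activity = source.StartActivity("workflow.run");
    for (int i = 0; i < 3; i++)
    {
        await Task.Yield();
        yield return i;
    }
}

// Consumer A: await foreach calls DisposeAsync() on the enumerator when the loop
// exits, including via break, which runs the iterator's pending using disposal.
Console.WriteLine("Consumer A: await foreach with early break");
await foreach (int e in EventsAsync())
{
    if (e == 0) break;
}

// Consumer B: manual enumeration that is abandoned. DisposeAsync() is deliberately
// not called, so the iterator stays suspended at its yield return and the using
// declaration is never reached.
Console.WriteLine("Consumer B: manual MoveNextAsync, enumerator never disposed");
IAsyncEnumerator<int> enumerator = EventsAsync().GetAsyncEnumerator();
await enumerator.MoveNextAsync();
```

If the runner's event stream is consumed along the lines of Consumer B, or the enumerator is otherwise left suspended, that alone would be enough to keep the Activity from stopping.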
Evidence
1. Raw ActivityListener confirms the Activity starts but never stops
A raw ActivityListener attached to the process shows the workflow.run Activity is created and becomes the parent of child spans, but its ActivityStopped callback never fires:
[RAW START] workflow.run SpanId=b8412143690e1d97
[RAW START] edge_group.process SpanId=d84f67eb9bf2d64d
[RAW STOP] edge_group.process SpanId=d84f67eb9bf2d64d
[RAW START] executor.process SpanId=5f5abcca33196fa5
[RAW STOP] executor.process SpanId=5f5abcca33196fa5
[RAW START] executor.process SpanId=51dfb0444c7de5be
[RAW STOP] executor.process SpanId=51dfb0444c7de5be
Result: !DLROW ,OLLEH
← [RAW STOP] workflow.run NEVER fires
Child spans reference workflow.run as their parent via ParentSpanId, confirming the Activity was created and set as Activity.Current. It is simply never stopped.
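For reference, a raw listener of the kind used for this check can be sketched as follows. It uses the standard `System.Diagnostics.ActivityListener` API. The `Demo.Workflows` source and the two spans started at the end are hypothetical and exist only to make the snippet self-contained; in a real check the listener is registered before the workflow is run, and the log format is only meant to resemble the output above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical source for the demo; the workflow runner uses its own ActivitySource.
var source = new ActivitySource("Demo.Workflows");

using var listener = new ActivityListener
{
    // Listen to every ActivitySource in the process so nothing is filtered out.
    ShouldListenTo = _ => true,
    // Sample everything; without a positive sampling result StartActivity returns null.
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStarted = a =>
        Console.WriteLine($"[RAW START] {a.OperationName} SpanId={a.SpanId}"),
    ActivityStopped = a =>
        Console.WriteLine($"[RAW STOP] {a.OperationName} SpanId={a.SpanId}"),
};
ActivitySource.AddActivityListener(listener);

// Demo spans: both are disposed here, so both START and STOP callbacks fire.
using (Activity? run = source.StartActivity("workflow.run"))
{
    using Activity? child = source.StartActivity("executor.process");
}
```

Exporters only see an Activity once it has stopped, so a span with a START line but no matching STOP line never reaches the telemetry backend.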
2. Cross-implementation telemetry comparison (Application Insights data)
The same sequential workflow (UppercaseExecutor → ReverseTextExecutor, input: "Hello, World!") was run in both .NET and Python. Telemetry was exported to Azure Application Insights and queried via:
union dependencies, requests
| where operation_Id == "<trace_id>"
| project timestamp, name, duration, itemType
| order by timestamp asc
.NET In-Process (8 spans):
| Span | Count | Duration |
|---|---|---|
| main | 1 | 140.93ms |
| workflow.build | 1 | 43.92ms |
| edge_group.process | 2 | 6.04ms, 0.2ms |
| executor.process | 2 | 19.57ms, 4.39ms |
| message.send | 2 | 4.42ms, 0.07ms |
| workflow.run | 0 | ❌ Missing |
Python In-Process (7 spans):
| Span | Count | Duration |
|---|---|---|
| Sequential Workflow Scenario (root) | 1 | 68ms |
| workflow.build | 1 | <1ms |
| workflow.run | 1 | 64ms ✅ |
| edge_group.process | 1 | <1ms |
| executor.process | 2 | <1ms each |
| message.send | 1 | <1ms |
Impact
- The workflow.run span is the primary span representing end-to-end workflow execution duration. Without it, users cannot measure total workflow execution time in their telemetry.
- All other spans (workflow.build, executor.process, edge_group.process, message.send) are properly exported.
- The ApplicationInsights observability sample (dotnet/samples/GettingStarted/Workflows/Observability/ApplicationInsights/) is affected.
Steps to Reproduce
- Run any workflow using InProcessExecution.RunAsync(). Run an equivalent Python workflow as well so the two can be compared.
- Query the telemetry backend and compare the Python app's spans with the .NET app's spans.
Package Versions
1.0.0-rc1
.NET Version
.NET 10