.NET: [Bug]: workflow.run OpenTelemetry span is never exported in .NET in-process workflow execution #4155

@kshyju

Description

Summary

The workflow.run Activity (OpenTelemetry span) is created but never stopped/disposed in the .NET in-process workflow runner, so it is never exported to any telemetry backend. The Python implementation correctly emits this span.

Affected Files

  • dotnet/src/Microsoft.Agents.AI.Workflows/Execution/LockstepRunEventStream.cs (line 53)
  • dotnet/src/Microsoft.Agents.AI.Workflows/Execution/StreamingRunEventStream.cs (line 63) — same pattern, likely affected

Root Cause

The using Activity? activity is declared inside an async IAsyncEnumerable<T> method (TakeEventStreamAsync). When the method exits via yield break (lines 90, 98, 115 in LockstepRunEventStream), the Activity's Dispose() is never invoked — the async iterator cleanup chain does not properly reach the using disposal.
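As a reference point for verifying this hypothesis, the following standalone sketch shows when a `using` declared inside an async iterator is disposed. It is not the runner's code: `Probe` and `Stream` are hypothetical names introduced only for illustration. In this sketch, an iterator drained by `await foreach` disposes the resource when it reaches `yield break`, while an enumerator left suspended at a `yield return` does not dispose it until `DisposeAsync()` is called. Which of these paths the workflow runner's consumer actually takes is not established here.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Case 1: await foreach drives the iterator all the way to its yield break.
var drained = new Probe();
await foreach (int item in Stream(drained)) { }
Console.WriteLine($"drained: disposed={drained.Disposed}");                 // True

// Case 2: the consumer advances once and leaves the iterator suspended.
var abandoned = new Probe();
IAsyncEnumerator<int> enumerator = Stream(abandoned).GetAsyncEnumerator();
await enumerator.MoveNextAsync();   // iterator is now paused at "yield return"
Console.WriteLine($"suspended: disposed={abandoned.Disposed}");             // False

// Disposing the enumerator runs the pending using cleanup.
await enumerator.DisposeAsync();
Console.WriteLine($"after DisposeAsync: disposed={abandoned.Disposed}");    // True

// Hypothetical iterator mirroring the "using inside an async iterator" shape.
static async IAsyncEnumerable<int> Stream(Probe probe)
{
    using Probe scope = probe;      // stands in for "using Activity? activity"
    yield return 1;
    await Task.Yield();
    yield break;                    // leaving the method disposes "scope"
}

// Hypothetical disposable that records whether Dispose() was called.
sealed class Probe : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}
```

If the runner's stream is ever left in the suspended state of Case 2, the Activity would stay open in the same way, so checking how the enumerator is consumed and disposed may help confirm or rule out the explanation above.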

Evidence

1. Raw ActivityListener confirms the Activity starts but never stops

A raw ActivityListener attached to the process shows the workflow.run Activity is created and becomes the parent of child spans, but its ActivityStopped callback never fires:

[RAW START] workflow.run SpanId=b8412143690e1d97
[RAW START] edge_group.process SpanId=d84f67eb9bf2d64d
[RAW STOP]  edge_group.process SpanId=d84f67eb9bf2d64d
[RAW START] executor.process SpanId=5f5abcca33196fa5
[RAW STOP]  executor.process SpanId=5f5abcca33196fa5
[RAW START] executor.process SpanId=51dfb0444c7de5be
[RAW STOP]  executor.process SpanId=51dfb0444c7de5be
Result: !DLROW ,OLLEH
                         ← [RAW STOP] workflow.run NEVER fires

Child spans reference workflow.run as their parent via ParentSpanId, confirming the Activity was created and set as Activity.Current. It is simply never stopped.
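A listener of the kind used for the log above can be sketched as follows. This is a minimal illustration, not the exact code used to produce the log: `Demo.Source`, `demo.parent`, and `demo.child` are hypothetical names that stand in for the workflow's own ActivitySource and spans so the sketch runs on its own.

```csharp
using System;
using System.Diagnostics;

// Listen to every ActivitySource in the process and log each start and stop.
using var listener = new ActivityListener
{
    ShouldListenTo = _ => true,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStarted = a =>
        Console.WriteLine($"[RAW START] {a.OperationName} SpanId={a.SpanId}"),
    ActivityStopped = a =>
        Console.WriteLine($"[RAW STOP]  {a.OperationName} SpanId={a.SpanId}"),
};
ActivitySource.AddActivityListener(listener);

// Hypothetical source and spans; in the real repro the workflow creates these.
using var demoSource = new ActivitySource("Demo.Source");
using (var parent = demoSource.StartActivity("demo.parent"))
using (var child = demoSource.StartActivity("demo.child"))
{
    // The child is disposed first, then the parent, so the stops are logged
    // in the order demo.child, demo.parent.
}
```

For a span that is disposed normally, each `[RAW START]` line is eventually matched by a `[RAW STOP]` line with the same SpanId. The log above has no such matching line for workflow.run.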

2. Cross-implementation telemetry comparison (Application Insights data)

The same sequential workflow (UppercaseExecutor → ReverseTextExecutor, input: "Hello, World!") was run in both .NET and Python. Telemetry was exported to Azure Application Insights and queried via:

union dependencies, requests
| where operation_Id == "<trace_id>"
| project timestamp, name, duration, itemType
| order by timestamp asc

.NET In-Process (8 spans):

| Span | Count | Duration |
|---|---|---|
| main | 1 | 140.93ms |
| workflow.build | 1 | 43.92ms |
| edge_group.process | 2 | 6.04ms, 0.2ms |
| executor.process | 2 | 19.57ms, 4.39ms |
| message.send | 2 | 4.42ms, 0.07ms |
| workflow.run | 0 | ❌ Missing |

Python In-Process (7 spans):

| Span | Count | Duration |
|---|---|---|
| Sequential Workflow Scenario (root) | 1 | 68ms |
| workflow.build | 1 | <1ms |
| workflow.run | 1 | 64ms ✅ |
| edge_group.process | 1 | <1ms |
| executor.process | 2 | <1ms each |
| message.send | 1 | <1ms |

Impact

  • The workflow.run span is the primary span representing end-to-end workflow execution duration. Without it, users cannot measure total workflow execution time in their telemetry.
  • All other spans (workflow.build, executor.process, edge_group.process, message.send) are properly exported.
  • The ApplicationInsights observability sample (dotnet/samples/GettingStarted/Workflows/Observability/ApplicationInsights/) is affected.

Steps to Reproduce

  1. Run any workflow using InProcessExecution.RunAsync(). Run an equivalent Python workflow as well for comparison.
  2. Query the telemetry backend and compare the Python app's spans with the .NET app's spans.

Package Versions

1.0.0-rc1

.NET Version

.NET 10

Metadata

Labels

.NET, bug (Something isn't working), workflows (Related to Workflows in agent-framework)
