feat: Add Bedrock Nova Sonic realtime provider implementing IRealtimeClient/IRealtimeClientSession#4373

Open
tarekgh wants to merge 10 commits into aws:development from tarekgh:bedrock-nova-realtime-provider

Conversation

@tarekgh tarekgh commented Mar 31, 2026

Summary

Adds a Bedrock Nova Sonic provider implementing the Microsoft.Extensions.AI Realtime abstractions (IRealtimeClient / IRealtimeClientSession), enabling real-time bidirectional audio conversations with AWS Bedrock Nova Sonic models through the standardized MEAI interface.

This PR updates the Microsoft.Extensions.AI.Abstractions dependency from 9.9.1 to 10.4.1 for the realtime types.

What's Included

New Files

  • BedrockNovaRealtimeClient.cs -- IRealtimeClient implementation that wraps an IAmazonBedrockRuntime and creates realtime sessions via the Nova Sonic bidirectional streaming API. Includes a convenience constructor accepting access key, secret key, region, and optional model ID.
  • BedrockNovaRealtimeSession.cs -- IRealtimeClientSession implementation that manages the bidirectional event stream, audio buffering, Nova Sonic protocol state machine, and function call orchestration (~1,540 lines).
  • BedrockRealtimeClientTests.cs -- Client-level unit tests (construction, disposal, metadata, service resolution).
  • BedrockRealtimeSessionTests.cs -- Session-level unit tests covering the full protocol surface area.

Modified Files

  • AmazonBedrockRuntimeExtensions.cs -- Added AsIRealtimeClient() extension method.
  • AWSSDK.Extensions.Bedrock.MEAI.NetStandard.csproj -- Updated Microsoft.Extensions.AI.Abstractions dependency from 9.9.1 to 10.4.1.

Features

  • Bidirectional audio streaming -- Full-duplex audio via Nova Sonic's InvokeModelWithBidirectionalStreamAsync with BodyPublisher pattern
  • Voice Activity Detection (VAD) -- Nova Sonic uses built-in VAD; audio content stays open for the session with trailing silence for end-of-speech detection
  • Text conversations -- Send text messages via CreateConversationItem with automatic content block management
  • Function calling -- Full tool invocation support with inline invocation, priority queue for tool results, and proper Nova Sonic tool configuration protocol
  • Input/output transcription -- Maps Nova Sonic textOutput events to InputAudioTranscriptionCompleted and OutputAudioTranscriptionDelta messages
  • Thread-safe sends -- SemaphoreSlim serializes all outbound writes; priority queue bypasses normal channel for time-sensitive tool results
  • Graceful disposal -- Race-safe dispose with proper protocol shutdown sequence (contentEnd -> promptEnd -> sessionEnd), semaphore-guarded cleanup
  • Convenience constructor -- BedrockNovaRealtimeClient(string accessKeyId, string secretAccessKey, string regionName, string? defaultModelId) for simple setup
  • Owned runtime disposal -- Convenience constructor tracks ownership and properly disposes its internally-created AmazonBedrockRuntimeClient
  • NetStandard 2.0 compatible -- Targets NetStandard 2.0 with polyfills for [Experimental] attribute
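
The "Thread-safe sends" item can be pictured with a minimal sketch like the one below. It assumes a single SemaphoreSlim guarding an outbound Channel<string>; the type SerializedEventWriter and its members are hypothetical names chosen for illustration and are not taken from the PR's source.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical helper (not the provider's actual type): serializes outbound
// writes so multi-event sequences from concurrent callers never interleave.
public sealed class SerializedEventWriter : IDisposable
{
    private readonly SemaphoreSlim _sendLock = new SemaphoreSlim(1, 1);
    private readonly Channel<string> _outbound = Channel.CreateUnbounded<string>();

    // The consumer side (for example, a publisher loop) reads from here.
    public ChannelReader<string> Reader => _outbound.Reader;

    // Writes a group of events as one unit with respect to other callers.
    public async Task WriteGroupAsync(IEnumerable<string> events, CancellationToken ct = default)
    {
        await _sendLock.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            foreach (string evt in events)
            {
                await _outbound.Writer.WriteAsync(evt, ct).ConfigureAwait(false);
            }
        }
        finally
        {
            // Always release, even if a write throws or is cancelled.
            _sendLock.Release();
        }
    }

    public void Dispose()
    {
        _outbound.Writer.TryComplete();
        _sendLock.Dispose();
    }
}
```

Holding the lock for the whole group, rather than per event, is what would keep a sequence such as contentStart, payload, contentEnd contiguous on the wire when two callers send at the same time.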

Usage Example

using Amazon.BedrockRuntime;
using Microsoft.Extensions.AI;

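// Placeholders supplied by the caller: accessKeyId, secretAccessKey, audioBytes,
// cancellationToken, and PlayAudio(...).
// Simple: create with credentials and region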
IRealtimeClient realtimeClient = new BedrockNovaRealtimeClient(
    accessKeyId, secretAccessKey, "us-east-1", "amazon.nova-2-sonic-v1:0");

// Or advanced: create from an existing IAmazonBedrockRuntime instance
// var runtime = new AmazonBedrockRuntimeClient(RegionEndpoint.USEast1);
// IRealtimeClient realtimeClient = new BedrockNovaRealtimeClient(runtime);

// Or use the extension method
// IRealtimeClient realtimeClient = runtime.AsIRealtimeClient();

// Define a tool for function calling
AIFunction getWeather = AIFunctionFactory.Create(
    (string location) =>
        location.ToLowerInvariant() switch
        {
            var l when l.Contains("seattle")       => $"The weather in {location} is rainy, 55F",
            var l when l.Contains("new york")      => $"The weather in {location} is cloudy, 70F",
            var l when l.Contains("san francisco") => $"The weather in {location} is foggy, 60F",
            _                                      => $"Sorry, I don't have weather data for {location}."
        },
    "GetWeather",
    "Gets the current weather for a given location");

// Configure session options
var sessionOptions = new RealtimeSessionOptions
{
    Instructions = "You are a helpful assistant.",
    Voice = "tiffany",
    Tools = [getWeather],
    InputAudioFormat = new RealtimeAudioFormat("audio/lpcm", 16000),
    OutputAudioFormat = new RealtimeAudioFormat("audio/lpcm", 24000),
};

// Create a session and start streaming
await using var session = await realtimeClient.CreateSessionAsync(sessionOptions);

// Start listening for server messages in the background
_ = Task.Run(async () =>
{
    await foreach (var message in session.GetStreamingResponseAsync(cancellationToken))
    {
        switch (message)
        {
            case OutputTextAudioRealtimeServerMessage audio
                when audio.Type == RealtimeServerMessageType.OutputAudioDelta:
                PlayAudio(audio.Audio);
                break;

            case OutputTextAudioRealtimeServerMessage text
                when text.Type == RealtimeServerMessageType.OutputTextDelta:
                Console.Write(text.Text);
                break;

            case InputAudioTranscriptionRealtimeServerMessage transcription:
                Console.WriteLine($"You said: {transcription.Transcription}");
                break;
        }
    }
});

// Send audio (e.g., from microphone)
var audioContent = new DataContent(audioBytes, "audio/lpcm");
await session.SendAsync(new InputAudioBufferAppendRealtimeClientMessage(audioContent));
await session.SendAsync(new InputAudioBufferCommitRealtimeClientMessage());

Key Design Decisions

  1. BodyPublisher while(true) loop -- Nova Sonic uses a BodyPublisher delegate that returns events one at a time. The provider uses a persistent loop that waits on both a normal Channel<T> for audio events and a ConcurrentQueue for priority tool results. This avoids closing the outbound stream prematurely while ensuring tool results bypass queued audio.

  2. Priority queue for tool results -- Tool results are sent via WritePriorityEvent which enqueues to a ConcurrentQueue and signals the BodyPublisher via TaskCompletionSource. This ensures tool results reach Nova Sonic before it commits to a speculative fallback response, which is critical for reliable function calling.

  3. Inline tool invocation -- When FunctionInvokingRealtimeSession middleware sends CreateConversationItem with tool results, the provider sends them inline via the priority queue with proper Nova Sonic protocol framing (contentStart -> toolResult -> contentEnd -> silence nudge).

  4. Audio content stays open -- Following the official AWS Nova Sonic samples, the audio content block is never explicitly closed mid-conversation. Instead, trailing silence is sent on InputAudioBufferCommit to help VAD detect end-of-speech. This matches how the Nova Sonic service expects continuous audio streams.

  5. Prompt lifecycle management -- The provider automatically manages promptStart/promptEnd lifecycle. After a tool result cycle, a new prompt is opened with a fresh promptName for the next turn, maintaining correct protocol state.

  6. BodyPublisher shutdown safety -- The while(true) loop checks for cancellation token and channel completion to avoid CPU-burning spin loops during disposal. All exit paths dispose the enumerator.

  7. Thread safety -- All outbound writes go through a SemaphoreSlim. Fields mutated under the semaphore (_promptName, _audioContentName) are captured to local variables when read outside the semaphore (e.g., in SendToolResultInline).
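
Design decisions 1, 2, and 6 can be sketched roughly as follows. This is an illustration under stated assumptions, not the provider's code: PriorityEventSource, WriteNormal, WritePriority, and NextAsync are hypothetical names, events are plain strings, and NextAsync stands in for whatever the BodyPublisher delegate calls to fetch the next outbound event.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical sketch; type and member names are illustrative only.
public sealed class PriorityEventSource
{
    private readonly Channel<string> _normal = Channel.CreateUnbounded<string>();
    private readonly ConcurrentQueue<string> _priority = new ConcurrentQueue<string>();
    private TaskCompletionSource<bool> _signal = NewSignal();

    private static TaskCompletionSource<bool> NewSignal() =>
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    // Normal path: audio and other ordinary events.
    public void WriteNormal(string evt) => _normal.Writer.TryWrite(evt);

    // Priority path: enqueue first, then wake the reader.
    public void WritePriority(string evt)
    {
        _priority.Enqueue(evt);
        Volatile.Read(ref _signal).TrySetResult(true);
    }

    public void Complete() => _normal.Writer.TryComplete();

    // Returns the next event, preferring priority items; returns null once
    // the normal channel is completed and both sources are drained.
    public async Task<string?> NextAsync(CancellationToken ct = default)
    {
        while (true)
        {
            ct.ThrowIfCancellationRequested();

            // Capture the signal BEFORE checking the queue to avoid a lost wake-up.
            TaskCompletionSource<bool> signal = Volatile.Read(ref _signal);

            if (_priority.TryDequeue(out string? urgent)) return urgent;
            if (_normal.Reader.TryRead(out string? normal)) return normal;

            using (var linked = CancellationTokenSource.CreateLinkedTokenSource(ct))
            {
                Task<bool> normalReady = _normal.Reader.WaitToReadAsync(linked.Token).AsTask();
                Task first = await Task.WhenAny(signal.Task, normalReady).ConfigureAwait(false);

                // Re-arm the signal once it has fired so later waits block again.
                if (signal.Task.IsCompleted)
                    Interlocked.CompareExchange(ref _signal, NewSignal(), signal);

                if (first == normalReady)
                {
                    // false means the channel completed; exit instead of spinning.
                    bool more = await normalReady.ConfigureAwait(false);
                    if (!more && _priority.IsEmpty) return null;
                }
                else
                {
                    // Abandon the pending channel wait; the loop starts a fresh one.
                    linked.Cancel();
                }
            }
        }
    }
}
```

In this sketch a tool result written with WritePriority is returned ahead of any audio already sitting in the channel, and a completed channel ends the loop rather than turning it into a busy wait.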

Test Coverage

54 unit tests covering:

  • Client lifecycle (construction, convenience constructor, disposal, owned runtime disposal)
  • Session lifecycle (connect, dispose, idempotent dispose, service resolution)
  • Protocol events (sessionStart, promptStart, contentStart, audioInput, contentEnd, promptEnd, sessionEnd)
  • Audio streaming (append, commit with trailing silence, CreateResponse silence nudge)
  • Text content (send text via CreateConversationItem)
  • Function calling (tool configuration in promptStart, tool result protocol, inline tool result events)
  • Error handling (dispose during active session, concurrent operations, invalid state)
  • Message mapping (all server message types, usage details, completion events)
  • Edge cases (null options, empty audio, multiple tool calls)

@tarekgh tarekgh force-pushed the bedrock-nova-realtime-provider branch 2 times, most recently from a8e52e9 to d8dfcfe March 31, 2026 00:41
…eClientSession

Implements the MEAI IRealtimeClient and IRealtimeClientSession abstractions
for AWS Bedrock Nova Sonic, enabling real-time bidirectional audio
conversations with tool calling support.

Key features:
- Full bidirectional audio streaming via Nova Sonic protocol
- VAD-driven speech detection with trailing silence for end-of-speech
- Tool calling with inline invocation and priority queue for tool results
- Convenience constructor with proper _ownsRuntime disposal
- Thread-safe session state management with semaphore synchronization

Reliability:
- Priority queue bypasses queued audio for time-sensitive tool results
- BodyPublisher while(true) loop with proper shutdown (no spin loops)
- IAsyncEnumerator disposal on all exit paths
- SendToolResultInline captures fields to local variables for thread safety

Tests: 56 unit tests covering session lifecycle, audio streaming, tool
calling, protocol events, error handling, and edge cases.
@tarekgh tarekgh force-pushed the bedrock-nova-realtime-provider branch from d8dfcfe to a7cba1f March 31, 2026 00:47
@dscpinheiro dscpinheiro changed the base branch from main to development March 31, 2026 02:55
@normj normj requested review from Copilot and normj April 1, 2026 20:25
Contributor

Copilot AI left a comment

Pull request overview

Adds an Amazon Bedrock Nova Sonic realtime provider that implements the Microsoft.Extensions.AI realtime abstractions (IRealtimeClient / IRealtimeClientSession) to enable bidirectional audio conversations over Bedrock’s bidirectional streaming API.

Changes:

  • Introduces BedrockNovaRealtimeClient + BedrockNovaRealtimeSession implementations (net8-only via #if NET8_0_OR_GREATER).
  • Adds an AsIRealtimeClient() extension on IAmazonBedrockRuntime.
  • Updates Microsoft.Extensions.AI.Abstractions to 10.4.1 and adds a new net8 test project covering client/session behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| extensions/src/AWSSDK.Extensions.Bedrock.MEAI/BedrockNovaRealtimeClient.cs | New IRealtimeClient implementation + convenience ctor for credentials/region. |
| extensions/src/AWSSDK.Extensions.Bedrock.MEAI/BedrockNovaRealtimeSession.cs | New IRealtimeClientSession implementation handling bidirectional streaming protocol, buffering, tool orchestration, and disposal. |
| extensions/src/AWSSDK.Extensions.Bedrock.MEAI/AmazonBedrockRuntimeExtensions.cs | Adds AsIRealtimeClient() extension (net8-only). |
| extensions/src/AWSSDK.Extensions.Bedrock.MEAI/AWSSDK.Extensions.Bedrock.MEAI.NetStandard.csproj | Bumps Microsoft.Extensions.AI.Abstractions to 10.4.1 and updates warning suppression. |
| extensions/test/BedrockMEAIRealtimeTests/BedrockMEAIRealtimeTests.csproj | New net8 test project for realtime functionality. |
| extensions/test/BedrockMEAIRealtimeTests/BedrockRealtimeClientTests.cs | New unit tests for realtime client construction, service resolution, model selection, and extension method. |
| extensions/test/BedrockMEAIRealtimeTests/BedrockRealtimeSessionTests.cs | New unit tests for session send behavior, protocol event formatting, and concurrency/disposal behavior. |

Comment thread extensions/test/BedrockMEAIRealtimeTests/BedrockRealtimeClientTests.cs Outdated
Comment thread extensions/src/AWSSDK.Extensions.Bedrock.MEAI/BedrockNovaRealtimeClient.cs Outdated
Comment thread extensions/src/AWSSDK.Extensions.Bedrock.MEAI/BedrockNovaRealtimeClient.cs Outdated
Comment thread extensions/src/AWSSDK.Extensions.Bedrock.MEAI/AmazonBedrockRuntimeExtensions.cs Outdated
tarekgh and others added 5 commits April 1, 2026 14:01
- Fix model ID in XML doc comments (nova-sonic-v1:0 -> nova-2-sonic-v1:0)
- Remove unused using directives (System.IO, Amazon.BedrockRuntime.Model)
- Add DevConfig file for release automation
Replace all 31 Task.Delay calls with SpinWait.SpinUntil-based polling
helpers (WaitForEvents/WaitForEvent) for CI-resilient tests. Test
duration dropped from ~6s to ~650ms.
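
A polling helper of the kind this commit message describes might look like the sketch below. The name WaitForEvents comes from the commit message, but the signature, the ConcurrentQueue<string> event store, and the TestPolling class are assumptions made for illustration. SpinWait.SpinUntil returns as soon as the condition holds, so a test only waits as long as it has to, unlike a fixed Task.Delay.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical test helper; signature and container type are assumptions.
public static class TestPolling
{
    // Blocks until at least `count` events have been captured or the timeout
    // elapses. Returns true if the condition was met in time.
    public static bool WaitForEvents(ConcurrentQueue<string> events, int count, TimeSpan timeout) =>
        SpinWait.SpinUntil(() => events.Count >= count, timeout);
}
```

A test would call it with a generous timeout, for example `TestPolling.WaitForEvents(captured, 2, TimeSpan.FromSeconds(5))`, and assert on the boolean result before inspecting the captured events.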
Replace anonymous type serialization with concrete DTO classes and
System.Text.Json source generation (NovaSonicJsonContext). This eliminates
most IL2026 warnings without runtime reflection for protocol events.

- Add ~20 concrete DTO classes for Nova Sonic protocol messages
- Add NovaSonicJsonContext with [JsonSerializable] for compile-time codegen
- Remove IL2026 from project-wide NoWarn (only 7 targeted pragmas remain)
- Keep reflection path only for dynamic tool result serialization
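
As a rough illustration of the DTO plus source-generation approach described above, the sketch below serializes one protocol-style event through a JsonSerializerContext. The DTO names, the SketchJsonContext class, and the exact JSON shape are assumptions for illustration; the PR's NovaSonicJsonContext and its roughly 20 DTOs are not reproduced here. It needs a project where the System.Text.Json source generator runs (the .NET 6 SDK or later).

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTOs standing in for the PR's concrete protocol classes.
public sealed class PromptEndPayload
{
    [JsonPropertyName("promptName")]
    public string PromptName { get; set; } = "";
}

public sealed class PromptEndBody
{
    [JsonPropertyName("promptEnd")]
    public PromptEndPayload PromptEnd { get; set; } = new();
}

public sealed class PromptEndEnvelope
{
    [JsonPropertyName("event")]
    public PromptEndBody Event { get; set; } = new();
}

// Compile-time metadata for the envelope type; no runtime reflection needed.
[JsonSerializable(typeof(PromptEndEnvelope))]
internal partial class SketchJsonContext : JsonSerializerContext
{
}

public static class ProtocolEvents
{
    // Builds the JSON for a promptEnd-style event using generated type info.
    public static string PromptEnd(string promptName)
    {
        var envelope = new PromptEndEnvelope
        {
            Event = new PromptEndBody
            {
                PromptEnd = new PromptEndPayload { PromptName = promptName }
            }
        };
        return JsonSerializer.Serialize(envelope, SketchJsonContext.Default.PromptEndEnvelope);
    }
}
```

Because the serializer receives generated JsonTypeInfo rather than discovering members by reflection, calls like this do not raise IL2026 trimming warnings, which is the effect the commit message reports.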
Member

@normj normj left a comment

Looks like a cool feature. I took a first pass. After you make the changes I'll try it out in action.

Comment thread extensions/src/AWSSDK.Extensions.Bedrock.MEAI/AmazonBedrockRuntimeExtensions.cs Outdated
Comment thread extensions/src/AWSSDK.Extensions.Bedrock.MEAI/BedrockNovaRealtimeClient.cs Outdated
Comment thread extensions/src/AWSSDK.Extensions.Bedrock.MEAI/BedrockNovaRealtimeClient.cs Outdated
Comment thread extensions/src/AWSSDK.Extensions.Bedrock.MEAI/BedrockNovaRealtimeClient.cs Outdated
Comment thread extensions/src/AWSSDK.Extensions.Bedrock.MEAI/BedrockNovaRealtimeClient.cs Outdated
Comment thread extensions/src/AWSSDK.Extensions.Bedrock.MEAI/BedrockNovaRealtimeSession.cs Outdated
Comment thread extensions/src/AWSSDK.Extensions.Bedrock.MEAI/BedrockNovaRealtimeSession.cs Outdated
Comment thread extensions/src/AWSSDK.Extensions.Bedrock.MEAI/BedrockNovaRealtimeSession.cs Outdated
Comment thread generator/.DevConfigs/aec9fb2f-be53-4fc5-8ac3-ec2e7395e087.json Outdated
Author

tarekgh commented Apr 2, 2026

> Looks like a cool feature. I took a first pass. After you make the changes I'll try it out in action.

Thanks @normj for helping with the review. Just to let you know, I have a demo test app I am using to test with different providers. https://github.com/tarekgh/RealtimeProposalDemoApp.

tarekgh added 2 commits April 2, 2026 13:23
- Remove default model ID; require explicit model via constructor or session options
- Remove convenience constructor (accessKeyId/secretKey); follow BedrockChatClient pattern
- Throw InvalidOperationException if no model ID resolves in CreateSessionAsync
- Remove unnecessary dummy event handler registrations
- Use Dictionary<string, JsonElement> for tool arguments (AOT-safe)
- Rewrite SerializeToolResult without reflection (zero IL2026 pragmas remain)
- Add (Preview) prefix to DevConfig changelog message
… tool normalization

- Fix SendAsync error handling: rethrow ODE as named ObjectDisposedException,
  swallow ChannelClosedException/OCE only when disposed (not blanket catch)
- Add concurrent enumeration guard (_activeStreamingEnumeration) to
  GetStreamingResponseAsync to prevent multiple simultaneous readers
- Wrap DisposeAsync resources in individual try/catch with
  ExceptionDispatchInfo to prevent resource leaks on partial failure
- Replace shallow JsonElementToDictionary with deep NormalizeToolPayload,
  NormalizeToolArguments, ConvertJsonElementToToolPayload for tool results
- Use FunctionCallContent.CreateFromParsedArguments for tool call args
  (consistent with MEAI conventions, AOT-safe)
- Add MaxToolPayloadDepth (64) depth guard to prevent stack overflow
- Align disposed/cancellation check order with GenAI/VertexAI providers
- Replace ThrowIf(this) with manual if+nameof() for consistent ODE naming
- Add InternalsVisibleTo for test project access to normalization methods
- Add 8 regression tests for new behaviors
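
The depth guard mentioned in this commit can be illustrated with a small recursive converter. This is a sketch under assumptions: ToolPayloadSketch and ToPlainObject are hypothetical names, and the mapping of JSON numbers to long or double is one possible choice rather than the PR's actual normalization rules. Only the limit of 64 is taken from the commit message.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sketch of a depth-limited JsonElement conversion.
public static class ToolPayloadSketch
{
    // Value taken from the commit message (MaxToolPayloadDepth).
    public const int MaxToolPayloadDepth = 64;

    // Recursively converts a JsonElement into dictionaries, lists, and primitives.
    public static object? ToPlainObject(JsonElement element, int maxDepth = MaxToolPayloadDepth, int depth = 0)
    {
        // Refuse to recurse past the limit instead of risking a stack overflow.
        if (depth > maxDepth)
            throw new InvalidOperationException($"Tool payload exceeds maximum depth of {maxDepth}.");

        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                var dict = new Dictionary<string, object?>();
                foreach (JsonProperty property in element.EnumerateObject())
                    dict[property.Name] = ToPlainObject(property.Value, maxDepth, depth + 1);
                return dict;

            case JsonValueKind.Array:
                var list = new List<object?>();
                foreach (JsonElement item in element.EnumerateArray())
                    list.Add(ToPlainObject(item, maxDepth, depth + 1));
                return list;

            case JsonValueKind.String:
                return element.GetString();

            case JsonValueKind.Number:
                // Prefer an integer representation when the value fits.
                if (element.TryGetInt64(out long whole)) return whole;
                return element.GetDouble();

            case JsonValueKind.True:
                return true;

            case JsonValueKind.False:
                return false;

            default:
                // Null and Undefined both map to null in this sketch.
                return null;
        }
    }
}
```

Unlike a shallow conversion, nested objects and arrays are converted all the way down, so the result contains no leftover JsonElement values.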
Author

tarekgh commented Apr 10, 2026

Hi @normj, just wanted to see if you’ve had a chance to test out that provider yet? I’m looking to wrap up this review and would love to get your take.

Member

normj commented Apr 10, 2026

@tarekgh it is on my list to hopefully get to soon.

normj added 2 commits April 11, 2026 17:56
1) Fix some formatting
2) Remove private method that wasn't being used
3) Add test project to solution file
Member

@normj normj left a comment

I rebased the branch on the latest development changes and pushed a commit with a couple of very minor nits. The PR looks good, and I did try out the sample app, which was cool.

@dscpinheiro Can you do a second pass on the PR?

We are also working through some infrastructure issues triggered by updating the version of Microsoft.Extensions.AI.Abstractions, which pulls in a newer version of System.Text.Json in some of our other processes. The change to update Microsoft.Extensions.AI.Abstractions is fine, but there are assumptions in other parts of our build system, which packages the SDK for non-NuGet users, that go awry. Bear with us: even when we approve the PR, it might be a bit before we can merge it while we sort out the issue.

@normj normj requested a review from dscpinheiro April 12, 2026 01:06
Author

tarekgh commented Apr 12, 2026

Thanks @normj!

Take your time. I want to ensure the changes are not going to cause any issues in general.


3 participants