Optional voice activation params for ad4m.ai.open_transcription_stream() #566

Merged
lucksus merged 8 commits into dev from feature/voice-activity-detection-parameters
Feb 17, 2025

Conversation


@lucksus lucksus commented Feb 14, 2025

  1. Added a VoiceActivityParams struct in Rust to hold the parameters:
  • start_threshold: Optional f32
  • start_window: Optional duration in milliseconds
  • end_threshold: Optional f32
  • end_window: Optional duration in milliseconds
  • time_before_speech: Optional duration in milliseconds
  2. Updated the GraphQL schema with a corresponding input type VoiceActivityParamsInput
  3. Modified the open_transcription_stream function to accept these parameters and apply them to the voice activity stream
  4. Updated the TypeScript interfaces and client to support passing these parameters
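
For orientation, here is a minimal sketch of what the client-side shape of these parameters could look like. The type name `VoiceActivityParamsSketch` and the helper `toVoiceActivityInput` are hypothetical and only mirror the fields listed above in camelCase; the actual interfaces in AIClient.ts and AIResolver.ts may differ.

```typescript
// Hypothetical type mirroring the fields of the Rust struct in camelCase.
// Every field is optional, matching the "Optional" fields listed above.
type VoiceActivityParamsSketch = {
  startThreshold?: number;   // f32 on the Rust side
  startWindow?: number;      // milliseconds
  endThreshold?: number;     // f32 on the Rust side
  endWindow?: number;        // milliseconds
  timeBeforeSpeech?: number; // milliseconds
};

const PARAM_KEYS = [
  "startThreshold",
  "startWindow",
  "endThreshold",
  "endWindow",
  "timeBeforeSpeech",
] as const;

// Hypothetical helper (not part of the PR): drops unset fields and rejects
// negative or non-finite values before the object is sent as GraphQL input.
function toVoiceActivityInput(
  params: VoiceActivityParamsSketch = {}
): Record<string, number> {
  const input: Record<string, number> = {};
  for (const key of PARAM_KEYS) {
    const value = params[key];
    if (value === undefined) continue; // leave unset fields to the backend defaults
    if (!isFinite(value) || value < 0) {
      throw new RangeError(key + " must be a non-negative finite number");
    }
    input[key] = value;
  }
  return input;
}
```

Leaving unset fields out of the input, rather than filling in client-side defaults, keeps the default behavior defined in one place on the Rust side.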

Now users can customize the voice activity detection behavior when opening a transcription stream. Here's an example of how to use it:

const streamId = await client.ai.openTranscriptionStream(
  "modelId",
  (text) => console.log(text),
  {
    startThreshold: 0.5,      // Sensitivity for detecting start of speech
    startWindow: 100,         // Window size in ms for start detection
    endThreshold: 0.3,        // Sensitivity for detecting end of speech
    endWindow: 500,           // Window size in ms for end detection
    timeBeforeSpeech: 200     // Amount of audio to include before detected speech
  }
);

If no parameters are provided, it will use the default settings with a 500ms end window.
The parameters allow fine-tuning of:

  1. How sensitive the system is to detecting the start/end of speech
  2. How long it waits to confirm speech has started/ended
  3. How much audio before detected speech to include in the transcription
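
To make the interplay of thresholds and windows concrete, the following is a simplified sketch of a threshold-plus-window speech detector. It is an illustration under stated assumptions, not kalosm's VoiceActivityRechunkerStream implementation: it assumes fixed-length frames that each carry a speech probability, and the names `segmentSpeech`, `VadParams` and `Segment` are hypothetical.

```typescript
type VadParams = {
  startThreshold: number;
  startWindow: number;      // ms
  endThreshold: number;
  endWindow: number;        // ms
  timeBeforeSpeech: number; // ms
};

type Segment = { startMs: number; endMs: number };

// Mean of values[from..to), used as the windowed speech probability.
function mean(values: number[], from: number, to: number): number {
  let sum = 0;
  for (let i = from; i < to; i++) sum += values[i];
  return sum / (to - from);
}

// Assumed model: speech starts when the mean probability over the last
// startWindow ms reaches startThreshold, and ends when the mean over the
// last endWindow ms (counted only within the speech) drops to endThreshold.
function segmentSpeech(probs: number[], frameMs: number, p: VadParams): Segment[] {
  const startFrames = Math.max(1, Math.round(p.startWindow / frameMs));
  const endFrames = Math.max(1, Math.round(p.endWindow / frameMs));
  const segments: Segment[] = [];
  let speaking = false;
  let speechStartFrame = 0;
  let segmentStartMs = 0;

  for (let i = 0; i < probs.length; i++) {
    const end = i + 1; // exclusive end of the window that closes at frame i
    if (!speaking) {
      if (end >= startFrames && mean(probs, end - startFrames, end) >= p.startThreshold) {
        speaking = true;
        speechStartFrame = end - startFrames;
        // Reach back by timeBeforeSpeech so the onset of speech is not clipped.
        segmentStartMs = Math.max(0, speechStartFrame * frameMs - p.timeBeforeSpeech);
      }
    } else if (
      end - endFrames >= speechStartFrame &&
      mean(probs, end - endFrames, end) <= p.endThreshold
    ) {
      speaking = false;
      segments.push({ startMs: segmentStartMs, endMs: end * frameMs });
    }
  }
  // Close a segment that is still open when the input ends.
  if (speaking) segments.push({ startMs: segmentStartMs, endMs: probs.length * frameMs });
  return segments;
}
```

In this model, a longer end window makes the detector wait through more low-probability audio before closing a segment, so short pauses are less likely to split an utterance, at the cost of later delivery of the transcription.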

Story: I want to enable the user of the transcription stream to set the parameters that kalosm offers us with the functions in VoiceActivityRechunkerStream. Please add all parameters to the call chain of that function (fn in AIService, GraphQL interface as defined in AIResolver.ts, client in AIClient.ts, GraphQL implementation in rust-executor).

@lucksus lucksus merged commit ded92d8 into dev Feb 17, 2025
2 checks passed
@lucksus lucksus deleted the feature/voice-activity-detection-parameters branch August 22, 2025 12:01
