Optional voice activation params for `ad4m.ai.open_transcription_stream()` by lucksus · Pull Request #566 · coasys/ad4m

lucksus · 2025-02-14T23:17:57Z

Added a VoiceActivityParams struct in Rust to hold the parameters:

start_threshold: Optional f32
start_window: Optional duration in milliseconds
end_threshold: Optional f32
end_window: Optional duration in milliseconds
time_before_speech: Optional duration in milliseconds

Updated the GraphQL schema with a corresponding input type VoiceActivityParamsInput
Modified the open_transcription_stream function to accept these parameters and apply them to the voice activity stream
Updated the TypeScript interfaces and client to support passing these parameters
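
As a rough sketch of how the TypeScript side of these parameters might look, the interface below mirrors the five fields listed above. The interface name is borrowed from the GraphQL input type, and the helper `toRustFieldNames` is hypothetical: it only illustrates the camelCase to snake_case correspondence between the client object and the Rust struct, and is not part of the ad4m client.

```typescript
// Assumed shape of the client-side parameter object; every field is optional,
// matching the optional fields of the Rust VoiceActivityParams struct.
interface VoiceActivityParamsInput {
  startThreshold?: number;   // f32 on the Rust side
  startWindow?: number;      // milliseconds
  endThreshold?: number;     // f32 on the Rust side
  endWindow?: number;        // milliseconds
  timeBeforeSpeech?: number; // milliseconds
}

// Hypothetical helper (not in the PR): maps camelCase keys to the snake_case
// field names of the Rust struct, skipping fields that were left unset.
function toRustFieldNames(params: VoiceActivityParamsInput): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [key, value] of Object.entries(params)) {
    if (value === undefined) continue;
    const snake = key.replace(/[A-Z]/g, (c) => "_" + c.toLowerCase());
    out[snake] = value;
  }
  return out;
}

console.log(toRustFieldNames({ startThreshold: 0.5, endWindow: 500 }));
```

Because every field is optional, a caller can set a single value, such as `endWindow`, and leave the rest unset.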

Now users can customize the voice activity detection behavior when opening a transcription stream. Here's an example of how to use it:

const streamId = await client.ai.openTranscriptionStream(
  "modelId",
  (text) => console.log(text),
  {
    startThreshold: 0.5,      // Sensitivity for detecting start of speech
    startWindow: 100,         // Window size in ms for start detection
    endThreshold: 0.3,        // Sensitivity for detecting end of speech
    endWindow: 500,           // Window size in ms for end detection
    timeBeforeSpeech: 200     // Amount of audio to include before detected speech
  }
);

If no parameters are provided, the stream uses the default settings, which include a 500 ms end window.
The parameters allow fine-tuning of:

How sensitive the system is to detecting the start/end of speech
How long it waits to confirm speech has started/ended
How much audio before detected speech to include in the transcription
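
The fallback behavior described above can be sketched as follows. This is a minimal illustration, assuming that the 500 ms end window is the only default set explicitly and that supplied parameters are passed through unchanged, with any other unset field left to the underlying library. The names `VadParams`, `DEFAULT_END_WINDOW_MS`, and `resolveVoiceActivityParams` are hypothetical and not part of the ad4m API.

```typescript
// Hypothetical local type with the same optional fields as the params object.
type VadParams = {
  startThreshold?: number;
  startWindow?: number;
  endThreshold?: number;
  endWindow?: number;        // milliseconds
  timeBeforeSpeech?: number; // milliseconds
};

// The only default stated in the PR description.
const DEFAULT_END_WINDOW_MS = 500;

// Assumption: with no params at all, only the end window is set;
// with params, the caller's values are copied as given.
function resolveVoiceActivityParams(params?: VadParams): VadParams {
  if (params === undefined) {
    return { endWindow: DEFAULT_END_WINDOW_MS };
  }
  return { ...params };
}

console.log(resolveVoiceActivityParams());
console.log(resolveVoiceActivityParams({ startThreshold: 0.5, endWindow: 300 }));
```

Whether a partially filled params object still falls back to the 500 ms end window is not stated in the description, so the sketch leaves that case untouched.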

Story: I want to enable the user of the transcription stream to set the parameters that kalosm offers us with the functions in VoiceActivityRechunkerStream. Please add all parameters to the call chain of that function (fn in AIService, GraphQL interface as defined in AIResolver.ts, client in AIClient.ts, GraphQL implementation in rust-executor).

lucksus added 8 commits February 15, 2025 00:06

Optional voice activation params for open_transcription_stream

5a80788

fmt

14c4871

Use params in test

5ca7adc

Fix GraphQL type error

0bf2f24

Assert word level split

1cc83bd

Refactor transcription tests, extract functions

54ccf79

Adjust params for quicker word detection

092abec

changelog

995e750

lucksus merged commit ded92d8 into dev Feb 17, 2025
2 checks passed

lucksus deleted the feature/voice-activity-detection-parameters branch August 22, 2025 12:01
