@@ -124,23 +124,29 @@ await decoder.initialize();
 
 The optional `loglevel` and `backtrace` options will make it a bit
 more verbose, so you can be sure it's actually doing something. Now
-we will create the world's stupidest grammar, which recognizes one
-sentence:
+we will create and enable the world's stupidest grammar, which
+recognizes one sentence:
 
 ```js
-let fsg = decoder.create_fsg("goforward", 0, 4, [
+await decoder.set_fsg("goforward", 0, 4, [
     {from: 0, to: 1, prob: 1.0, word: "go"},
     {from: 1, to: 2, prob: 1.0, word: "forward"},
     {from: 2, to: 3, prob: 1.0, word: "ten"},
     {from: 3, to: 4, prob: 1.0, word: "meters"}
 ]);
-await decoder.set_fsg(fsg);
 ```
 
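For longer sentences, writing the transition list by hand gets tedious. The following is a minimal sketch, not part of SoundSwallower, of how a single-sentence grammar like the one above could be generated from a string. The helper name `sentenceToTransitions` is hypothetical; only the `{from, to, prob, word}` shape and the `set_fsg` call are taken from the example above.

```javascript
// Hypothetical helper (an assumption, not a SoundSwallower API):
// build a linear chain of transitions, one per word, in the
// {from, to, prob, word} shape used by the set_fsg() call above.
function sentenceToTransitions(sentence) {
    // Split on whitespace and drop empty tokens (e.g. from an empty string).
    const words = sentence.trim().split(/\s+/).filter((w) => w.length > 0);
    // Word i goes from state i to state i + 1 with probability 1.0.
    return words.map((word, i) => ({from: i, to: i + 1, prob: 1.0, word}));
}

const transitions = sentenceToTransitions("go forward ten meters");
console.log(transitions);

// With an initialized decoder, the start state is 0 and the final
// state equals the number of words:
// await decoder.set_fsg("goforward", 0, transitions.length, transitions);
```

Every word produced this way still has to exist in the dictionary.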
-You should `delete()` it, unless of course you intend to create a
-bunch of them and swap them in and out. It is also possible to parse
-a grammar in [JSGF](https://en.wikipedia.org/wiki/JSGF) format, see
-below for an example.
+If you actually want to just recognize a single sentence, in order to
+get time alignments (this is known as "force-alignment"), we have a
+better method for you:
+
+```js
+await decoder.set_align_text("go forward ten meters");
+```
+
+It is also possible to parse a grammar in
+[JSGF](https://en.wikipedia.org/wiki/JSGF) format, see below for an
+example.
 
 Okay, let's wreck a nice beach! Record yourself saying something,
 preferably the sentence "go forward ten meters", using SoX, for
@@ -171,6 +177,23 @@ console.log(decoder.get_hyp());
 console.log(decoder.get_hypseg());
 ```
 
+If you want even more detailed segmentation (phone and HMM state
+level) you can use `get_alignment_json`. For more detail on this
+format, see [the PocketSphinx
+documentation](https://github.com/cmusphinx/pocketsphinx#usage) as it
+is borrowed from there. Since this is JSON, you can create an object
+from it and iterate over it:
+
+```js
+const result = JSON.parse(await decoder.get_alignment_json());
+for (const word of result.w) {
+    console.log(`word ${word.t} at ${word.b} has duration ${word.d}`);
+    for (const phone of word.w) {
+        console.log(`phone ${phone.t} at ${phone.b} has duration ${phone.d}`);
+    }
+}
+```
+
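If a flat table is easier to work with than nested loops, the parsed alignment can be flattened first. This is only a sketch: the helper `flattenAlignment` is hypothetical, the sample object is made-up data rather than real decoder output, and the only fields assumed are the ones used in the loop above (`t`, `b`, `d`, and `w` for nested segments).

```javascript
// Hypothetical helper (an assumption, not a SoundSwallower API):
// turn the nested alignment into rows of {level, text, start, duration}.
function flattenAlignment(result) {
    const rows = [];
    for (const word of result.w ?? []) {
        rows.push({level: "word", text: word.t, start: word.b, duration: word.d});
        // Phones are nested under each word in its own `w` list.
        for (const phone of word.w ?? []) {
            rows.push({level: "phone", text: phone.t, start: phone.b, duration: phone.d});
        }
    }
    return rows;
}

// Made-up sample in the same shape, for illustration only.
const sample = {
    t: "go",
    b: 0.0,
    d: 0.3,
    w: [
        {t: "go", b: 0.0, d: 0.3, w: [
            {t: "G", b: 0.0, d: 0.1},
            {t: "OW", b: 0.1, d: 0.2},
        ]},
    ],
};
console.log(flattenAlignment(sample));
```

In real use the argument would be the object parsed from `get_alignment_json()`, as in the snippet above.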
 Finally, if your program is long-running and you think you might make
 multiple recognizers, you ought to delete them, because JavaScript is
 awful:
@@ -210,18 +233,6 @@ await require('soundswallower')(ssjs);
 This is simply concatenated to the model name, so you should make sure
 to include the trailing slash, e.g. "model/" and not "model"!
 
-Currently, it should also support any Sphinx format acoustic model, many of
-which are available for download at [the SourceForge
-page](https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/).
-
-To use a module, pass the directory (or base URL) containing its files
-(i.e. `means`, `variances`, etc) in the `hmm` property when
-initializing the decoder, for example:
-
-```js
-const decoder = ssjs.Decoder({hmm: "https://example.com/excellent-acoustic-model/"});
-```
-
 
 Using grammars
 --------------
@@ -231,7 +242,7 @@ from a JavaScript string and set it in the decoder like this (a
 hypothetical pizza-ordering grammar):
 
 ```js
-let fsg = decoder.parse_jsgf(`#JSGF V1.0;
+await decoder.set_jsgf(`#JSGF V1.0;
 grammar pizza;
 public <order> = [<greeting>] [<want>] [<quantity>] [<size>] [pizza] <toppings>;
 <greeting> = hi | hello | yo | howdy;
@@ -241,7 +252,6 @@ public <order> = [<greeting>] [<want>] [<quantity>] [<size>] [pizza] <toppings>;
 <toppings> = [with] <topping> ([and] <topping>)*;
 <topping> = olives | mushrooms | tomatoes | (green | hot) peppers | pineapple;
 `);
-await decoder.set_fsg(fsg);
 ```
 
 Note that all the words in the grammar must first be defined in the
@@ -257,3 +267,32 @@ the internal state.
 await decoder.add_word("supercalifragilisticexpialidocious",
     "S UW P ER K AE L IH F R AE JH IH L IH S T IH K EH K S P IY AE L IH D OW SH Y UH S");
 ```
+
+Voice activity detection / Endpointing
+--------------------------------------
+
+This is a work in progress, but it is also possible to detect the
+start and end of speech in an input stream using an `Endpointer`
+object. This requires you to pass buffers of a specific size, which
+is understandably difficult since WebAudio also only wants to *give*
+you buffers of a specific (and entirely different) size. A better
+example is forthcoming but it looks a bit like this (copied directly
+from [the
+documentation](https://soundswallower.readthedocs.io/en/latest/soundswallower.js.html#Endpointer.get_in_speech)):
+
+```js
+let prev_in_speech = ep.get_in_speech();
+let frame_size = ep.get_frame_size();
+// Presume `frame` is a Float32Array of frame_size or less
+let speech;
+if (frame.length < frame_size)
+    speech = ep.end_stream(frame);
+else
+    speech = ep.process(frame);
+if (speech !== null) {
+    if (!prev_in_speech)
+        console.log("Speech started at " + ep.get_speech_start());
+    if (!ep.get_in_speech())
+        console.log("Speech ended at " + ep.get_speech_end());
+}
+```
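Since the incoming audio blocks and the endpointer's frame size do not match, something has to re-chunk the audio in between. Below is a minimal sketch of one way to do that. The `FrameChunker` class is hypothetical and not part of SoundSwallower; it only assumes that audio arrives as `Float32Array` blocks of arbitrary length and that a fixed frame size is wanted.

```javascript
// Hypothetical helper (an assumption, not a SoundSwallower API):
// accumulates samples of any block size and hands back frames of
// exactly frameSize samples.
class FrameChunker {
    constructor(frameSize) {
        this.frameSize = frameSize;
        this.pending = new Float32Array(0); // samples not yet emitted
    }

    // Append a block and return every complete frame now available.
    push(block) {
        const joined = new Float32Array(this.pending.length + block.length);
        joined.set(this.pending, 0);
        joined.set(block, this.pending.length);
        const frames = [];
        let pos = 0;
        while (joined.length - pos >= this.frameSize) {
            frames.push(joined.slice(pos, pos + this.frameSize));
            pos += this.frameSize;
        }
        this.pending = joined.slice(pos); // keep the remainder for next time
        return frames;
    }

    // Return the leftover partial frame (possibly empty) and reset.
    flush() {
        const rest = this.pending;
        this.pending = new Float32Array(0);
        return rest;
    }
}

// Illustration with a made-up frame size of 4 and a silent 10-sample block.
const chunker = new FrameChunker(4);
const frames = chunker.push(new Float32Array(10));
console.log(frames.length, chunker.flush().length);
```

In real use the frame size would come from `ep.get_frame_size()`, each complete frame would be passed to `ep.process()`, and the remainder from `flush()` would go to `ep.end_stream()` once the input ends, following the snippet above.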