Allow modifying buffer #16
There's not much we can do about having to wait. There's a silence segmentation component inside the server, part of alumae's kaldi-gstreamer-server, which is fairly conservative in determining silence. Kaldi has a new speech activity detection model (ASpIRE SAD) which is amazingly good. I will integrate it at some point, but that will take some time. Discarding the buffer entirely with "spit" already works, since the parsing stops if there is any unrecognized word :) but we could add a rule so that the probability of this word gets boosted. I'm not sure how well popping the last word would work, since if I say something invalid, it tends to affect the recognition accuracy of several words in a row -- unless you mean to cancel an otherwise valid command? I think it would be cool to have a command that recalls part of the previous command -- either up to the first parse error, or up to a specific word, e.g. "charlie delta space oops foo" ... "from space slash slap" -> "cd /". Thoughts? I guess it's helpful to see the parse results in this case. Do you usually have the Silvius output window visible, e.g. on a second monitor? Or perhaps we could add a little bit of X11 integration so that the output would be visible.
Or "charlie delta spell slash slap", "fix word spell space" to fix just one incorrect word...
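To make the two recall ideas above concrete, here is a minimal sketch in Python. Nothing in it is existing Silvius code: the function names `recall_from` and `fix_word` are hypothetical, and cutting at the *last* occurrence of the anchor word is an assumption, since the thread does not say which occurrence should win.

```python
def recall_from(previous, anchor, new_words):
    """Keep `previous` up to and including the last `anchor`, then append `new_words`.

    Hypothetical helper: `previous` is the word list of the last utterance.
    """
    if anchor not in previous:
        raise ValueError("anchor word %r not in previous command" % anchor)
    # Index of the last occurrence of the anchor (an assumption, see above).
    cut = len(previous) - 1 - previous[::-1].index(anchor)
    return previous[:cut + 1] + new_words


def fix_word(previous, wrong, right):
    """Replace the last occurrence of `wrong` in `previous` with `right`."""
    if wrong not in previous:
        raise ValueError("word %r not in previous command" % wrong)
    idx = len(previous) - 1 - previous[::-1].index(wrong)
    return previous[:idx] + [right] + previous[idx + 1:]


# "charlie delta space oops foo" ... "from space slash slap"
words = recall_from("charlie delta space oops foo".split(), "space", "slash slap".split())
print(" ".join(words))

# "charlie delta spell slash slap", "fix word spell space"
words = fix_word("charlie delta spell slash slap".split(), "spell", "space")
print(" ".join(words))
```

Both calls end up with the word sequence "charlie delta space slash slap", which would then go through the normal parser to produce "cd /".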
I didn't know that recognition was temporally dependent like that. I often peek at the buffer when it's having trouble recognizing a word, but I really like the idea of having the buffer visualized (I'm imagining something like https://github.com/wavexx/screenkey). Finally, I also like the idea of deleting back until a certain spot.
Sorry if this is beating a dead horse: so there's a silence detector in the server, but would it be possible client-side to manually inject the END signal and parse the buffer as-is?
Interesting idea. So you want to be able to say "Delta left execute echo
left execute..." and execute immediately once the "execute" word is
decoded? That's quite possible to implement. The reason I never considered
this is that most speech recognition systems have much higher accuracy in
their final hypothesis; the intermediate hypotheses can contain a lot of
errors. Also, the current decoder takes slightly more CPU the longer an
input phrase is, and this setup might encourage you to just keep speaking
(and eventually lag a bunch until you pause).
You can certainly try this, although I think the better long-term solution
is to have a much faster silence detection mechanism. I believe the current
one just looks for a lack of sound waves, but I recently tried an online
ASpIRE speech activity detection model which is absolutely incredible at
detecting when you've stopped speaking, because it understands the actual
phonemes that you're speaking. For the best possible accuracy and latency
at the moment, we should just use an ASpIRE SAD model and then an nnet3
ASpIRE speech model -- ideally with boosted command words, but even without
it should perform well. Let me know if you have any bandwidth for this :)
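For anyone who wants to try the "execute as soon as the trigger word is decoded" idea, a rough client-side sketch could look like the following. It assumes the client receives each intermediate hypothesis as a list of words; the function name `new_commands` and the `already_run` counter are hypothetical, not part of Silvius or kaldi-gstreamer-server.

```python
def new_commands(partial_words, already_run, trigger="execute"):
    """Return the complete commands in a partial hypothesis that have not run yet.

    A command is the run of words before each `trigger` word. `already_run`
    is how many commands from this utterance were dispatched earlier.
    """
    commands = []
    current = []
    for word in partial_words:
        if word == trigger:
            commands.append(current)
            current = []
        else:
            current.append(word)
    # Words after the last trigger are still incomplete and are not returned.
    return commands[already_run:]


# Simulated stream of growing intermediate hypotheses for one utterance.
hypotheses = [
    "delta left",
    "delta left execute echo",
    "delta left execute echo left execute",
]
done = 0
for hyp in hypotheses:
    for cmd in new_commands(hyp.split(), done):
        print("run:", " ".join(cmd))
        done += 1
```

This sketch does nothing about the accuracy problem described above: if a later hypothesis revises words in a command that has already run, that command is not undone.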
…On Wed, Jun 13, 2018, 10:15 PM Richard Decal ***@***.***> wrote:
Sorry if this is beating a dead horse: so there's a silence detector in
the server, but would it be possible client-side to manually inject the
END signal and parse the buffer as-is?
Right now, you have to wait for a period of silence before the buffer is parsed. It would be great to be able to force the buffer to be parsed (suggested word: "slurp") or to discard the buffer entirely (suggested word: "spit"). It would also be useful to pop the last word added to the stack (with "oops" or "scratch").
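As an illustration of the requested behaviour, here is a minimal sketch of a word buffer that handles these control words. The `WordBuffer` class and its `parse` callback are hypothetical and are not how Silvius is currently structured; the control words are the ones suggested above.

```python
class WordBuffer:
    """Collects recognized words and reacts to the proposed control words."""

    def __init__(self, parse):
        self.words = []
        self.parse = parse  # hypothetical callback that parses a list of words

    def feed(self, word):
        """Handle one recognized word; return the parse result on "slurp", else None."""
        if word == "slurp":
            # Force the buffer to be parsed now, then start fresh.
            words, self.words = self.words, []
            return self.parse(words)
        if word == "spit":
            # Discard the buffer entirely.
            self.words = []
        elif word in ("oops", "scratch"):
            # Pop the last word, if there is one.
            if self.words:
                self.words.pop()
        else:
            self.words.append(word)
        return None


buf = WordBuffer(parse=lambda words: " ".join(words))
for w in "charlie delta foo oops space slurp".split():
    result = buf.feed(w)
print(result)
```

Here "oops" removes "foo" before "slurp" forces the parse, so the callback receives "charlie delta space".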