
Allow modifying buffer #16

Open
crypdick opened this issue Jun 8, 2018 · 5 comments

Comments

@crypdick
Contributor

crypdick commented Jun 8, 2018

Right now, you have to wait for a period of silence before the buffer is parsed. It would be great to be able to force the buffer to be parsed (suggested word: "slurp") or to discard the buffer entirely (suggested word: "spit"). It would also be useful to pop the last word added to the stack (with "oops" or "scratch").
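
To make the request concrete, here is a minimal sketch of the three proposed commands acting on a word buffer. The `WordBuffer` class, its `feed`/`flush` methods, and the `parse` callback are hypothetical names made up for illustration; this is not Silvius's actual code.

```python
class WordBuffer:
    """Hypothetical buffer of recognized words waiting to be parsed."""

    def __init__(self, parse):
        self.words = []
        self.parse = parse  # callback that receives the buffered words

    def feed(self, word):
        """Handle one recognized word; return the parse result if one was forced."""
        if word == "slurp":              # force the buffer to be parsed now
            return self.flush()
        if word == "spit":               # discard the buffer entirely
            self.words.clear()
            return None
        if word in ("oops", "scratch"):  # pop the last word, if there is one
            if self.words:
                self.words.pop()
            return None
        self.words.append(word)
        return None

    def flush(self):
        """Hand the buffered words to the parser and start a fresh buffer."""
        words, self.words = self.words, []
        return self.parse(words)


buf = WordBuffer(parse=lambda words: " ".join(words))
for w in ["charlie", "delta", "foo", "oops"]:
    buf.feed(w)
result = buf.feed("slurp")  # "foo" was popped by "oops" before parsing
```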

@dwks
Owner

dwks commented Jun 9, 2018

There's not much we can do about having to wait. There's a silence segmentation component inside the server, part of alumae's kaldi-gstreamer-server, which is fairly conservative in determining silence. Kaldi has a new speech activity detection model (ASpIRE SAD) which is amazingly good. I will integrate it at some point, but that will take some time.

Discarding the buffer entirely with "spit" already works since the parsing stops if there is any unrecognized word :) but we could add a rule so that the probability of this word gets boosted. I'm not sure how well popping the last word would work, since if I say something invalid, it tends to affect the recognition accuracy of several words in a row -- unless you mean to cancel an otherwise valid command?
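
As a rough illustration of why an unrecognized word such as "spit" already discards everything, here is a sketch of an all-or-nothing parse. The `VOCAB` table and `parse` function are assumptions for illustration only; the word-to-key mappings are inferred from the "cd /" example in this thread, and the real Silvius grammar is more involved.

```python
# Hypothetical vocabulary, inferred from the "cd /" example in this thread.
VOCAB = {
    "charlie": "c",
    "delta": "d",
    "space": " ",
    "slash": "/",
    "slap": "\n",  # assumed to mean "press Enter"
}


def parse(words, vocab=VOCAB):
    """Return the keystrokes for an utterance, or None if any word is unknown."""
    keys = []
    for word in words:
        if word not in vocab:
            return None  # one unrecognized word discards the whole utterance
        keys.append(vocab[word])
    return "".join(keys)


ok = parse("charlie delta space slash slap".split())
dropped = parse("charlie delta spit".split())  # "spit" is not in VOCAB
```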

I think it would be cool to have a command that recalls some of the previous command -- either until the first parse error, or until a specific word. e.g. "charlie delta space oops foo" ... "from space slash slap" -> "cd /". Thoughts? I guess it's helpful to see the parse results in this case. Do you usually have the Silvius output window visible e.g. on a second monitor? Or perhaps we could add a little bit of X11 integration so that the output would be visible.
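
One possible reading of the "from space ..." idea, as a sketch: keep the previous utterance up to and including the last occurrence of the named word, then append the new words. The `recall_from` function and its behaviour on errors are assumptions for illustration, not an existing Silvius feature.

```python
def recall_from(previous, correction):
    """Rebuild an utterance from a correction of the form "from <anchor> <words...>".

    Keeps `previous` up to and including the last occurrence of <anchor>,
    then appends the remaining words of the correction.
    """
    if len(correction) < 2 or correction[0] != "from":
        raise ValueError("correction must look like: from <anchor> <words...>")
    anchor, rest = correction[1], correction[2:]
    for i in range(len(previous) - 1, -1, -1):  # search from the end
        if previous[i] == anchor:
            return previous[: i + 1] + rest
    raise ValueError(f"{anchor!r} does not occur in the previous utterance")


previous = "charlie delta space oops foo".split()
fixed = recall_from(previous, "from space slash slap".split())
# fixed now holds the words for "cd /" followed by Enter
```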

@dwks
Owner

dwks commented Jun 9, 2018

Or "charlie delta spell slash slap", "fix word spell space" to fix just one incorrect word...
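
The single-word variant could be sketched the same way. Here `fix_word` is a hypothetical helper that replaces the last occurrence of the misrecognized word; which occurrence to replace is a design choice, not something settled in this thread.

```python
def fix_word(previous, correction):
    """Apply a correction of the form "fix word <wrong> <right>".

    Replaces the last occurrence of <wrong> in `previous` with <right>
    and returns a new list; `previous` itself is left untouched.
    """
    if len(correction) != 4 or correction[:2] != ["fix", "word"]:
        raise ValueError("correction must look like: fix word <wrong> <right>")
    wrong, right = correction[2], correction[3]
    if wrong not in previous:
        raise ValueError(f"{wrong!r} does not occur in the previous utterance")
    i = len(previous) - 1 - previous[::-1].index(wrong)  # last occurrence
    return previous[:i] + [right] + previous[i + 1:]


previous = "charlie delta spell slash slap".split()
fixed = fix_word(previous, "fix word spell space".split())
```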

@crypdick
Contributor Author

I didn't know that recognition was temporally dependent like that. I often peek at the buffer when it's having trouble recognizing a word, but I really like the idea of having the buffer visualized (I'm imagining something like https://github.com/wavexx/screenkey). Finally, I also like the idea of deleting back to a certain spot.

@crypdick
Contributor Author

Sorry if this is beating a dead horse: so there's a silence detector in the server, but would it be possible client-side to manually inject the END signal and parse the buffer as-is?
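
A minimal sketch of what I have in mind, with several assumptions: as far as I understand the kaldi-gstreamer-server README, a client signals the end of the audio by sending the string "EOS", after which the server finalizes recognition and the session ends, so the client would have to reconnect for the next utterance. The `ManualEndpointer` class and the injected `send` callable are hypothetical; whether Silvius's client can work this way is not established here.

```python
END_OF_STREAM = "EOS"  # assumed end-of-stream marker from the server's protocol


class ManualEndpointer:
    """Hypothetical wrapper that lets the client end an utterance on demand."""

    def __init__(self, send):
        self.send = send  # e.g. the send method of a websocket connection
        self.open = True

    def send_audio(self, chunk):
        """Forward one chunk of raw audio while the stream is still open."""
        if not self.open:
            raise RuntimeError("stream already ended; reconnect first")
        self.send(chunk)

    def finish_now(self):
        """Send the end-of-stream marker instead of waiting for silence."""
        if self.open:
            self.send(END_OF_STREAM)
            self.open = False


sent = []                      # stand-in for a real connection
ep = ManualEndpointer(sent.append)
ep.send_audio(b"\x00\x01")
ep.finish_now()                # triggered by a hotkey, for example
```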

@dwks
Owner

dwks commented Jun 14, 2018 via email


2 participants