Add the option for using other speech to text programs #7
Comments
Switching to another speech recognition engine is difficult because DSN needs more than plain "speech recognition". Most speech recognition engines provide only one service: speech-to-text. That alone is useless for DSN, which needs to find which phrase you said from a list of command phrases. Simply converting speech into text won't work, because the engine will inevitably mishear some words as similar-sounding ones, so the text will not match exactly. DSN itself has no natural language processing ability, so it cannot work out which phrase is closest to the recognition result. DSN therefore needs a speech recognition engine with a command word recognition function: I give it some phrases and it tells me which one you said. I've looked for one, but engines that meet these requirements and support the language I speak (Chinese) are almost non-existent (except Microsoft's). That is why work on replacing the engine has not started. |
I will try again to find a suitable engine. I'll update if I find anything new. |
Two problems with speech-to-text:
A typical program can only compare two strings for equality or inequality. Judging whether two texts are merely similar is already difficult. Here the task is harder still: compare how similar two texts sound when spoken, even though their spelling and word count may differ! That is no easier than developing a speech engine. In fact, such a task belongs inside the speech engine itself: I tell the engine what phrases you might say, and it matches your speech and tells me which one you said. Then I can make the corresponding choice in the game. This is how I use Microsoft's speech recognition engine. However, few other engines offer similar usage. Most of them provide only "speech-to-text" with a small number of vocabulary settings, which is not enough to answer the question "Which one of these phrases did you say?". |
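To make the limitation concrete, here is a minimal Python sketch of the naive approach: take the text a speech-to-text engine returned and pick the candidate phrase whose spelling is closest. The function name `closest_phrase` and the cutoff value are assumptions made for illustration; DSN does not contain this code. It compares spelling only, not pronunciation, which is exactly the weakness described above.

```python
import difflib

def closest_phrase(recognized, candidates, cutoff=0.6):
    """Return the candidate whose spelling is closest to the recognized text, or None."""
    # Compare case-insensitively but return the original spelling of the candidate.
    lowered = {c.lower(): c for c in candidates}
    matches = difflib.get_close_matches(
        recognized.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None

candidates = ["Open Map", "Toggle god mode", "Give me some gold"]
print(closest_phrase("open nap", candidates))   # a one-letter mishearing still matches
print(closest_phrase("xyzzy", candidates))      # nothing is close enough
```

A mishearing that changes the spelling a lot while sounding almost the same would defeat this comparison, which is why the matching is better done inside the speech engine.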
What about a wake word/phrase detection engine? Those are very reliable and only detect what words you tell it to. |
Using something like https://github.com/Picovoice/rhino lets you do things by intent, which would be interesting. You could say "please give me the fire spell" and it would recognize equip fire. Alternatively, you could use https://github.com/Picovoice/porcupine to detect only the words and phrases you want. |
It looks OK, but I have doubts about how many phrases and words it can detect at the same time. In dialogue recognition, we often have 4 to 5 long sentences. In command recognition, there may be dozens of short phrases. Neither case quite fits the design purpose of "wake word detection". |
Both are promising and I'll look into them. I want them to support Chinese so I can actually use them. If they don't, it will be a blow to me: I would be developing a feature that I can't actually use. |
https://github.com/Picovoice/rhino#language-support
No Chinese. Sad. |
https://github.com/daanzu/kaldi-active-grammar is probably the speech recognition engine closest to my needs: it allows people to create new language models for it. However, my attempts to create a Chinese language model were unsuccessful. Maybe I should try again. |
I found another potential option: It seems to meet the needs of DSN and supports Chinese. |
By looking at Rhasspy's grammar documentation, I think we are fully capable of converting the DSN configuration file into the corresponding Rhasspy grammar file, so as to use it to complete the command word recognition. https://rhasspy.readthedocs.io/en/latest/training/#sentencesini I will try to develop soon. |
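As a rough illustration of that conversion, the Python sketch below turns a list of DSN command phrases into `sentences.ini` text with one intent section per phrase. The function name `build_sentences_ini` and the intent names (`Command0`, `Command1`, ...) are hypothetical placeholders, and the real conversion in DSN may differ. Punctuation is stripped because characters such as `[]`, `()`, `|`, `<>` and `{}` have special meaning in the template syntax.

```python
import re

def build_sentences_ini(phrases):
    """Return (ini_text, intent_to_phrase) for a list of DSN command phrases."""
    lines = []
    intent_to_phrase = {}
    for index, phrase in enumerate(phrases):
        # Keep word characters, apostrophes and spaces; replace everything else
        # with a space so that words do not merge accidentally.
        words = re.sub(r"[^\w' ]+", " ", phrase.lower()).split()
        if not words:
            continue  # nothing speakable is left in this phrase
        intent = f"Command{index}"  # hypothetical intent naming scheme
        intent_to_phrase[intent] = phrase
        lines.append(f"[{intent}]")
        lines.append(" ".join(words))
        lines.append("")
    return "\n".join(lines), intent_to_phrase

ini_text, mapping = build_sentences_ini(["Open Map", "Hello, brother"])
print(ini_text)
```

When the engine later reports an intent name, the returned mapping gives back the original phrase, whose console commands can then be looked up in the DSN configuration.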
https://rhasspy.readthedocs.io/en/latest/#supported-languages I tried its command word detection ability. English: almost perfect; the Kaldi speech engine recognized my crappy foreigner's spoken English with complete accuracy. Chinese: it can hardly understand anything I say. The only engine supported for Chinese, Pocketsphinx, is too old, and the model it uses may not be advanced enough. It looks like I have to find a way to use the Kaldi speech engine for Chinese, including finding or converting a Chinese language model that Rhasspy can use. |
Like the failure in daanzu/kaldi-active-grammar#21, Rhasspy's Kaldi models folder is also confusing me. I can't figure out what files to put where... maybe I need to join a training course on Kaldi... |
My kaldi learning has progressed, but I still haven't created a usable model. I decided to start the DSN refactoring work first and add Chinese kaldi support later. I found that the intent recognition function of rhasspy is based on voice2json. I will use voice2json directly as the latter is simpler than the former. |
Since the project containing kaldi can only run in Linux, I decided to rewrite dsn_service in Python and run it in Docker Desktop along with voice2json. |
WSL is another potential environment.
|
Instead of rewriting in Python, I continued to work in C#. DSN's configuration file parsing rules are complex, and rewriting them would take some time. I also realized that running dsn_service in a container has limitations: it cannot easily access configuration files located in the Windows filesystem. So dsn_service will continue to run in Windows, and it will interact with voice2json via the MQTT protocol or the command line interface. https://github.com/YihaoPeng/DragonbornSpeaksNaturally/commits/voice2json Note: far from done. |
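For the command line route, the consumer side could look roughly like the Python sketch below, which reads one JSON line of intent recognition output and applies a confidence threshold. The field names `intent.name` and `intent.confidence` are assumptions about the output shape, the function name `pick_intent` is hypothetical, and the default of 0.7 mirrors the `commandMinConfidence` value in the sample configuration in this thread. The actual C# implementation in DSN may work differently.

```python
import json

def pick_intent(json_line, min_confidence=0.7):
    """Return the intent name from one JSON line, or None if absent or below the threshold."""
    result = json.loads(json_line)
    intent = result.get("intent") or {}
    name = intent.get("name")
    # Treat a missing or null confidence as zero.
    confidence = float(intent.get("confidence") or 0.0)
    if not name or confidence < min_confidence:
        return None
    return name

# Hypothetical sample line; real output may carry more fields.
sample = '{"text": "open map", "intent": {"name": "Command0", "confidence": 0.92}}'
print(pick_intent(sample))
```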
Update 3:
|
Looks great! I'm trying it out now. |
To fix Docker's admin requirements: |
I have added this in update 4: https://github.com/YihaoPeng/DragonbornSpeaksNaturally/releases
The problem is that launching If you start Docker Desktop yourself in advance, you don't have this problem. Running |
I don't think adding myself to the The problem seems to be that the process context of |
Perhaps adding system to the docker group would help? |
The new speech recognition is much better than the default Windows one: it gets conversations right almost every time and takes a fraction of the time. However, my custom commands, such as "Got anything for me" to press e, and the default ones like clear hands don't seem to work. Equipping favorited items works as intended. |
Can you share your config file? Make sure that custom commands come after the |
Config File
|
Two issues:
Fixed configuration file:
;;; DragonbornSpeaksNaturally (DSN)
;;; Provides speech recognition for the 64-bit PC version (Skyrim Special Edition) and VR version (SkyrimVR) of "The Elder Scrolls V: Skyrim".
;;; Tips:
;;;
;;; 1. Please rename or copy this file to "DragonbornSpeaksNaturally.ini".
;;; The file "DragonbornSpeaksNaturally.SAMPLE.ini" will not be read by DSN.
;;; So any changes on "DragonbornSpeaksNaturally.SAMPLE.ini" will not take effect.
;;;
;;; 2. Normally, "DragonbornSpeaksNaturally.ini" should be in the same folder as "DragonbornSpeaksNaturally.exe".
;;; However, if you install the MOD via a MOD Manager and not sure the location of "DragonbornSpeaksNaturally.exe",
;;; you can also put the "DragonbornSpeaksNaturally.ini" file in "<Documents (aka. My Documents)>\DragonbornSpeaksNaturally\"
;;; (a log file "DragonbornSpeaksNaturally.log" will be here after you running Skyrim with DSN).
;;;
;;; 3. A configuration item begins with a ";" indicates that it exists for example only,
;;; or you usually don't need to configure it. But you can also remove the ';' before it
;;; and change its value for your preference.
;;;
;;; 4. If the configuration file "DragonbornSpeaksNaturally.ini" contains non-ASCII characters,
;;; it must be saved as UTF-8 with BOM encoding. Otherwise, DSN cannot read non-ASCII characters normally.
;;; Please do not edit, delete or move the next line (the ini section title), otherwise the options below will not take effect
[SpeechRecognition]
;;; Uncomment (remove leading semicolon) to choose speech engine (default is Microsoft)
;;;
;;; Voice2Json recognizes the following languages better:
;;; cs-cz, de, en, en-us, es, fr, it, ko-kr, nl, ru, sv, vi
;;; Microsoft is recommended for other languages.
;;;
;;; To use the Voice2Json engine, you need to install Docker Desktop:
;;; Download from https://www.docker.com/products/docker-desktop/
;;;
;Engine=Microsoft
Engine=Voice2Json
;;; Set this to override your system's default locale
;;;
;;; Available Locales for Engine=Microsoft:
;;; en-us, en-uk, en-ca, en-in, en-au, fr, de, ja,
;;; zh-cn, zh-tw, zh-hk, zh-sg
;;;
;;; Available Locales for Engine=Voice2Json:
;;; ca-es, cs-cz, de, el-gr, en, en-in, en-us, es, es-mexican,
;;; fr, hi, it, ko-kr, kz, nl, pl, pt-br, ru, sv, vi, zh, zh-cn
;;; References: http://voice2json.org/#supported-languages
;;;
Locale=en-us
;;; When set to 1, the speech recognition service will log any audio signal issues like "too loud" or "too noisy"
bLogAudioSignalIssues=1
;;; Set these to override the minimum confidence required for matching dialogue and commands (default values are shown here)
dialogueMinConfidence=0.5
commandMinConfidence=0.7
;;; The speech engine will pause recognition after you say the following phrase.
;;; Both dialogue and console commands recognition are paused until you say one of the resume phrase.
;;; Remove the semicolon at the beginning to enable this feature.
pausePhrases=Please don't listen to me;Stop speech recognition
;;; Say the following phrase to resume recognition from the pause.
;;; Remove the semicolon at the beginning to enable this feature.
resumePhrases=Please listen to me again;Start speech recognition
;;; Audio played when pausing and resuming the speech recognition.
;;; Leave blank to disable audio playback.
;pauseAudioFile=C:\Windows\media\Speech Off.wav
;resumeAudioFile=C:\Windows\media\Speech On.wav
;;; A C# regular expression to remove content from the phrase that cannot be processed
;;; by the speech recognition engine.
;;;
;;; Note: It is recommended to replace these with a space to prevent accidental word merges.
;;; Double quotes can be used to enclose spaces, it is not part of the replacement result.
;;;
;;; Double quotes are not allowed in the speech recognition engine.
;;; About "(?<![a-zA-Z])'":
;;; Using single quotes with Chinese may cause exceptions.
;;; Like this: "吉'扎 的火焰风暴卷轴".
;;; So we need to remove the quote if it is not preceded by a letter.
;;;
;normalizeExpression=(?:"|\s+|(?<![a-zA-Z])')
;normalizeReplacement=" "
;;; A C# regular expression, matching content will be treated as optional,
;;; so you can omit them when you speak.
;;;
;;; Note: Try not to make the brackets appear in the replacement result,
;;; otherwise you need to read the bracket-self to match the optional part.
;;;
;optionalExpression=(?:\(([^)]*)\)|\[([^\]]*)\]|{([^}]*)}|<([^>]*)>|（([^）]*)）|【([^】]*)】|〈([^〉]*)〉|﹝([^﹞]*)﹞|〔([^〕]*)〕)
;optionalReplacement=$1$2$3$4$5$6$7$8$9
;;; Please do not edit, delete or move the next line (the ini section title), otherwise the options below will not take effect
[Favorites]
;;; Set it to 1 to enable the favorites menu voice-equip and 0 to disable it.
;;;
;;; Tips: If you are struggling with uncontrolled random items equipping in your game,
;;; try set a more complicated equipPhrasePrefix or remove all items
;;; in knownEquipmentTypes or disable favorites menu voice-equip with this setting.
enabled=1
;;; Set this to your preferred prefix for equipping items.
;;; Separate multiple words with semicolons.
;;;
;;; Tips:
;;; If you are struggling with uncontrolled random items equipping in your game,
;;; try set a more complicated prefix.
equipPhrasePrefix=equip;wear;use
;;; Set to 1 to allow omission of equipPhrasePrefix,
;;; which allows you to directly name the equipment to equip the item.
;;;
;;; Note: Enabling this option can greatly increase the probability of false matches.
;;; DSN may equip items randomly due to noise.
omitHandSuffix=0
;;; This setting allows you to equip an item by its type.
;;; Type is a part of the item name (such as "Dagger" for "Iron Dagger" and "Steel Dagger").
;;; If multiple items have the same type, the first will be equipped.
;;;
;;; Tips:
;;; 1. Remove all items below to disable this feature.
;;; 2. If you are struggling with uncontrolled random items equipping in your game, try to disable it.
;;; 3. Add a space before the word to prevent false matches (so "Iron Battleaxe" will not match " Axe").
;;; If you don't like this, or if your language doesn't add spaces between words, remove them.
;;; 4. If an item is misclassified, you can fill in its full name at the beginning,
;;; so that it is in its own category and will not affect other items.
;;;
knownEquipmentTypes= Dagger; Mace; Sword; Axe; Battleaxe; Greatsword; Warhammer; Bow; Crossbow; Shield
;;; Set this for your language
;;; Tips: You can always omit hand suffix. You can say "equip xxx main" or just "equip xxx".
equipLeftSuffix=off;left
equipRightSuffix=main;right
equipBothSuffix=both
;;; Which hand is placed with one-handed items when the hand prefix is omitted.
;;; Valid values are "right", "left", "both" or the value of equipLeft/Right/BothSuffix your specified.
mainHand=right
;;; Or you can choose the main hand by type (Remove leading semicolon to enable)
;spellMainHand=left
;weaponMainHand=right
;;; If you enabled left-hand mode in SkyrimVR, set it to 1.
leftHandMode=0
;;; Allow you to say "left equip xxx", "right equip xxx", "both equip xxx"
useEquipHandPrefix=1
;;; Allow you to say "equip left xxx", "equip right xxx", "equip both xxx"
useEquipHandInfix=1
;;; Allow you to say "equip xxx left", "equip xxx right", "equip xxx both"
useEquipHandSuffix=1
;;; Please do not edit, delete or move the next line (the ini section title), otherwise the options below will not take effect
[Dialogue]
;;; Set it to 0 to disable the dialogue voice selection and 1 to enable.
enabled=1
;;; These phrases can be said to exit from dialogue
goodbyePhrases=I'll talk to you later;That's enough chit chat for now;See you later;Later, brother;Alright then;I'm out of here;That's all
;;; Set the subset matching mode in a dialogue. It allows you to match the whole sentence by saying only part of the sentence.
;;; There are 5 different modes and None (disabled) is the default.
;;; Try the following values only if there is a problem with the dialogue recognition:
;;; None
;;; SubsequenceContentRequired
;;; OrderedSubsetContentRequired
;;; Subsequence
;;; OrderedSubset
;;; Use None to disable the subset matching.
;;;
;;; See this MSDN page for more details:
;;; https://docs.microsoft.com/en-us/dotnet/api/system.speech.recognition.subsetmatchingmode?redirectedfrom=MSDN&view=netframework-4.8#remarks
;;;
;;; Note: 1. Subset matching always be disabled for ConsoleCommands.
;;; 2. Enabling this option can greatly increase the probability of false matches.
;;; DSN may randomly select conversations due to noise.
;;;
SubsetMatchingMode=None
;;; Please do not edit, delete or move the next line (the ini section title), otherwise the options below will not take effect
[ConsoleCommands]
;;;
;;; Add any custom console commands here, format is:
;;;
;;; phrase=console command 1;console command 2;console command 3
;;;
;;; Here is a reference manual on the commands available in Skyrim:
;;; https://en.uesp.net/wiki/Skyrim:Console
;;;
;;; Here are some examples:
;;; (Remove the ';' to enable these commands)
;;;
;Give me some gold=player.additem f 100
;Toggle god mode=tgm
;Die Die Die=killall
;;;
;;; You can also equip/cast specified item/spell/shout directly by command.
;;;
;;; You can find more items/spells/shouts' ID here:
;;; https://en.uesp.net/wiki/Skyrim:Items
;;; https://en.uesp.net/wiki/Skyrim:Spells
;;; https://en.uesp.net/wiki/Skyrim:Dragon_Shouts
;;;
;Ready to battle=player.equipitem 139af; player.equipshout 48ac9
;Shoot the dragon down=player.equipshout 44250; player.equipitem 139ad
I need treatment=player.equipspell 12fcc left; player.equipspell 12fcc right; player.cast 12fcc player left; player.cast 12fcc player right
I need quick treatment=player.equipspell 0002f3b8 left; player.equipspell 0002f3b8 right; player.cast 0002f3b8 player left; player.cast 0002f3b8 player right
;;;
;;; For a shout's command phrase, use a similar English spelling to improve the recognition rate.
;;;
;;; You can find more shouts' spell ID from the subpage of the page:
;;; https://en.uesp.net/wiki/Skyrim:Dragon_Shouts
;;;
;;; Note that the ID below the shout name is not its spell ID and
;;; can only be used with the equipitem command and not for the cast command.
;;; You must click the shout name to go to the subpage and copy the value under the "Spell ID" title.
;;; A shout corresponds to three spell IDs (with 1 to 3 power words).
;;;
Fus Loda=player.cast 00013f3a player voice
Force Loda=player.cast 00013f3a player voice
Fuse Loda=player.cast 00013f3a player voice
Your Tooso=player.cast 0003f9ed player voice
Lok Vakoor=player.cast 0003f50d player voice
Gol Hadov=player.cast 040179e1 player voice
;;;
;;; Here are some examples of using keypresses as commands
;;;
;;; The following commands can only be used in DSN and are executed by DSN itself.
;;; You can't use them in the Skyrim console. There are:
;;;
;;; press, tapkey, holdkey, releasekey, sleep and switchwindow.
;;;
;;; Here is a document for these commands:
;;; https://github.com/DougHamil/DragonbornSpeaksNaturally/wiki/Key-Commands-Guide
;;;
;;; Note: Using the "press e" command in SkyrimVR may cause the game to crash.
;;;
;;; In addition, inserting a xbox controller will cause the keyboard simulation to fail.
;;; The experience is: If you press the keyboard directly, there will be no response,
;;; and the key simulation will not respond.
;;; Here is a solution: https://www.nexusmods.com/skyrim/mods/30913
;;;
;;; Some examples:
;;;
Open Map=switchwindow; press m
Close Map=switchwindow; press m
What should I do=switchwindow;press j
I don't want any more=switchwindow;press esc
Hey man=switchwindow;press e
Greetings=switchwindow;press e
Got anything=switchwindow;press e
Hello, brother=switchwindow;press e
Need any help=switchwindow;press e
I want this=switchwindow;press e
;Casting a shout=press z 3000
;;;
;;; Some more complicated examples of combined keypress scripts
;;; Note: Using the "press e" command in SkyrimVR may cause the game to crash.
;;;
;Sneak and cancel=press ctrl; sleep 5000; press ctrl
;Fire in the hole=player.cast 0003f9ed player voice; sleep 3000; player.cast 00013f3a player voice
;Casting with two hands=holdkey leftmousebutton; sleep 1000; holdkey rightmousebutton; sleep 5000; releasekey leftmousebutton; sleep 3000; releasekey rightmousebutton
;Typing in console=switchwindow; sleep 50; tapkey ~; sleep 50; tapkey s a v e; tapkey blank 1; sleep 300; tapkey enter; sleep 3000; tapkey ~
;;;
;;; Keypress commands in SkyrimVR
;;; Note: Using the "press e" command in SkyrimVR may cause the game to crash.
;;;
;;; All SSE keyboard shortcuts are available in SkyrimVR, so you can use the "press" command
;;; to do everything in SkyrimVR just likes in SSE, including to cast a left or right hand skill.
;;; But you have to make sure the game is the active window when a "press" command running.
;;; If it is not always, you can add the "switchwindow" command before each command.
;;;
;Sneak and cancel in VR=switchwindow; press ctrl; sleep 5000; press ctrl
Flora Way=switchwindow; press leftmousebutton 5000
Maz Mala=switchwindow; press rightmousebutton 5000
;Use something=switchwindow; press e
Gold and Glory=switchwindow; press z 3000
;;;
;;; Load a SRGS (Speech Recognition Grammar Specification) XML file ("@" + file name or full path)
;;;
;;; The XML file can be placed in the same directory as `DragonbornSpeaksNaturally.ini`.
;;;
;;; SRGS can implement complex matching rules without a limit on the number of phrases.
;;; However, you may need basic XML knowledge to be proficient in using it.
;;;
;;; The following is the standard document for SRGS:
;;; https://www.w3.org/TR/speech-grammar/
;;;
;;; Microsoft's SRGS Grammar XML Reference:
;;; https://docs.microsoft.com/en-us/previous-versions/office/developer/speech-technologies/hh361653(v=office.14)?redirectedfrom=MSDN
;;;
;;; DSN support `tag-format="semantics/1.0"`, you need to set the variable `out` to the final command to execute.
;;; Check out the documentation below to learn how to add semantics to your SRGS grammar:
;;; https://www.w3.org/TR/semantic-interpretation/
;;;
;;; Not available when Engine=Voice2Json.
;;;
;Dragonborn Unlimited=@SRGS.SAMPLE.xml |
This option will fix the recognizer problem, which replaces punctuation with spaces. I'll make it the default in the next version.
|
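The effect of the `normalizeExpression` and `normalizeReplacement` settings from the sample configuration can be tried outside DSN. The sketch below applies the same pattern with Python's `re` module rather than the C# regex engine DSN uses, so treat it as an approximation; the function name `normalize` is hypothetical.

```python
import re

# Same pattern as normalizeExpression in the sample configuration:
# a double quote, a run of whitespace, or a single quote not preceded by an ASCII letter.
NORMALIZE = re.compile(r"""(?:"|\s+|(?<![a-zA-Z])')""")

def normalize(phrase, replacement=" "):
    """Replace every match of the pattern with the replacement (a space by default)."""
    return NORMALIZE.sub(replacement, phrase)

print(normalize("吉'扎 的火焰风暴卷轴"))  # the quote after a Chinese character becomes a space
print(normalize("I'll  talk to you later"))  # the apostrophe after a letter is kept
```

Replacing with a space instead of an empty string follows the note in the configuration comments about preventing accidental word merges.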
Works perfectly! |
Unfortunately, Windows' default speech-to-text tends to have a lot of trouble understanding me. Adding the ability to hook into other programs and APIs would be useful, as there are other, more accurate programs such as Mozilla DeepSpeech or https://github.com/Uberi/speech_recognition. Being able to choose between accurate online and slightly less accurate offline speech-to-text would be a great improvement over the default Windows recognition.