This is a speech-to-text and text-to-speech app. It is designed to have a minimal interface and to be as simple as possible to use. It has the following features:
- Speech-to-text dictation using Vosk.
- Text-to-speech using Google Text-to-Speech.
- Optical character recognition using Tesseract.
- Markdown preview using Marked.
This app is only available on macOS, Linux, and Windows, on machines with an x86-64 CPU. It benefits from as much RAM and CPU performance as possible.
Currently there are no prebuilt packages; however, this will change in the future.
To install this app, make sure that Node.js is installed, then clone this repository and install its dependencies.
You can do this using the following commands:
git clone https://github.com/MaxAFriedrich/speechDown
cd speechDown
npm install
npm start
To use the app after that, run npm start in the root of the repository. You may wish to create a shortcut for this.
To use Google Text-to-Speech you need to set it up through the Google Cloud Platform. The instructions below explain how to do this; you can also find them in the app itself.
To use this app's text-to-speech capability, you need to set up authentication so that the app can use Google Text-to-Speech. You can find instructions on how to do this here. NOTE: Do not "Set your authentication environment variable". Once you have completed this, tell the app where you stored the JSON file you downloaded.
Here are some general pointers on the best way to use this software.
- You can only read, scan, or edit text in the code panel.
- The preview panel can only preview text and has no editing capabilities whatsoever.
Once you have opened the app, all of the controls are along the top of the screen.