Speech-to-Text GRPC Microservice

Bi-directional streaming speech-to-text (STT, or ASR) microservice that proxies cloud ASRs: Google Cloud Speech, IBM Bluemix STT, and Hound STT. To prevent the cloud ASR from abruptly declaring end-of-speech, we implement custom voice activity detection (VAD) with a tunable activity threshold.
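The README does not describe how the custom VAD is built. One common way to make end-of-speech tunable is to smooth per-frame speech/non-speech decisions over a sliding window and compare the voiced fraction against a threshold. The sketch below does exactly that. The class name, the helper, and the window and threshold defaults are hypothetical and are not taken from this repository.

```python
from collections import deque


class EndOfSpeechDetector(object):
    """Hypothetical end-of-speech detector over per-frame VAD flags."""

    def __init__(self, window=10, threshold=0.3):
        self.frames = deque(maxlen=window)  # most recent speech/non-speech flags
        self.threshold = threshold          # tunable activity threshold
        self.triggered = False              # True while speech is ongoing

    def update(self, is_speech):
        """Feed one frame's flag; return True when end-of-speech is detected."""
        self.frames.append(bool(is_speech))
        if len(self.frames) < self.frames.maxlen:
            return False  # wait until the window is full
        ratio = sum(self.frames) / float(len(self.frames))
        if not self.triggered:
            if ratio >= self.threshold:
                self.triggered = True  # enough activity: speech has started
            return False
        if ratio < self.threshold:
            self.triggered = False
            self.frames.clear()  # start fresh for the next utterance
            return True
        return False


def end_of_speech_indices(flags, window=10, threshold=0.3):
    """Return the frame indices at which end-of-speech is reported."""
    detector = EndOfSpeechDetector(window, threshold)
    return [i for i, flag in enumerate(flags) if detector.update(flag)]
```

Raising `threshold` requires more sustained activity before speech counts as started or continuing. Raising `window` delays the end-of-speech decision, so short pauses inside an utterance are tolerated. A single voiced frame in a 10-frame window never reaches a ratio of 0.3, so isolated noise blips are ignored.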

Install dependencies

The preferred way is to install them inside a virtualenv (Python 2.7).

pip install --upgrade pip
pip install -r requirements.txt

Compile .proto

cd proto
bash generate_pb.sh

Proxy server

Credentials

Google ASR

Download the service account key for Google Cloud Speech and move it to asr/google_key.json. Then run the following command in the terminal to set the Google ASR credential path:

export GOOGLE_APPLICATION_CREDENTIALS=asr/google_key.json

Hound ASR

Add the clientID and ClientKey for Hound STT to asr/hound_key.json.

IBM ASR

Add the credentials for IBM Bluemix STT to asr/ibm_key.json.
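Before starting the server, a small preflight check can confirm that the credentials described above are in place. This is a hypothetical helper, not part of the repository. The three paths come from this README, but the field names inside each key file are not documented here, so it only checks that each file exists and parses as JSON, and that GOOGLE_APPLICATION_CREDENTIALS is set.

```python
import json
import os

# Key-file locations as given in this README.
KEY_FILES = {
    "google": "asr/google_key.json",
    "hound": "asr/hound_key.json",
    "ibm": "asr/ibm_key.json",
}


def check_credentials(root="."):
    """Return {asr_name: problem}; an empty dict means everything looks present."""
    problems = {}
    for asr, rel_path in KEY_FILES.items():
        path = os.path.join(root, rel_path)
        if not os.path.isfile(path):
            problems[asr] = "missing " + rel_path
            continue
        try:
            with open(path) as f:
                json.load(f)  # only checks that the file is valid JSON
        except ValueError:
            problems[asr] = "invalid JSON in " + rel_path
    # Google's client libraries locate the key through this environment variable.
    if not os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"):
        problems.setdefault("google", "GOOGLE_APPLICATION_CREDENTIALS is not set")
    return problems


if __name__ == "__main__":
    for name, problem in sorted(check_credentials().items()):
        print(name + ": " + problem)
```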

Database

The server uses a local log directory.

Create log directory

Create a log folder containing a log/log.json file whose contents are an empty JSON object ({}).
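The same setup can be scripted so it is safe to rerun. The function name below is hypothetical. It creates log/ if needed and writes {} to log/log.json only when that file does not exist yet, so existing logs are never overwritten.

```python
import json
import os


def init_log_dir(root="."):
    """Create <root>/log/log.json with an empty JSON object if it is absent."""
    log_dir = os.path.join(root, "log")
    if not os.path.isdir(log_dir):
        os.makedirs(log_dir)
    log_file = os.path.join(log_dir, "log.json")
    if not os.path.exists(log_file):  # keep any existing log contents
        with open(log_file, "w") as f:
            json.dump({}, f)
    return log_file


if __name__ == "__main__":
    print(init_log_dir())
```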

Start server

Start the server on a given port. Running on ports below 1024 requires root privileges.

python stt_server.py -p 9080

Proxy Client

Configuration

Edit settings.json to specify the ASR settings.

Stream from recorded file

python test_stt_client.py -p 9080 -in audio/whatistheweatherthere.wav
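The internals of test_stt_client.py are not shown in this README. As a rough sketch of what streaming a recorded file involves, the generator below uses Python's standard wave module to read a WAV file and yield raw PCM chunks that a client could send one by one. Two assumptions apply. The file is taken to be 16 kHz, mono, 16-bit, matching the sox microphone command in this README. The 3072-byte default comes from the chunksize value in the long-running speech section.

```python
import wave


def wav_chunks(path, chunksize=3072):
    """Yield raw PCM chunks of at most `chunksize` bytes from a WAV file.

    Assumes 16 kHz, mono, 16-bit PCM; other formats raise ValueError.
    """
    wf = wave.open(path, "rb")
    try:
        fmt = (wf.getframerate(), wf.getnchannels(), wf.getsampwidth())
        if fmt != (16000, 1, 2):
            raise ValueError("expected 16 kHz mono 16-bit PCM, got %r" % (fmt,))
        frames_per_chunk = chunksize // wf.getsampwidth()  # 1 channel: 2 bytes/frame
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data
    finally:
        wf.close()
```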

Stream from microphone (macOS)

Requires sox.

rec -p -q | sox - -c 1 -r 16000 -t s16 -q -L - | python test_stt_client.py -p 9080 -in stdin

Long running speech

Set continuous: true and chunksize: 3072 (the audio chunk size in bytes) in settings.json to enable continuous, long-running speech recognition (capped at about 1 minute). WebRTC VAD is used for silence detection.
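With 16 kHz, 16-bit mono audio (the format the sox pipeline produces), the stream carries 16000 × 2 = 32000 bytes per second. A 3072-byte chunk therefore holds 3072 / 32000 s = 96 ms of audio. WebRTC VAD only accepts 10, 20 or 30 ms frames, and 96 ms is not a whole number of 30 ms frames: each chunk yields three 960-byte frames with 192 bytes left over. The hypothetical sketch below buffers those leftover bytes across chunks. It is not taken from this repository.

```python
SAMPLE_RATE = 16000   # Hz, as produced by the sox pipeline
BYTES_PER_SAMPLE = 2  # 16-bit signed PCM, mono


def chunk_duration_ms(chunksize, sample_rate=SAMPLE_RATE):
    """Duration of one audio chunk in milliseconds."""
    return 1000.0 * chunksize / (sample_rate * BYTES_PER_SAMPLE)


class Framer(object):
    """Re-slice arbitrary-size chunks into fixed-length VAD frames."""

    def __init__(self, frame_ms=30, sample_rate=SAMPLE_RATE):
        # 30 ms at 16 kHz, 16-bit mono -> 960 bytes per frame
        self.frame_bytes = sample_rate * BYTES_PER_SAMPLE * frame_ms // 1000
        self.buffer = b""  # bytes not yet forming a complete frame

    def feed(self, chunk):
        """Append a chunk and return the list of complete frames now available."""
        self.buffer += chunk
        frames = []
        while len(self.buffer) >= self.frame_bytes:
            frames.append(self.buffer[:self.frame_bytes])
            self.buffer = self.buffer[self.frame_bytes:]
        return frames
```

If the py-webrtcvad package is what provides WebRTC VAD here (an assumption, since requirements.txt is not reproduced in this README), each frame could then be classified with `webrtcvad.Vad(2).is_speech(frame, 16000)`.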

Running client

python test_stt_client.py -p 9080 -in audio/whatistheweatherthere.wav

produces the following output:

############### google ASR ################
what is the weather there***


############### hound ASR ################
what is the weather there***


############### ibm ASR ################
what is the weather there ***

Response format

The server returns a stream of JSON responses to the client, for example:

{
	"asr": "google",
	"transcript": "what is the weather there",
	"is_final": false
}
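A client usually wants only the final results from each ASR. The hypothetical helper below assumes each response arrives as a JSON string with the asr, transcript and is_final fields shown above. How the strings are carried over gRPC is not documented in this README. The helper skips interim results and groups final transcripts by ASR name.

```python
import json


def collect_finals(messages):
    """Group final transcripts by ASR name from an iterable of JSON strings."""
    finals = {}
    for raw in messages:
        msg = json.loads(raw)
        if msg.get("is_final"):  # skip interim (is_final: false) hypotheses
            finals.setdefault(msg["asr"], []).append(msg["transcript"])
    return finals
```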

Testing

Individual ASR blocks (XXX = goog, ibm, hound) can be tested locally as follows. For Google, make sure the credentials are exported first.

python -m asr.XXX -in audio/whatistheweatherthere.wav

gkchai/SpeechToText

Folders and files

Latest commit

History

Repository files navigation

Speech-to-Text GRPC Microservice

Install dependicies

Compile .proto

Proxy server

Credentials

Google ASR

Hound ASR

IBM ASR

Database

Create log directory

Start server

Proxy Client

Configuration

Stream from recorded file

Stream from microphone (MAC OS)

Long running speech

Running client

Response format

Testing

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages