Releases: JamesBrill/react-speech-recognition

removePolyfill function

08 Oct 11:20
6b28cf1

removePolyfill

If a polyfill was applied using applyPolyfill, this resets the Speech Recognition engine to the native implementation. This can be useful when the user switches to a language that is supported by the native engine but not by the polyfill engine.

SpeechRecognition.removePolyfill()

Inspired by #145
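For example, you might switch engines when the user changes language. The helper and the NATIVE_LANGUAGES set below are hypothetical; check which languages your target browsers' native engines actually support:

```javascript
// Hypothetical: languages the native engine is assumed to support.
const NATIVE_LANGUAGES = new Set(['en-US', 'en-GB', 'de-DE']);

// Decide which engine should handle a given language.
function engineForLanguage(language) {
  return NATIVE_LANGUAGES.has(language) ? 'native' : 'polyfill';
}

// In an app, you would then do something like:
//   if (engineForLanguage(lang) === 'native') SpeechRecognition.removePolyfill();
//   else SpeechRecognition.applyPolyfill(MyPolyfill);
```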

Add homepage property to package.json

14 May 09:38
11eef73

Microphone availability detection

22 Sep 20:54
c36833c

When react-speech-recognition first starts to listen, the browser will usually ask the user whether they give permission for the microphone to be used. If they deny access, react-speech-recognition did not previously handle this well:

  • For native browser implementations of the Web Speech API, it would be unaware that permission hadn't been given and indicate to consumers that the microphone was listening
  • For polyfills that throw errors when attempting to start listening in such a case, the error was not being caught, resulting in consumer apps crashing

This release introduces a new state: isMicrophoneAvailable. If the user denies access to the microphone, the value of this will change to false. This applies in both of the following cases:

  • On-spec case: where the recognition object passes an error object with value { error: 'not-allowed' } to its onerror callback
  • Off-spec case: for polyfills that don't implement onerror but instead just throw an error from their start method

After this release, consumers can use this library with greater confidence that their apps will continue to function even when the user denies microphone access, and have the ability to render fallback content in such a case.
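The two cases can be sketched like this. This is a simplified illustration of the handling described above, not the library's internal code:

```javascript
// A recognizer either reports 'not-allowed' via its onerror callback
// (on-spec) or throws from start() (off-spec polyfill); both should
// flip microphoneAvailable to false.
function startRecognizer(recognizer, state) {
  recognizer.onerror = (event) => {
    if (event.error === 'not-allowed') {
      state.microphoneAvailable = false;
    }
  };
  try {
    recognizer.start();
  } catch (e) {
    // Off-spec polyfill threw instead of calling onerror
    state.microphoneAvailable = false;
  }
}
```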

Fix coverage badge

11 Jun 14:15
39712c9

The coverage badge now links to the coverage report for the main branch.

Polyfill browser support detection

11 Jun 14:07
c0228b7

Response to issue #100

browserSupportsSpeechRecognition will no longer always return true when polyfills are used, and will now return false on browsers that do not support the APIs required for Speech Recognition polyfills (namely Internet Explorer and a handful of old browsers).

Another new behaviour is that if an implementation of Speech Recognition is already listening to the microphone at the moment when a polyfill is applied, it will be disconnected from react-speech-recognition and turned off first. This is to avoid multiple recognisers running at the same time. In practice, this shouldn't happen if consumers ensure the polyfill is applied before any "start listening" buttons are rendered.

Flag for continuous listening support

15 Apr 18:15
98b14bf

Allow users of continuous listening to disable the feature on browsers that don't support it by listening for a new browserSupportsContinuousListening state emitted from useSpeechRecognition. See the updated docs here.
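For example, a small helper (the name and shape are hypothetical) that only requests continuous mode when the flag allows it:

```javascript
// Build startListening options based on the browserSupportsContinuousListening
// flag emitted from useSpeechRecognition.
function listeningOptions(browserSupportsContinuousListening, language) {
  const options = { language };
  if (browserSupportsContinuousListening) {
    // Only request continuous mode on browsers that support it
    options.continuous = true;
  }
  return options;
}

// Usage sketch:
//   SpeechRecognition.startListening(
//     listeningOptions(browserSupportsContinuousListening, 'en-US')
//   );
```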

Also added a Troubleshooting section to the README regarding Regenerator Runtime.

Polyfill support

21 Feb 13:24
d771228

The first support for polyfill integration! Highly experimental stuff. Allows for the Speech Recognition engine to be switched out for a polyfill. Tested with an Azure polyfill with the integration documented below.

Polyfills

If you want react-speech-recognition to work on more browsers than just Chrome, you can integrate a polyfill: a piece of code that fills in a feature missing from browsers that don't support it natively.

Under the hood, Web Speech API in Chrome uses Google's speech recognition servers. To replicate this functionality elsewhere, you will need to host your own speech recognition service and implement the Web Speech API using that service. That implementation, which is essentially a polyfill, can then be plugged into react-speech-recognition. You can write that polyfill yourself, but it's recommended you use one someone else has already made.

Basic usage

The SpeechRecognition class exported by react-speech-recognition has the method applyPolyfill. This takes an implementation of the W3C SpeechRecognition specification. From then on, that implementation will be used by react-speech-recognition to transcribe speech picked up by the microphone.

SpeechRecognition.applyPolyfill(SpeechRecognitionPolyfill)

Note that a polyfill of this type, which does not pollute the global scope, is known as a "ponyfill" - the distinction is explained here. react-speech-recognition will also pick up traditional polyfills - just make sure you import them before react-speech-recognition.

Usage recommendations

  • Call applyPolyfill as early as possible to minimise the period during which fallback content (which you should render while the polyfill is loading) is displayed
  • Use your own loadingSpeechRecognition state rather than browserSupportsSpeechRecognition to decide when to render fallback content. On Chrome, browserSupportsSpeechRecognition will return true even before the polyfill is applied, so your speech recognition component would briefly run on the Google Speech Recognition engine before switching to the polyfill engine, potentially causing a janky user experience. Example code using the loading-state approach can be found below
  • After applyPolyfill has been called, browserSupportsSpeechRecognition will always be true. The polyfill itself may not work on all browsers, so it's worth having a further fallback to cover that case. Polyfills usually require WebRTC support in the browser, so it's worth checking that window.navigator.mediaDevices.getUserMedia is present
  • Don't rely on polyfills being perfect implementations of the Speech Recognition specification: test them in different browsers and be aware of their individual limitations
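The WebRTC check mentioned above can be written as a small feature test. This is a sketch; adapt it to your own fallback logic:

```javascript
// Polyfills generally need WebRTC, so verify getUserMedia is available
// before relying on one. Returns false in non-browser environments.
function browserSupportsPolyfills() {
  return (
    typeof window !== 'undefined' &&
    !!window.navigator &&
    !!window.navigator.mediaDevices &&
    typeof window.navigator.mediaDevices.getUserMedia === 'function'
  );
}
```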

Polyfill libraries

Rather than roll your own, you should use a ready-made polyfill for one of the major cloud providers' speech recognition services.

Azure Cognitive Services

This is Microsoft's offering for speech recognition (among many other features). The free trial gives you $200 of credit to get started. It's pretty easy to set up - see the documentation.

  • Polyfill repo: web-speech-cognitive-services
  • Polyfill author: compulim
  • Requirements:
    • Install web-speech-cognitive-services and microsoft-cognitiveservices-speech-sdk in your web app for this polyfill to function
    • You will need two things to configure this polyfill: the name of the Azure region your Speech Service is deployed in, plus a subscription key (or better still, an authorization token). This doc explains how to find those

Here is a basic example combining web-speech-cognitive-services and react-speech-recognition to get you started. This code worked with version 7.1.0 of the polyfill in February 2021 - if it has become outdated due to changes in the polyfill or in Azure Cognitive Services, please raise a GitHub issue or PR to get this updated.

import React, { useEffect, useState } from 'react';
import createSpeechServicesPonyfill from 'web-speech-cognitive-services';
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition';

const SUBSCRIPTION_KEY = '<INSERT_SUBSCRIPTION_KEY_HERE>';
const REGION = '<INSERT_REGION_HERE>';
const TOKEN_ENDPOINT = `https://${REGION}.api.cognitive.microsoft.com/sts/v1.0/issuetoken`;

const Dictaphone = () => {
  const [loadingSpeechRecognition, setLoadingSpeechRecognition] = useState(true);
  const { transcript, resetTranscript } = useSpeechRecognition();

  const startListening = () => SpeechRecognition.startListening({
    continuous: true,
    language: 'en-US'
  });

  useEffect(() => {
    const loadSpeechRecognition = async () => {
      const response = await fetch(TOKEN_ENDPOINT, {
        method: 'POST',
        headers: { 'Ocp-Apim-Subscription-Key': SUBSCRIPTION_KEY }
      });
      const authorizationToken = await response.text();
      const {
        SpeechRecognition: AzureSpeechRecognition
      } = await createSpeechServicesPonyfill({
        credentials: {
          region: REGION,
          authorizationToken,
        }
      });
      SpeechRecognition.applyPolyfill(AzureSpeechRecognition);
      setLoadingSpeechRecognition(false);
    }
    loadSpeechRecognition();
  }, []);

  if (loadingSpeechRecognition) {
    return null;
  }

  return (
    <div>
      <button onClick={startListening}>Start</button>
      <button onClick={SpeechRecognition.stopListening}>Stop</button>
      <button onClick={resetTranscript}>Reset</button>
      <p>{transcript}</p>
    </div>
  );
};
export default Dictaphone;

Caveats

  • On Safari and Firefox, an error will be thrown if calling startListening to switch to a different language without first calling stopListening. It's recommended that you stick to one language and, if you do need to change languages, call stopListening first
  • If you don't specify a language, Azure will return a 400 response. When calling startListening, you will need to explicitly provide one of the language codes defined here. For English, use en-GB or en-US
  • Safari will throw an error on localhost as it requires HTTPS. ngrok is a nice tool for serving a local web app over HTTPS (and is also good for testing your web app on mobile devices)
  • Currently untested on iOS (let me know if it works!)

AWS Transcribe

There is no polyfill for AWS Transcribe in the ecosystem yet, though a promising project can be found here.

Providing your own polyfill

If you want to roll your own implementation of the Speech Recognition API, follow the W3C SpeechRecognition specification. You should implement at least the following for react-speech-recognition to work:

  • continuous (property)
  • lang (property)
  • interimResults (property)
  • onresult (property). On the events received, the following properties are used:
    • event.resultIndex
    • event.results[i].isFinal
    • event.results[i][0].transcript
    • event.results[i][0].confidence
  • onend (property)
  • start (method)
  • stop (method)
  • abort (method)
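As a sketch, a minimal class exposing that surface might look like the following. Everything audio-related is stubbed out; a real implementation would stream microphone audio to a recognition service and fire onresult with real events:

```javascript
// Minimal skeleton of the interface listed above - only the shape of
// the API is shown; no actual speech recognition happens.
class MinimalSpeechRecognition {
  constructor() {
    this.continuous = false;
    this.lang = 'en-US';
    this.interimResults = false;
    this.onresult = null;
    this.onend = null;
    this.listening = false;
  }

  start() {
    // A real implementation would begin streaming audio to a speech
    // service here and invoke this.onresult with SpeechRecognitionEvent-
    // shaped objects (resultIndex, results[i].isFinal, results[i][0], ...).
    this.listening = true;
  }

  stop() {
    this.listening = false;
    if (this.onend) this.onend();
  }

  abort() {
    this.listening = false;
    if (this.onend) this.onend();
  }
}
```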

bestMatchOnly option for fuzzy matching

16 Dec 11:53
a7c6d2a

When an array of command phrases is provided for a fuzzy command, there is the possibility that the callback will be triggered multiple times. For example, take the following command:

      {
        command: ['eat', 'sleep', 'leave'],
        callback: (command) => console.log(command),
        isFuzzyMatch: true,
        fuzzyMatchingThreshold: 0.2
      }

If the user says "leap", the callback will be triggered three times as it matches all three command phrases.

This release introduces a new command option, bestMatchOnly. It is false by default; when set to true, it ensures the callback is called only once, by the command phrase with the best match. If we modify the example above to

      {
        command: ['eat', 'sleep', 'leave'],
        callback: (command) => console.log(command),
        isFuzzyMatch: true,
        fuzzyMatchingThreshold: 0.2,
        bestMatchOnly: true
      }

then it will only be called once for the "leave" command, which has a slightly better match than the others.
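The selection logic can be illustrated with a Dice-coefficient bigram similarity. This is not the library's exact matching algorithm, so real scores may differ, but the idea is the same:

```javascript
// Split a string into overlapping two-character grams.
function bigrams(s) {
  const grams = [];
  for (let i = 0; i < s.length - 1; i++) {
    grams.push(s.slice(i, i + 2));
  }
  return grams;
}

// Dice coefficient: 2 * shared bigrams / total bigrams, in [0, 1].
function similarity(a, b) {
  const aGrams = bigrams(a);
  const bGrams = bigrams(b);
  const used = new Array(bGrams.length).fill(false);
  let matches = 0;
  for (const gram of aGrams) {
    const j = bGrams.findIndex((g, k) => !used[k] && g === gram);
    if (j !== -1) {
      used[j] = true;
      matches += 1;
    }
  }
  return (2 * matches) / (aGrams.length + bGrams.length);
}

// Without bestMatchOnly, every phrase at or above the threshold fires
// the callback; with it, only the highest-scoring phrase does.
function matchingPhrases(spoken, phrases, threshold, bestMatchOnly) {
  const scored = phrases
    .map((phrase) => ({ phrase, score: similarity(spoken, phrase) }))
    .filter(({ score }) => score >= threshold)
    .sort((a, b) => b.score - a.score);
  const matched = scored.map(({ phrase }) => phrase);
  return bestMatchOnly ? matched.slice(0, 1) : matched;
}
```

Under this measure, "leap" scores highest against "leave" (they share the bigrams "le" and "ea"), so it alone survives when bestMatchOnly is true.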

Identify command that triggered callback

15 Dec 15:18
88a7f82

To support consumers that provide an array of command phrases for the same callback, the library now passes the matched command phrase back to the callback, via the command property on the last argument. Example:

    {
      command: ['Hello', 'Hi'],
      callback: ({ command }) => setMessage(`Hi there! You said: "${command}"`),
      matchInterim: true
    }

Array-based commands

14 Dec 11:38
6f26122

Based on #70, this enables an array of "phrases" to be passed into a command, allowing one callback to be triggered by multiple phrases. For example:

    {
      command: ['Hello', 'Hi'],
      callback: () => setMessage('Hi there'),
      matchInterim: true
    }