
How to use this API for passive listening

Aaron Gokaslan edited this page Apr 26, 2014 · 6 revisions

Passive Listening

Remember the computer from Star Trek? "Computer, write this wiki for me." Well, we might not possess sentient computers yet, but you can adapt this project to serve as a verbal interface.

Method 1:

Continuously record and analyze audio in fixed intervals (for example, every 10 seconds). For every returned GoogleResponse, get all possible responses and search each of them for a keyword. This technique is easy to implement, but it taxes your internet connection significantly and might get your IP banned from the API (we have no control over what Google decides to do about this).
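The keyword search can be sketched as follows. This is a minimal, self-contained illustration rather than part of the library: `KeywordSpotter` and `containsKeyword` are hypothetical names, and the list of strings stands in for the main response plus whatever `getOtherPossibleResponses()` returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the library.
final class KeywordSpotter {

    /** Returns true if any candidate transcript contains the keyword, ignoring case. */
    static boolean containsKeyword(List<String> candidates, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        for (String candidate : candidates) {
            // A missing (null) transcript is skipped instead of causing an exception.
            if (candidate != null && candidate.toLowerCase(Locale.ROOT).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // In real use, fill this list from gr.getResponse() and gr.getOtherPossibleResponses().
        List<String> candidates = new ArrayList<>();
        candidates.add("computer write this wiki");
        candidates.add(null);
        System.out.println(containsKeyword(candidates, "Computer"));
    }
}
```

Note that this is a plain substring match, so a keyword such as "computer" would also match "supercomputer". Splitting each transcript into words first would be stricter.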

Method 2:

Use the MicrophoneAnalyzer class and its getAudioVolume() method. When the volume exceeds the ambient noise level, start recording and send the data to Google for analysis. Once again, search for keywords in the response. This technique is slightly more resource-conservative than the previous one.
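One way to decide what "exceeds the ambient noise level" means is to average a few readings while nobody is speaking and add a margin. The sketch below assumes exactly that. `AmbientGate` is a hypothetical class, and the sample count and margin are illustrative values that would need tuning for a real microphone.

```java
import java.util.function.IntSupplier;

// Hypothetical helper for illustration; not part of the library.
final class AmbientGate {
    private final int threshold;

    /** Averages `samples` readings from the source and adds a safety margin. */
    AmbientGate(IntSupplier volumeSource, int samples, int margin) {
        if (samples <= 0) {
            throw new IllegalArgumentException("samples must be positive");
        }
        long sum = 0;
        for (int i = 0; i < samples; i++) {
            sum += volumeSource.getAsInt();
        }
        this.threshold = (int) (sum / samples) + margin;
    }

    /** The calibrated level that a reading has to exceed. */
    int threshold() {
        return threshold;
    }

    /** True if the reading is strictly above the calibrated threshold. */
    boolean isLoud(int volume) {
        return volume > threshold;
    }
}
```

Assuming `mic` is an open MicrophoneAnalyzer, the gate could be built with `new AmbientGate(mic::getAudioVolume, 10, 5)` and then queried with `gate.isLoud(mic.getAudioVolume())` before starting a recording.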

Method 3: (Recommended)

What if you have a low signal-to-noise ratio (in other words, a lot of background noise)? While most programs employ spectral subtraction to compensate for this, the developers of this project do not know enough about it to implement it at this time (any help, information, or code on this topic is welcome). Instead, just analyze the frequency. Even the highest of squeakers typically do not have voices above 400 Hz, so only send the data if the sound is above the volume threshold and below 400 Hz (optionally above 50 Hz as well, but the frequency detection gets inaccurate around that level with current implementations). Here is some example code (this first version checks only the volume threshold):

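import java.io.*;
import javax.sound.sampled.AudioFileFormat;
import com.darkprograms.speech.microphone.MicrophoneAnalyzer;
import com.darkprograms.speech.recognizer.GoogleResponse;
import com.darkprograms.speech.recognizer.Recognizer;
import javaFlacEncoder.FLACFileWriter;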

public class ambientListening{

public static void ambientListeningLoop(String[] args) {
	MicrophoneAnalyzer mic = new MicrophoneAnalyzer(FLACFileWriter.FLAC);
	mic.setAudioFile(new File("AudioTestNow.flac"));
	while(true){
		mic.open();
		final int THRESHOLD = 8;
		int volume = mic.getAudioVolume();
		boolean isSpeaking = (volume > THRESHOLD);
		if(isSpeaking){
			try {
				System.out.println("RECORDING...");
				mic.captureAudioToFile(mic.getAudioFile());//Saves audio to file.
				do{
					Thread.sleep(1000);//Updates every second
				}
				while(mic.getAudioVolume() > THRESHOLD);
				System.out.println("Recording Complete!");
				System.out.println("Recognizing...");
				Recognizer rec = new Recognizer(Recognizer.Languages.AUTO_DETECT);
				GoogleResponse response = rec.getRecognizedDataForFlac(mic.getAudioFile(), 3);
				displayResponse(response);//Displays output in Console
				System.out.println("Looping back");//Restarts loops
			} catch (Exception e) {
				//Recording or recognition failed; report it and keep listening.
				e.printStackTrace();
				System.out.println("Error Occurred");
			}
			finally{
				mic.close();//Makes sure microphone closes on exit.
			}
		}
	}
}

private static void displayResponse(GoogleResponse gr){
	if(gr.getResponse() == null){
		System.out.println("No response received.");//Nothing was recognized.
		return;
	}
	System.out.println("Google Response: " + gr.getResponse());
	System.out.println("Google is " + Double.parseDouble(gr.getConfidence())*100 + "% confident in"
			+ " the reply");
	System.out.println("Other Possible responses are: ");
	for(String s: gr.getOtherPossibleResponses()){
		System.out.println("\t" + s);
	}
}	

}

You may also want to use a boolean array, which increases accuracy by filtering out false positives, false negatives, and segments of speech that are too short. The following example also applies the frequency check:

import java.util.Arrays;

import javaFlacEncoder.FLACFileWriter;

import com.darkprograms.speech.microphone.MicrophoneAnalyzer;
import com.darkprograms.speech.recognizer.GoogleResponse;
import com.darkprograms.speech.recognizer.Recognizer;

public class Test2 {
	private final static MicrophoneAnalyzer microphone = new MicrophoneAnalyzer(FLACFileWriter.FLAC);


	public static void main(String[] args){
		ambientListening();
	}

	public static void ambientListening(){
		String filename = "test.flac";//The microphone records FLAC, so use a matching extension.
		try{
			microphone.captureAudioToFile(filename);
		}
		catch(Exception ex){
			ex.printStackTrace();
			return;
		}
		final int SILENT = microphone.getAudioVolume();
		boolean hasSpoken = false;
		boolean[] speaking = new boolean[10];
		Arrays.fill(speaking, false);
		for(int i = 0; i<100; i++){
			//Shift the history by one slot; stopping at x>0 ensures speaking[1] is updated too.
			for(int x = speaking.length-1; x>0; x--){
				speaking[x] = speaking[x-1];
			}
			int frequency = microphone.getFrequency();
			int volume = microphone.getAudioVolume();
			speaking[0] = frequency<255 && volume>SILENT && frequency>85;
			System.out.println(speaking[0]);
			boolean totalValue = false;
			for(boolean bool: speaking){
				totalValue = totalValue || bool;
			}
			//if(speaking[0] && speaking[2] && speaking[3] && microphone.getAudioVolume()>10){
			if(totalValue && microphone.getAudioVolume()>20){	
				hasSpoken = true;
			}
			if(hasSpoken && !totalValue){
				try {
					Thread.sleep(100);
				} catch (InterruptedException e) {
					// TODO Auto-generated catch block
					e.printStackTrace();
				}
				break;
			}
		}
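		try {
			if(hasSpoken){
				Recognizer rec = new Recognizer(Recognizer.Languages.ENGLISH_US);
				//The microphone was created with FLACFileWriter.FLAC, so the file holds FLAC data.
				GoogleResponse out = rec.getRecognizedDataForFlac(filename);
				System.out.println("Google Response: " + out.getResponse());
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
		finally{
			microphone.close();//Closes the microphone before listening restarts.
		}
		//Restarts listening. Each restart adds a stack frame, so a long-running program should use a loop instead.
		ambientListening();
	}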
}
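The boolean-array idea can also be isolated into a small helper. The following is a minimal sketch under stated assumptions: `SpeechWindow` is a hypothetical class, it uses a circular buffer instead of shifting the array on every reading, and the 85 to 255 Hz band is simply copied from the example above.

```java
// Hypothetical helper for illustration; not part of the library.
final class SpeechWindow {
    private final boolean[] window;
    private int next = 0;      // index of the slot to overwrite next
    private int trueCount = 0; // number of true entries currently in the window

    SpeechWindow(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        window = new boolean[size];
    }

    /** Records one reading, overwriting the oldest one. */
    void add(boolean speaking) {
        if (window[next]) {
            trueCount--;
        }
        window[next] = speaking;
        if (speaking) {
            trueCount++;
        }
        next = (next + 1) % window.length;
    }

    /** True if at least minHits of the remembered readings looked like speech. */
    boolean isSpeech(int minHits) {
        return trueCount >= minHits;
    }

    /** Combines the volume and frequency tests used in the example above. */
    static boolean looksLikeVoice(int volume, int frequency, int silentLevel) {
        return volume > silentLevel && frequency > 85 && frequency < 255;
    }
}
```

Calling `isSpeech(1)` behaves like the "any entry is true" test in the example and tolerates brief dropouts in the middle of a sentence. A larger `minHits` additionally suppresses isolated false positives such as a single click or cough.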

Method 4: getLevel from TargetDataLine

One can also read the level of the TargetDataLine to determine the current volume. This is great for adding a noise level visualization.
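Be aware that `getLevel()` is documented as possibly returning `AudioSystem.NOT_SPECIFIED`, so it may not give a usable value on every system. As a fallback, the level can be computed from the raw samples. The sketch below is one way to do that, assuming 16-bit signed little-endian PCM data. `LevelMeter` and `rmsLevel` are hypothetical names, not part of the library or of Java Sound.

```java
// Hypothetical helper for illustration; not part of the library.
final class LevelMeter {

    /**
     * Root-mean-square level of 16-bit signed little-endian PCM samples,
     * normalized to the range 0.0 (silence) to 1.0 (full scale).
     */
    static double rmsLevel(byte[] buffer, int length) {
        int sampleCount = length / 2; // two bytes per sample
        if (sampleCount == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (int i = 0; i < sampleCount; i++) {
            int lo = buffer[2 * i] & 0xFF; // low byte, treated as unsigned
            int hi = buffer[2 * i + 1];    // high byte, sign-extended
            double sample = ((hi << 8) | lo) / 32768.0;
            sumSquares += sample * sample;
        }
        return Math.sqrt(sumSquares / sampleCount);
    }
}
```

With an open and started TargetDataLine named `line` (an assumed variable), a reading could be taken with `int n = line.read(buffer, 0, buffer.length);` followed by `double level = LevelMeter.rmsLevel(buffer, n);`, and the result drawn as a bar for the visualization.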