When using tools like Amazon’s Alexa or Apple’s Siri, you might have wondered how they understand what you say and how you sound. Essentially, your spoken words are transformed into text and fed into a system that then generates a response.
Recently, I’ve been exploring speech recognition in native mobile apps. Based on my own experience, React Native Voice is the most accessible library for creating a React Native transcription app. However, if you’re unfamiliar with speech recognition or with React Native, it can be quite difficult to configure the app properly.
In this tutorial, I’ll walk you through creating a simple transcription app in React Native using the React Native Voice library. Our React Native transcription app will allow users to record audio and then transcribe it into text. Let’s get started!
React Native is a JavaScript framework that allows you to create native apps for both iOS and Android, saving you time and money by using the same code for both platforms.
React Native offers a number of benefits over other frameworks, including a smaller file size, faster performance, and better support for third-party libraries. In addition, React Native is open source, meaning there is a large community of developers who can contribute to the project and help improve it. Altogether, this makes React Native a great choice for building our transcription app.
React Native Voice includes a plethora of helpful event-triggered methods for handling speech in your app:

onSpeechStart: Triggered when the app recognizes that someone has started speaking
onSpeechRecognized: Activated when the app determines that it can accurately transcribe the incoming speech data
onSpeechEnd: Triggered when someone stops speaking and there is a moment of silence
onSpeechError: Triggered when the speech recognition library throws an exception
onSpeechResults: Triggered when the speech recognition algorithm has finished transcribing and returned its results
onSpeechPartialResults: Triggered with intermediate transcriptions while recognition is still in progress
onSpeechVolumeChanged: Triggered when the app detects a change in the speaker's volume

To get started, you should be familiar with React Native and its syntax. You'll need to have a text editor like Sublime Text or Atom installed on your computer, and finally, you'll need to install the React Native CLI tool.
Once you have these things installed, you can begin developing your transcription app. To create a new React Native project, you'll first need to open your terminal and navigate to the directory where you want your project to live. Then, run the command react-native init to create a new React Native project.
Once your project has been created, open it in your text editor. Our transcription app will require a few different components. For one, we’ll need a component that renders the transcription text. We’ll also need a component that allows the user to input audio, as well as a component that converts the audio to text. Once you have these components coded, you can put them all together to create your finished transcription app.
For speech-to-text conversion, we'll use the Voice component supplied by the React Native Voice library, which contains numerous events that you can use to start or stop voice recognition and to obtain the results of the voice recognition.
When we initialize the screen, we set certain event callbacks in the constructor, as shown in the code sample below. As you can see, we have functions for onSpeechStart and onSpeechEnd. Below are the callbacks that will be invoked automatically when the corresponding event occurs:
import { NativeModules, NativeEventEmitter, Platform } from 'react-native';
import invariant from 'invariant';
import {
  VoiceModule,
  SpeechEvents,
  SpeechRecognizedEvent,
  SpeechErrorEvent,
  SpeechResultsEvent,
  SpeechStartEvent,
  SpeechEndEvent,
  SpeechVolumeChangeEvent,
} from './VoiceModuleTypes';

const Voice = NativeModules.Voice as VoiceModule;

// NativeEventEmitter is only available on React Native platforms, so this conditional is used to avoid import conflicts in the browser/server
const voiceEmitter =
  Platform.OS !== 'web' ? new NativeEventEmitter(Voice) : null;

type SpeechEvent = keyof SpeechEvents;

class RCTVoice {
  _loaded: boolean;
  _listeners: any[] | null;
  _events: Required<SpeechEvents>;

  constructor() {
    this._loaded = false;
    this._listeners = null;
    this._events = {
      onSpeechStart: () => {},
      onSpeechRecognized: () => {},
      onSpeechEnd: () => {},
      onSpeechError: () => {},
      onSpeechResults: () => {},
      onSpeechPartialResults: () => {},
      onSpeechVolumeChanged: () => {},
    };
  }

  removeAllListeners() {
    Voice.onSpeechStart = undefined;
    Voice.onSpeechRecognized = undefined;
    Voice.onSpeechEnd = undefined;
    Voice.onSpeechError = undefined;
    Voice.onSpeechResults = undefined;
    Voice.onSpeechPartialResults = undefined;
    Voice.onSpeechVolumeChanged = undefined;
  }

  destroy() {
    if (!this._loaded && !this._listeners) {
      return Promise.resolve();
    }
    return new Promise((resolve, reject) => {
      Voice.destroySpeech((error: string) => {
        if (error) {
          reject(new Error(error));
        } else {
          if (this._listeners) {
            this._listeners.map(listener => listener.remove());
            this._listeners = null;
          }
          resolve();
        }
      });
    });
  }

  start(locale: any, options = {}) {
    if (!this._loaded && !this._listeners && voiceEmitter !== null) {
      this._listeners = (Object.keys(this._events) as SpeechEvent[]).map(
        (key: SpeechEvent) => voiceEmitter.addListener(key, this._events[key]),
      );
    }
    return new Promise((resolve, reject) => {
      const callback = (error: string) => {
        if (error) {
          reject(new Error(error));
        } else {
          resolve();
        }
      };
      if (Platform.OS === 'android') {
        Voice.startSpeech(
          locale,
          Object.assign(
            {
              EXTRA_LANGUAGE_MODEL: 'LANGUAGE_MODEL_FREE_FORM',
              EXTRA_MAX_RESULTS: 5,
              EXTRA_PARTIAL_RESULTS: true,
              REQUEST_PERMISSIONS_AUTO: true,
            },
            options,
          ),
          callback,
        );
      } else {
        Voice.startSpeech(locale, callback);
      }
    });
  }

  stop() {
    if (!this._loaded && !this._listeners) {
      return Promise.resolve();
    }
    return new Promise((resolve, reject) => {
      Voice.stopSpeech(error => {
        if (error) {
          reject(new Error(error));
        } else {
          resolve();
        }
      });
    });
  }

  cancel() {
    if (!this._loaded && !this._listeners) {
      return Promise.resolve();
    }
    return new Promise((resolve, reject) => {
      Voice.cancelSpeech(error => {
        if (error) {
          reject(new Error(error));
        } else {
          resolve();
        }
      });
    });
  }

  isAvailable(): Promise<0 | 1> {
    return new Promise((resolve, reject) => {
      Voice.isSpeechAvailable((isAvailable: 0 | 1, error: string) => {
        if (error) {
          reject(new Error(error));
        } else {
          resolve(isAvailable);
        }
      });
    });
  }

  /**
   * (Android) Get a list of the speech recognition engines available on the device
   */
  getSpeechRecognitionServices() {
    if (Platform.OS !== 'android') {
      invariant(
        Voice,
        'Speech recognition services can be queried for only on Android',
      );
      return;
    }
    return Voice.getSpeechRecognitionServices();
  }

  isRecognizing(): Promise<0 | 1> {
    return new Promise(resolve => {
      Voice.isRecognizing((isRecognizing: 0 | 1) => resolve(isRecognizing));
    });
  }

  set onSpeechStart(fn: (e: SpeechStartEvent) => void) {
    this._events.onSpeechStart = fn;
  }

  set onSpeechRecognized(fn: (e: SpeechRecognizedEvent) => void) {
    this._events.onSpeechRecognized = fn;
  }

  set onSpeechEnd(fn: (e: SpeechEndEvent) => void) {
    this._events.onSpeechEnd = fn;
  }

  set onSpeechError(fn: (e: SpeechErrorEvent) => void) {
    this._events.onSpeechError = fn;
  }

  set onSpeechResults(fn: (e: SpeechResultsEvent) => void) {
    this._events.onSpeechResults = fn;
  }

  set onSpeechPartialResults(fn: (e: SpeechResultsEvent) => void) {
    this._events.onSpeechPartialResults = fn;
  }

  set onSpeechVolumeChanged(fn: (e: SpeechVolumeChangeEvent) => void) {
    this._events.onSpeechVolumeChanged = fn;
  }
}

export {
  SpeechEndEvent,
  SpeechErrorEvent,
  SpeechEvents,
  SpeechStartEvent,
  SpeechRecognizedEvent,
  SpeechResultsEvent,
  SpeechVolumeChangeEvent,
};

export default new RCTVoice();
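To make that concrete, here is a rough sketch of a screen that assigns those callbacks in its constructor. The screen name, state shape, and handlers are placeholders I chose for illustration, not code from the library or the finished app:

import { Component } from 'react';
import Voice from 'react-native-voice';

class TranscriptionScreen extends Component {
  constructor(props) {
    super(props);
    this.state = { status: 'idle', results: [] };

    // Assigning these properties calls the setters defined in the library code above
    Voice.onSpeechStart = () => this.setState({ status: 'listening' });
    Voice.onSpeechEnd = () => this.setState({ status: 'idle' });
    Voice.onSpeechResults = (e) => this.setState({ results: e.value || [] });
  }

  componentWillUnmount() {
    // Tear down the native recognizer and its listeners when the screen goes away
    Voice.destroy().then(Voice.removeAllListeners);
  }

  render() {
    return null; // UI omitted; we assemble the full screen later in the tutorial
  }
}

export default TranscriptionScreen;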
We use the callback events above to determine the status of speech recognition. Now, let's examine how to start, stop, cancel, and destroy the voice recognition process.
When you press the start button, the voice recognition method is launched. It is an asynchronous method that simply tries to start the voice recognition engine, logging an error to the console if it fails.
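Here's a minimal sketch of such a start handler; the function name and locale are illustrative choices, not requirements of the library:

const startRecognizing = async () => {
  try {
    // Kick off the native speech recognition engine for the given locale
    await Voice.start('en-US');
  } catch (e) {
    // Surface any failure reported by the native side
    console.error(e);
  }
};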
Now that we’re familiar with the React Native Voice library, let’s move on to the code! In this example, we’ll create a screen with a microphone symbol as the clickable button. After clicking on the button, we’ll begin voice recognition; with this process, we can retrieve the status of everything in the callback functions. To halt the speech-to-text translation, we can use the stop, cancel, and destroy buttons.
We'll get two types of outcomes during and after speech recognition. The final result appears once the speech recognizer completes its recognition. Before that, the recognizer picks up individual words as you speak, so partial results appear while the final result is still being computed. Because they are intermediate outcomes, there can be many partial results for a single recognition.
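To handle both, you can register separate callbacks. A minimal sketch, assuming setPartialResults and setResults are state setters in your component:

// Fires repeatedly while you speak, with the best guesses so far
Voice.onSpeechPartialResults = (e) => setPartialResults(e.value || []);

// Fires once recognition finishes, with the final list of candidate transcriptions
Voice.onSpeechResults = (e) => setResults(e.value || []);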
To create our React Native app, we'll utilize react-native init. Assuming you have Node.js installed on your machine, you can install the React Native CLI command line utility with npm.
Go to the workspace, launch the terminal, and execute the following command:
npm install -g react-native-cli
To launch a new React Native project, use the following command:
react-native init ProjectName
To start a new project with a specific React Native version, use the --version parameter:
react-native init ProjectName --version X.XX.X
react-native init ProjectName --version react-native@next
The command above will create a project structure in your project directory with an index file titled App.js.
You have to install the react-native-voice dependency before you can use the Voice component. Open the terminal and navigate to your project to install it:
cd ProjectName && npm install react-native-voice --save
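Depending on your React Native version, you may also need to link the native module; this step isn't part of the original setup above, so treat the commands below as a general guideline. On React Native 0.60 and later, autolinking picks up the package and you typically only need to install the iOS pods; on older versions, you link manually:

# React Native 0.60+ (autolinking): install the iOS pods
cd ios && pod install && cd ..

# Older React Native versions: link the native module manually
react-native link react-native-voice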
For React Native Voice to work, the app must have permission to use the microphone, and on iOS this requires adding keys to the Info.plist file. Follow the instructions below to authorize microphone access and speech recognition in the iOS project:
In Xcode, open the project: TranscriptionExample -> ios -> yourprj.xcworkspace
After launching the project in Xcode, click the project in the left sidebar to see various options in the right workspace. Choose the Info tab, which corresponds to Info.plist.
Next, create two permission keys, Privacy - Microphone Usage Description and Privacy - Speech Recognition Usage Description. The value you enter for each key is the message shown to the user when the permission dialog appears, as seen in the screenshot below:
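If you'd rather edit Info.plist as source, these two entries correspond to the NSMicrophoneUsageDescription and NSSpeechRecognitionUsageDescription keys; the description strings below are only examples:

<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone to record your speech.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>This app transcribes your speech into text.</string>

On Android, make sure the RECORD_AUDIO permission is declared in android/app/src/main/AndroidManifest.xml (<uses-permission android:name="android.permission.RECORD_AUDIO" />); the library then requests it at runtime.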
Now, open App.js and replace the existing code with a functional component that registers the Voice event callbacks, starts and stops recognition, and renders the transcribed text.
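Here is a minimal sketch of what that component could look like. The state names, button labels, locale, and styling are illustrative choices rather than code taken from the original project:

import React, { useEffect, useState } from 'react';
import { SafeAreaView, StyleSheet, Text, TouchableOpacity } from 'react-native';
import Voice from 'react-native-voice';

const App = () => {
  const [isListening, setIsListening] = useState(false);
  const [partialResults, setPartialResults] = useState([]);
  const [results, setResults] = useState([]);

  useEffect(() => {
    // Register the event callbacks described earlier
    Voice.onSpeechStart = () => setIsListening(true);
    Voice.onSpeechEnd = () => setIsListening(false);
    Voice.onSpeechError = (e) => console.error(e);
    Voice.onSpeechPartialResults = (e) => setPartialResults(e.value || []);
    Voice.onSpeechResults = (e) => setResults(e.value || []);

    // Release the native recognizer when the component unmounts
    return () => {
      Voice.destroy().then(Voice.removeAllListeners);
    };
  }, []);

  const startRecognizing = async () => {
    try {
      await Voice.start('en-US'); // locale is an assumption; use the one you need
    } catch (e) {
      console.error(e);
    }
  };

  const stopRecognizing = async () => {
    try {
      await Voice.stop();
    } catch (e) {
      console.error(e);
    }
  };

  return (
    <SafeAreaView style={styles.container}>
      <Text style={styles.status}>
        {isListening ? 'Listening…' : 'Press Start and speak'}
      </Text>
      <Text style={styles.label}>Partial: {partialResults.join(' ')}</Text>
      <Text style={styles.label}>Result: {results[0]}</Text>
      <TouchableOpacity style={styles.button} onPress={startRecognizing}>
        <Text>Start</Text>
      </TouchableOpacity>
      <TouchableOpacity style={styles.button} onPress={stopRecognizing}>
        <Text>Stop</Text>
      </TouchableOpacity>
    </SafeAreaView>
  );
};

const styles = StyleSheet.create({
  container: { flex: 1, alignItems: 'center', justifyContent: 'center' },
  status: { fontSize: 18, marginBottom: 16 },
  label: { marginBottom: 8 },
  button: { padding: 12, margin: 8, backgroundColor: '#eee', borderRadius: 8 },
});

export default App;

Under the hood, this JavaScript API is backed by the library's Android native module, VoiceModule.java, which drives Android's SpeechRecognizer and emits the events we subscribe to. Its full source is reproduced below for reference: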
package com.wenkesj.voice;

import android.Manifest;
import android.content.ComponentName;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.content.pm.ResolveInfo;
import android.os.Bundle;
import android.os.Handler;
import android.speech.RecognitionListener;
import android.speech.RecognitionService;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.util.Log;

import androidx.annotation.NonNull;

import com.facebook.react.bridge.Arguments;
import com.facebook.react.bridge.Callback;
import com.facebook.react.bridge.Promise;
import com.facebook.react.bridge.ReactApplicationContext;
import com.facebook.react.bridge.ReactContextBaseJavaModule;
import com.facebook.react.bridge.ReactMethod;
import com.facebook.react.bridge.ReadableMap;
import com.facebook.react.bridge.ReadableMapKeySetIterator;
import com.facebook.react.bridge.WritableArray;
import com.facebook.react.bridge.WritableMap;
import com.facebook.react.modules.core.DeviceEventManagerModule;
import com.facebook.react.modules.core.PermissionAwareActivity;
import com.facebook.react.modules.core.PermissionListener;

import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

import javax.annotation.Nullable;

public class VoiceModule extends ReactContextBaseJavaModule implements RecognitionListener {

  final ReactApplicationContext reactContext;
  private SpeechRecognizer speech = null;
  private boolean isRecognizing = false;
  private String locale = null;

  public VoiceModule(ReactApplicationContext reactContext) {
    super(reactContext);
    this.reactContext = reactContext;
  }

  private String getLocale(String locale) {
    if (locale != null && !locale.equals("")) {
      return locale;
    }
    return Locale.getDefault().toString();
  }

  private void startListening(ReadableMap opts) {
    if (speech != null) {
      speech.destroy();
      speech = null;
    }

    if (opts.hasKey("RECOGNIZER_ENGINE")) {
      switch (opts.getString("RECOGNIZER_ENGINE")) {
        case "GOOGLE": {
          speech = SpeechRecognizer.createSpeechRecognizer(this.reactContext,
              ComponentName.unflattenFromString("com.google.android.googlequicksearchbox/com.google.android.voicesearch.serviceapi.GoogleRecognitionService"));
          break;
        }
        default:
          speech = SpeechRecognizer.createSpeechRecognizer(this.reactContext);
      }
    } else {
      speech = SpeechRecognizer.createSpeechRecognizer(this.reactContext);
    }

    speech.setRecognitionListener(this);

    final Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);

    // Load the intent with options from JS
    ReadableMapKeySetIterator iterator = opts.keySetIterator();
    while (iterator.hasNextKey()) {
      String key = iterator.nextKey();
      switch (key) {
        case "EXTRA_LANGUAGE_MODEL":
          switch (opts.getString(key)) {
            case "LANGUAGE_MODEL_FREE_FORM":
              intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
              break;
            case "LANGUAGE_MODEL_WEB_SEARCH":
              intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_WEB_SEARCH);
              break;
            default:
              intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
              break;
          }
          break;
        case "EXTRA_MAX_RESULTS": {
          Double extras = opts.getDouble(key);
          intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, extras.intValue());
          break;
        }
        case "EXTRA_PARTIAL_RESULTS": {
          intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, opts.getBoolean(key));
          break;
        }
        case "EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS": {
          Double extras = opts.getDouble(key);
          intent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS, extras.intValue());
          break;
        }
        case "EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS": {
          Double extras = opts.getDouble(key);
          intent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS, extras.intValue());
          break;
        }
        case "EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS": {
          Double extras = opts.getDouble(key);
          intent.putExtra(RecognizerIntent.EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS, extras.intValue());
          break;
        }
      }
    }

    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, getLocale(this.locale));
    speech.startListening(intent);
  }

  private void startSpeechWithPermissions(final String locale, final ReadableMap opts, final Callback callback) {
    this.locale = locale;

    Handler mainHandler = new Handler(this.reactContext.getMainLooper());
    mainHandler.post(new Runnable() {
      @Override
      public void run() {
        try {
          startListening(opts);
          isRecognizing = true;
          callback.invoke(false);
        } catch (Exception e) {
          callback.invoke(e.getMessage());
        }
      }
    });
  }

  @Override
  public String getName() {
    return "RCTVoice";
  }

  @ReactMethod
  public void startSpeech(final String locale, final ReadableMap opts, final Callback callback) {
    if (!isPermissionGranted() && opts.getBoolean("REQUEST_PERMISSIONS_AUTO")) {
      String[] PERMISSIONS = {Manifest.permission.RECORD_AUDIO};
      if (this.getCurrentActivity() != null) {
        ((PermissionAwareActivity) this.getCurrentActivity()).requestPermissions(PERMISSIONS, 1, new PermissionListener() {
          public boolean onRequestPermissionsResult(final int requestCode, @NonNull final String[] permissions, @NonNull final int[] grantResults) {
            boolean permissionsGranted = true;
            for (int i = 0; i < permissions.length; i++) {
              final boolean granted = grantResults[i] == PackageManager.PERMISSION_GRANTED;
              permissionsGranted = permissionsGranted && granted;
            }
            startSpeechWithPermissions(locale, opts, callback);
            return permissionsGranted;
          }
        });
      }
      return;
    }
    startSpeechWithPermissions(locale, opts, callback);
  }

  @ReactMethod
  public void stopSpeech(final Callback callback) {
    Handler mainHandler = new Handler(this.reactContext.getMainLooper());
    mainHandler.post(new Runnable() {
      @Override
      public void run() {
        try {
          if (speech != null) {
            speech.stopListening();
          }
          isRecognizing = false;
          callback.invoke(false);
        } catch (Exception e) {
          callback.invoke(e.getMessage());
        }
      }
    });
  }

  @ReactMethod
  public void cancelSpeech(final Callback callback) {
    Handler mainHandler = new Handler(this.reactContext.getMainLooper());
    mainHandler.post(new Runnable() {
      @Override
      public void run() {
        try {
          if (speech != null) {
            speech.cancel();
          }
          isRecognizing = false;
          callback.invoke(false);
        } catch (Exception e) {
          callback.invoke(e.getMessage());
        }
      }
    });
  }

  @ReactMethod
  public void destroySpeech(final Callback callback) {
    Handler mainHandler = new Handler(this.reactContext.getMainLooper());
    mainHandler.post(new Runnable() {
      @Override
      public void run() {
        try {
          if (speech != null) {
            speech.destroy();
          }
          speech = null;
          isRecognizing = false;
          callback.invoke(false);
        } catch (Exception e) {
          callback.invoke(e.getMessage());
        }
      }
    });
  }

  @ReactMethod
  public void isSpeechAvailable(final Callback callback) {
    final VoiceModule self = this;
    Handler mainHandler = new Handler(this.reactContext.getMainLooper());
    mainHandler.post(new Runnable() {
      @Override
      public void run() {
        try {
          Boolean isSpeechAvailable = SpeechRecognizer.isRecognitionAvailable(self.reactContext);
          callback.invoke(isSpeechAvailable, false);
        } catch (Exception e) {
          callback.invoke(false, e.getMessage());
        }
      }
    });
  }

  @ReactMethod
  public void getSpeechRecognitionServices(Promise promise) {
    final List<ResolveInfo> services = this.reactContext.getPackageManager()
        .queryIntentServices(new Intent(RecognitionService.SERVICE_INTERFACE), 0);
    WritableArray serviceNames = Arguments.createArray();
    for (ResolveInfo service : services) {
      serviceNames.pushString(service.serviceInfo.packageName);
    }
    promise.resolve(serviceNames);
  }

  private boolean isPermissionGranted() {
    String permission = Manifest.permission.RECORD_AUDIO;
    int res = getReactApplicationContext().checkCallingOrSelfPermission(permission);
    return res == PackageManager.PERMISSION_GRANTED;
  }

  @ReactMethod
  public void isRecognizing(Callback callback) {
    callback.invoke(isRecognizing);
  }

  private void sendEvent(String eventName, @Nullable WritableMap params) {
    this.reactContext
        .getJSModule(DeviceEventManagerModule.RCTDeviceEventEmitter.class)
        .emit(eventName, params);
  }

  @Override
  public void onBeginningOfSpeech() {
    WritableMap event = Arguments.createMap();
    event.putBoolean("error", false);
    sendEvent("onSpeechStart", event);
    Log.d("ASR", "onBeginningOfSpeech()");
  }

  @Override
  public void onBufferReceived(byte[] buffer) {
    WritableMap event = Arguments.createMap();
    event.putBoolean("error", false);
    sendEvent("onSpeechRecognized", event);
    Log.d("ASR", "onBufferReceived()");
  }

  @Override
  public void onEndOfSpeech() {
    WritableMap event = Arguments.createMap();
    event.putBoolean("error", false);
    sendEvent("onSpeechEnd", event);
    Log.d("ASR", "onEndOfSpeech()");
    isRecognizing = false;
  }

  @Override
  public void onError(int errorCode) {
    String errorMessage = String.format("%d/%s", errorCode, getErrorText(errorCode));
    WritableMap error = Arguments.createMap();
    error.putString("message", errorMessage);
    error.putString("code", String.valueOf(errorCode));
    WritableMap event = Arguments.createMap();
    event.putMap("error", error);
    sendEvent("onSpeechError", event);
    Log.d("ASR", "onError() - " + errorMessage);
  }

  @Override
  public void onEvent(int arg0, Bundle arg1) { }

  @Override
  public void onPartialResults(Bundle results) {
    WritableArray arr = Arguments.createArray();
    ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
    for (String result : matches) {
      arr.pushString(result);
    }
    WritableMap event = Arguments.createMap();
    event.putArray("value", arr);
    sendEvent("onSpeechPartialResults", event);
    Log.d("ASR", "onPartialResults()");
  }

  @Override
  public void onReadyForSpeech(Bundle arg0) {
    WritableMap event = Arguments.createMap();
    event.putBoolean("error", false);
    sendEvent("onSpeechStart", event);
    Log.d("ASR", "onReadyForSpeech()");
  }

  @Override
  public void onResults(Bundle results) {
    WritableArray arr = Arguments.createArray();
    ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
    for (String result : matches) {
      arr.pushString(result);
    }
    WritableMap event = Arguments.createMap();
    event.putArray("value", arr);
    sendEvent("onSpeechResults", event);
    Log.d("ASR", "onResults()");
  }

  @Override
  public void onRmsChanged(float rmsdB) {
    WritableMap event = Arguments.createMap();
    event.putDouble("value", (double) rmsdB);
    sendEvent("onSpeechVolumeChanged", event);
  }

  public static String getErrorText(int errorCode) {
    String message;
    switch (errorCode) {
      case SpeechRecognizer.ERROR_AUDIO:
        message = "Audio recording error";
        break;
      case SpeechRecognizer.ERROR_CLIENT:
        message = "Client side error";
        break;
      case SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS:
        message = "Insufficient permissions";
        break;
      case SpeechRecognizer.ERROR_NETWORK:
        message = "Network error";
        break;
      case SpeechRecognizer.ERROR_NETWORK_TIMEOUT:
        message = "Network timeout";
        break;
      case SpeechRecognizer.ERROR_NO_MATCH:
        message = "No match";
        break;
      case SpeechRecognizer.ERROR_RECOGNIZER_BUSY:
        message = "RecognitionService busy";
        break;
      case SpeechRecognizer.ERROR_SERVER:
        message = "error from server";
        break;
      case SpeechRecognizer.ERROR_SPEECH_TIMEOUT:
        message = "No speech input";
        break;
      default:
        message = "Didn't understand, please try again.";
        break;
    }
    return message;
  }
}
Reopen the terminal and use the command below to go to your project:
cd ProjectName
To execute the project on an Android virtual device or a real debugging device, use the following command:
react-native run-android
For the iOS Simulator on macOS only, use the command below:
react-native run-ios
You can find the complete code for this project at this GitHub repository. I extracted and modified the sections that were used for transcription and text-to-speech capabilities.
It's incredible to see how far voice recognition has progressed and how simple it is to integrate into our applications, even with little to no background in speech recognition. I would strongly recommend adopting this library if you want voice recognition in your application but lack the skills or time to build a custom model.
You can also build on the information provided in this tutorial to add features to your transcription app. For example, you could allow users to search their transcribed text for specific keywords or terms, include a sharing feature so users can share their transcriptions with others, or provide a way to export transcriptions to other formats, like PDF or Word documents.
I hope this article was helpful. Please be sure to leave a comment if you have any questions or issues. Happy coding!