Whizzystack Solutions

Turning Speech into Text with JavaScript




We are going to build a simple voice-powered note-taking app to demonstrate what the Web Speech API can do. It does three things:


Takes notes using either speech-to-text or traditional keyboard input.


Saves notes to localStorage.


Shows all saved notes and lets you listen to them through Speech Synthesis.




We won't use any fancy dependencies, just good old jQuery for easier DOM operations and Shoelace for CSS styles. We will include both directly from a CDN; there is no need to involve npm for such a tiny project.


The HTML and CSS are pretty standard, so we'll skip them and go straight to the JavaScript.

Speech to Text


The Web Speech API is actually split into two completely separate interfaces. SpeechRecognition understands human voice and turns it into text (Speech -> Text), and SpeechSynthesis reads strings out loud in a computer-generated voice (Text -> Speech). We'll start with the former.


For a free browser feature, the Speech Recognition API is surprisingly accurate. It recognized almost all of my speech correctly and knew which words go together to form meaningful phrases. It also lets you dictate special characters such as full stops, question marks, and new lines.


The first thing we need to do is check whether the user's browser supports the API and, if it does not, display an appropriate error message. Unfortunately, the speech-to-text API is only supported in Chrome and Firefox (with a flag), so many people will probably see that message.
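The check might look something like the sketch below. The helper name getSpeechRecognition, the variable SpeechRecognitionCtor, and the #no-browser-support element are made up for illustration; SpeechRecognition and the prefixed webkitSpeechRecognition are the constructor names browsers actually expose.

```javascript
// Return the SpeechRecognition constructor if the given global object
// exposes one (standard or webkit-prefixed name), otherwise null.
// getSpeechRecognition is a hypothetical helper name.
function getSpeechRecognition(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

var recognition = null;

// Only touch the DOM when we are actually running in a browser.
if (typeof window !== 'undefined') {
  var SpeechRecognitionCtor = getSpeechRecognition(window);
  if (SpeechRecognitionCtor) {
    recognition = new SpeechRecognitionCtor();
  } else {
    // '#no-browser-support' is an assumed element holding the error message.
    $('#no-browser-support').show();
  }
}
```

Passing the global object in as a parameter keeps the detection logic separate from the DOM code, so it can be tried out without a browser.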



The recognition variable gives us access to all of the API's methods and properties. There are various options available, but we will only set recognition.continuous to true. This lets users speak with longer pauses between words and phrases.
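As a rough sketch, the configuration step could be wrapped like this. configureRecognition is a hypothetical helper name; continuous is a standard SpeechRecognition property.

```javascript
// configureRecognition is a hypothetical helper name.
// Other standard options such as lang or interimResults are left at
// their defaults here.
function configureRecognition(recognition) {
  // Keep listening through pauses instead of stopping after one phrase.
  recognition.continuous = true;
  return recognition;
}
```

In the app this would be called once, right after the recognition object is created.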

We also need to set up a couple of event handlers before we can use voice recognition. Most of them simply listen for changes in the recognition status:
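A minimal sketch of such status handlers follows. The helper name attachStatusHandlers, the message texts, and the #recording-instructions element are assumptions; onstart, onspeechend, onerror, and the 'no-speech' error code are part of the SpeechRecognition interface.

```javascript
// attachStatusHandlers is a hypothetical helper. `showMessage` is any
// function that displays a string to the user.
function attachStatusHandlers(recognition, showMessage) {
  // Fired when the service has started listening.
  recognition.onstart = function() {
    showMessage('Voice recognition activated. Try speaking into the microphone.');
  };

  // Fired when the user has stopped speaking.
  recognition.onspeechend = function() {
    showMessage('You were quiet for a while so voice recognition turned itself off.');
  };

  // Fired on errors; only the "nothing was heard" case is reported here.
  recognition.onerror = function(event) {
    if (event.error === 'no-speech') {
      showMessage('No speech was detected. Try again.');
    }
  };
}

// Browser usage, assuming a '#recording-instructions' element exists:
// attachStatusHandlers(recognition, function(text) {
//   $('#recording-instructions').text(text);
// });
```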





There is, however, one special event, onresult, that is crucial. It fires every time the user speaks a word or several words in quick succession, giving us access to a text transcription of what was said.

When the onresult handler catches something, we save it in a global variable and display it in a textarea:
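A simplified sketch of that handler is shown here. The names latestTranscript and attachResultHandler and the #note-textarea element are illustrative; event.resultIndex, event.results, and transcript come from the SpeechRecognitionEvent interface.

```javascript
// Global variable holding everything dictated so far.
var noteContent = '';

// Pull the newest transcript out of a recognition event.
// event.resultIndex points at the most recent result, and the first
// alternative ([0]) is the one the browser considers most likely.
function latestTranscript(event) {
  return event.results[event.resultIndex][0].transcript;
}

// attachResultHandler is a hypothetical helper; `render` receives the
// full note text each time something new is recognized.
function attachResultHandler(recognition, render) {
  recognition.onresult = function(event) {
    noteContent += latestTranscript(event);
    render(noteContent);
  };
}

// Browser usage, assuming a '#note-textarea' element exists:
// attachResultHandler(recognition, function(text) {
//   $('#note-textarea').val(text);
// });
```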





The code above is slightly simplified. There is a very unusual bug on Android devices that causes everything to be repeated twice. There is no official solution yet, but we managed to work around the problem without any apparent side effects. With that bug in mind, the code looks like this:
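Here is a sketch of one possible workaround. It assumes the duplicate shows up as a second result whose transcript is identical to the first; that assumption, and the helper names isMobileRepeat and attachDedupedResultHandler, are illustrative only.

```javascript
// Global variable holding everything dictated so far.
var noteContent = '';

// Returns true when the second result is an exact repeat of the first,
// which is how the duplication is assumed to appear in this sketch.
function isMobileRepeat(event) {
  var current = event.resultIndex;
  var transcript = event.results[current][0].transcript;
  return current === 1 && transcript === event.results[0][0].transcript;
}

// attachDedupedResultHandler is a hypothetical helper; `render`
// receives the full note text after each accepted result.
function attachDedupedResultHandler(recognition, render) {
  recognition.onresult = function(event) {
    if (isMobileRepeat(event)) {
      return; // skip the duplicated phrase
    }
    noteContent += event.results[event.resultIndex][0].transcript;
    render(noteContent);
  };
}
```

One trade-off of this check is that a user who genuinely says the same phrase twice at the very start would lose the second copy.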





Once everything is set up, we can start using the browser's voice recognition feature. To start it, simply call the start() method:



$('#start-record-btn').on('click', function(e) {
 recognition.start();
});

This will prompt the user for permission. If it is granted, the device's microphone will be activated.


The browser will listen for a while and transcribe every recognized phrase or word. The API stops listening automatically after a few seconds of silence, or when it is stopped manually:



$('#pause-record-btn').on('click', function(e) {
 recognition.stop();
});

Text to Speech


Speech synthesis is actually very easy. The API is accessible through the speechSynthesis object, and there are a couple of methods for playing, pausing, and other audio-related tasks. It also has a few cool options that change the pitch, rate, and even the voice of the reader.


All we really need for our example is the speak() method. It expects one argument, an instance of the beautifully named class SpeechSynthesisUtterance.

Here's all the code needed to read a string out.
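A minimal sketch, assuming the helper names createUtterance and readOutLoud (both made up here) and default values for volume, rate, and pitch. speechSynthesis.speak() and SpeechSynthesisUtterance are the real browser APIs.

```javascript
// Build an utterance for the given text. `Utterance` is the
// SpeechSynthesisUtterance constructor, passed in so the helper can
// also be exercised outside a browser.
function createUtterance(Utterance, message) {
  var speech = new Utterance();
  speech.text = message;
  speech.volume = 1; // allowed range: 0 to 1
  speech.rate = 1;   // allowed range: 0.1 to 10
  speech.pitch = 1;  // allowed range: 0 to 2
  return speech;
}

// Read a string out loud using the browser's speech synthesis.
function readOutLoud(message) {
  var speech = createUtterance(window.SpeechSynthesisUtterance, message);
  window.speechSynthesis.speak(speech);
}
```

In the notes app, readOutLoud would be called with the text of a saved note when the user asks to listen to it.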





Conclusion


In an era where voice assistants are more popular than ever, an API like this gives you a quick shortcut to building bots that understand and speak human language.


It can also be a great way to improve accessibility by adding voice control to your apps. Visually impaired users can benefit from both speech-to-text and text-to-speech user interfaces.


The speech synthesis and speech recognition APIs work relatively well and handle various languages and accents with ease. Sadly, they currently have limited browser support, which restricts their use in production. If you need a more reliable form of speech recognition, take a look at these third-party APIs:


3. CMUSphinx and its JavaScript version Pocketsphinx (both open-source).

4. API.AI - Free Google API powered by Machine Learning


As a reputed software solutions developer, we have expertise in providing dedicated remote and outsourced technical resources for software services at a very nominal cost. Besides full-stack experts, we also build web solutions and mobile apps, and work on system integration, performance enhancement, cloud migrations, and big data analytics. Don’t hesitate to get in touch with us!

This article is contributed by Ujjainee. She is currently pursuing a Bachelor of Technology in Computer Science. What she likes most about Computer Engineering is that it allows her to learn and be creative. She also believes in sharing knowledge and is therefore very fond of writing technical content. Coding, analyzing, and blogging are things she can keep on doing!
