How to do Speech Recognition
Speech recognition has crept into our lives. It’s built into our phones, smartwatches and other smart devices, and it’s even automating our homes. For just $50, you can get an Amazon Echo Dot, a magic box that lets you order pizza, get a weather report or even buy trash bags just by speaking out loud.
But speech recognition has been around for decades, so why is it just now hitting the mainstream? The reason is that deep learning finally made speech recognition accurate enough to be useful outside of carefully controlled environments.
The big problem is that speech varies in speed. One person might say “hello!” very quickly while another says “heeeelllllllllllllooooo!” very slowly, producing a much longer sound file with much more data. Both sound files should be recognized as exactly the same text: “hello!” Automatically aligning audio files of various lengths to a fixed-length piece of text turns out to be pretty hard.
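The text above does not name a technique for this alignment. One widely used approach in deep-learning recognizers is Connectionist Temporal Classification (CTC). A neural network predicts one character for each short slice of audio, and the per-slice predictions are then collapsed: repeated characters are merged and a special "blank" symbol is dropped. The blank lets the model separate genuinely doubled letters. The minimal sketch below shows only that final collapsing step. It assumes the per-slice predictions are already available as a string, and it writes the blank as `_`, which is an arbitrary choice.

```python
BLANK = "_"  # arbitrary stand-in for the CTC "blank" symbol (an assumption)


def ctc_collapse(frames: str, blank: str = BLANK) -> str:
    """Collapse per-frame character predictions the way greedy CTC decoding does."""
    out = []
    prev = None
    for ch in frames:
        # Keep a character only when it differs from the previous frame,
        # and never keep the blank symbol itself.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)
```

A fast "HEL_LO" and a slow "HHHEE_LL_LLLOOO" both collapse to "HELLO", so speaking speed no longer matters. Without a blank between them, "HELLLO" collapses to "HELO", which is why the blank is needed.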
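Before a computer can work with sound, the sound wave has to be turned into numbers by sampling its height at evenly spaced points in time. It might seem that this throws away whatever happens between samples. Thanks to the Nyquist theorem, however, we know that we can use math to perfectly reconstruct the original sound wave from the spaced-out samples, as long as we sample at least twice as fast as the highest frequency we want to record.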
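The rule above can be checked with a small sketch. It computes the minimum sampling rate for a given top frequency. It also shows what goes wrong below that rate: a tone above half the sampling rate "folds back" and produces exactly the same samples as a lower tone (aliasing). The 8 kHz and 10 kHz figures are illustrative choices, not values from the text, although 16 kHz sampling is common in speech recognition.

```python
import math


def nyquist_rate(max_frequency_hz: float) -> float:
    """Minimum sampling rate needed to capture frequencies up to max_frequency_hz."""
    return 2 * max_frequency_hz


def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling at sample_rate_hz."""
    # Fold the tone's frequency into the band [0, sample_rate_hz / 2].
    f = signal_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)


def sample_tone(freq_hz: float, sample_rate_hz: float, n_samples: int) -> list:
    """Sample a cosine of the given frequency at evenly spaced instants."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


# Keeping everything up to 8 kHz needs at least 16 kHz sampling.
print(nyquist_rate(8000))            # 16000
# Sampled at only 10 kHz, a 7 kHz tone masquerades as a 3 kHz tone.
print(alias_frequency(7000, 10000))  # 3000
```

Comparing `sample_tone(7000, 10000, 20)` with `sample_tone(3000, 10000, 20)` gives identical lists, up to floating-point rounding. Once the signal is sampled too slowly, no amount of math can tell the two tones apart.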
You can use Google’s Speech Recognition API to convert spoken words (from a microphone) into written text (Python strings); in short, speech to text. Simply speak into a microphone and the Google API will transcribe what you said. The API gives very good results for English.
Installation
Google Speech API v2 is limited to 50 queries per day. Make sure you have a good microphone. You will need to install a few packages: SpeechRecognition, PyAudio, and PortAudio, the C audio library that PyAudio is built on. PyAudio 0.2.9 or newer is required, and you may need to compile it manually.
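Before running anything, a quick sanity check can confirm that Python finds the packages. This is a minimal sketch that assumes the usual import names, `speech_recognition` for SpeechRecognition and `pyaudio` for PyAudio. It only locates the modules without importing them, so a missing PortAudio library will only show up later, when PyAudio is actually loaded.

```python
import importlib.util
from typing import Dict, List

# Import name -> pip package name (the two differ for both packages).
REQUIRED_MODULES: Dict[str, str] = {
    "speech_recognition": "SpeechRecognition",
    "pyaudio": "PyAudio",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip package names whose modules Python cannot locate."""
    return [package for module, package in required.items()
            if importlib.util.find_spec(module) is None]


print(missing_packages() or "All required packages found.")
```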
Code
The code records audio from your microphone, sends it to the speech API and returns a Python string. First, the audio is recorded with the SpeechRecognition module, which is imported at the top of the program. Second, the recorded speech is sent to the Google Speech Recognition API, which returns the transcription: r.recognize_google(audio) returns a string.
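The steps above can be sketched as follows. This is not the author's original program. It assumes SpeechRecognition and PyAudio are installed and that a default microphone is available. The calibration step and the error handling are additions of this sketch. The function name `listen_and_transcribe` is hypothetical.

```python
def listen_and_transcribe(language="en-US"):
    """Record one phrase from the microphone and return Google's transcript.

    Returns None if the speech could not be understood and raises
    RuntimeError if the web API could not be reached.
    """
    # Imported inside the function so this file can be loaded even on a
    # machine without SpeechRecognition; assumes it is installed when called.
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:  # needs PyAudio
        # Listen to background noise briefly to set the energy threshold.
        r.adjust_for_ambient_noise(source)
        print("Say something...")
        audio = r.listen(source)  # blocks until you pause speaking

    try:
        # Sends the recording to Google's web API; the result is a str.
        return r.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # audio was not intelligible
    except sr.RequestError as e:
        raise RuntimeError(f"Google Speech API request failed: {e}") from e
```

Calling `print(listen_and_transcribe())` prints whatever you said, or `None` if Google could not make sense of the audio. Each call counts against the daily query limit mentioned above.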
If you want to build text to speech instead, a machine can also create artificial speech, which is called speech synthesis. eSpeak is an open-source speech synthesizer for English and other languages that runs on Linux and Windows.
Installation
Code
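A minimal sketch of driving eSpeak from Python follows. It assumes an `espeak` executable is on your PATH (on Ubuntu, for example, from the `espeak` package). It uses the command-line options `-v`, which selects the voice/language, and `-s`, which sets the speed in words per minute. The default values of 150 words per minute and the `en` voice are illustrative choices.

```python
import shutil
import subprocess
from typing import List


def espeak_command(text: str, voice: str = "en",
                   words_per_minute: int = 150) -> List[str]:
    """Build the eSpeak command line that speaks `text`."""
    # -v picks the voice/language, -s the speaking rate in words per minute.
    return ["espeak", "-v", voice, "-s", str(words_per_minute), text]


def speak(text: str, voice: str = "en", words_per_minute: int = 150) -> bool:
    """Speak `text` aloud with eSpeak; return False if eSpeak is not installed."""
    if shutil.which("espeak") is None:
        return False
    subprocess.run(espeak_command(text, voice, words_per_minute), check=True)
    return True
```

`speak("Hello, I am your assistant")` should read the sentence aloud. Passing the arguments as a list, rather than building a shell string, keeps quotes and other special characters in the text from being interpreted by a shell.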
Application
Finally, connect all the blocks to build a personal voice assistant. For this experiment you will need Linux (Ubuntu), Python and a working microphone.
The assistant should have the features below and answer simple commands:
· Recognize spoken voice (Speech recognition)
· Answer in spoken voice (Text to speech)
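The two features can be wired together in a small loop: listen, decide on an answer, speak it. In the sketch below, the command set (hello, time, goodbye) and the function names `respond` and `run_assistant` are hypothetical. The recording and speaking functions are passed in as parameters, so the loop can be tried without a microphone or speaker.

```python
import re
from datetime import datetime
from typing import Callable, Optional


def respond(text: str, now: Optional[datetime] = None) -> Optional[str]:
    """Map a recognized phrase to an answer; None means 'stop the assistant'."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    now = now or datetime.now()
    if words & {"stop", "goodbye", "bye"}:
        return None
    if words & {"hello", "hi"}:
        return "Hello! How can I help you?"
    if "time" in words:
        return now.strftime("It is %H:%M.")
    return "Sorry, I don't know that command yet."


def run_assistant(listen: Callable[[], Optional[str]],
                  speak: Callable[[str], None],
                  max_turns: int = 10) -> None:
    """Repeatedly listen, answer out loud, and stop on a goodbye command."""
    for _ in range(max_turns):
        heard = listen()  # transcribed text, or None if nothing was understood
        if heard is None:
            speak("Sorry, I did not catch that.")
            continue
        answer = respond(heard)
        if answer is None:
            speak("Goodbye!")
            return
        speak(answer)
```

To run it for real, pass a function that records and transcribes one phrase (for example with SpeechRecognition's `recognize_google()`) and one that speaks a string (for example by calling eSpeak). For testing, `listen` can simply return canned phrases.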