How to do Speech Recognition

Speech recognition has worked its way into our lives. It’s built into our phones, smart watches and other smart devices. It’s even automating our homes. For just $50, you can get an Amazon Echo Dot, a magic box that lets you order pizza, get a weather report or even buy trash bags just by speaking out loud.

But speech recognition has been around for decades, so why is it just now hitting the mainstream? The reason is that deep learning finally made speech recognition accurate enough to be useful outside of carefully controlled environments.

The big problem is that speech varies in speed. One person might say “hello!” very quickly and another person might say “heeeelllllllllllllooooo!” very slowly, producing a much longer sound file with much more data. Both sound files should be recognized as exactly the same text: “hello!” Automatically aligning audio files of various lengths to a fixed-length piece of text turns out to be pretty hard.

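Before a computer can work with speech, the sound wave has to be sampled, meaning its height is measured at regularly spaced points in time. That might sound like it throws away everything between the samples. But thanks to the Nyquist theorem, we know that we can use math to perfectly reconstruct the original sound wave from the spaced-out samples, as long as we sample at least twice as fast as the highest frequency we want to record.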
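To make the sampling rule concrete, here is a minimal sketch using only the standard library. The function names and the example frequencies are illustrative assumptions, not part of any speech library. It shows what goes wrong when the rule is broken: a tone above half the sample rate produces the same samples as a lower tone, so the original can no longer be recovered.

```python
import math


def sample_sine(freq_hz, rate_hz, n):
    """Value of a sine wave of freq_hz at sample index n, sampled at rate_hz."""
    return math.sin(2 * math.pi * freq_hz * n / rate_hz)


def alias_frequency(freq_hz, rate_hz):
    """Frequency that a pure tone appears to have after sampling at rate_hz.

    Tones at or below rate_hz / 2 come back unchanged. Higher tones are
    folded back into that range, which is known as aliasing.
    """
    return abs(freq_hz - rate_hz * round(freq_hz / rate_hz))


def min_sample_rate(highest_freq_hz):
    """Sample rate the rule above asks for: twice the highest frequency."""
    return 2 * highest_freq_hz
```

For example, `alias_frequency(10000, 16000)` gives 6000: a 10 kHz tone sampled at 16 kHz is indistinguishable from a 6 kHz tone, while `alias_frequency(5000, 16000)` stays at 5000 because 5 kHz is below half the sample rate.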

You can use Google’s Speech Recognition API to convert spoken words (from a microphone) into written text (Python strings), in short, speech to text. You simply speak into a microphone and the Google API transcribes what you said into written text. The API gives excellent results for English.

Installation

Google Speech API v2 is limited to 50 queries per day. Make sure you have a good microphone. You will need to install a few packages: PyAudio, PortAudio and SpeechRecognition. PyAudio 0.2.9 or later is required, and you may need to compile it manually.
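Before running anything, it can help to check which of the Python packages are actually importable. The sketch below is one way to do that with the standard library. The helper name is hypothetical. It assumes the usual import names, `speech_recognition` for SpeechRecognition and `pyaudio` for PyAudio. PortAudio is a C library rather than a Python module, so this check cannot see it directly.

```python
import importlib.util

# Maps the import name to the package name you would install.
REQUIRED_MODULES = {
    "speech_recognition": "SpeechRecognition",
    "pyaudio": "PyAudio",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose modules cannot be found."""
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None
    ]
```

Calling `missing_packages()` returns an empty list when both modules can be found, and otherwise the names of the packages that still need installing.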


Code

The code records audio from your microphone, sends it to the speech API and returns a Python string. The audio is recorded using the speech recognition module, which is imported at the top of the program. The recorded speech is then sent to the Google speech recognition API, which returns the result: r.recognize_google(audio) returns a string.
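The original listing is not reproduced here, so the following is a minimal sketch of the steps just described, not the author's exact code. It assumes the SpeechRecognition package and a working microphone. The function name `listen_once` is hypothetical, and the import sits inside the function only so that the file loads even if the package is not installed yet.

```python
def listen_once():
    """Record one phrase from the default microphone and return it as text.

    Returns None when the speech could not be understood.
    """
    # Assumption: SpeechRecognition is installed, with PyAudio for microphone access.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample the background noise briefly so quiet speech is still detected.
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)

    try:
        # Sends the recording to the Google speech API and returns a str.
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The API could not make out any words.
        return None
    except sr.RequestError as error:
        # Network failure, or the request was rejected (for example, quota used up).
        raise RuntimeError("Speech API request failed") from error
```

A typical use would be `text = listen_once()` followed by `print(text)`. Returning None for unintelligible audio, instead of raising, keeps the calling code simple for a voice assistant that will just ask again.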


If you want to go the other way and build text to speech, the machine can also create artificial speech using a speech synthesizer. eSpeak is an open source speech synthesizer for English and other languages, available for Linux and Windows.

Installation


Code
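The original listing is not reproduced here. As a minimal sketch, one way to drive eSpeak from Python is to call its command line program through `subprocess`. This assumes the `espeak` binary is installed and on your PATH. The function names and the default voice and speed are illustrative choices. The `-v` (voice) and `-s` (words per minute) options are standard eSpeak flags.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="en", words_per_minute=150):
    """Build the argument list for one eSpeak call."""
    return ["espeak", "-v", voice, "-s", str(words_per_minute), text]


def speak(text, voice="en", words_per_minute=150):
    """Speak text out loud through the eSpeak command line program."""
    if shutil.which("espeak") is None:
        raise RuntimeError("espeak was not found on PATH")
    # Passing a list (not a shell string) avoids quoting problems in the text.
    subprocess.run(build_espeak_command(text, voice, words_per_minute), check=True)
```

For example, `speak("Hello world")` should read the phrase aloud in the default English voice.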


Application

Finally, connect all the blocks to build a personal voice assistant. For this experiment you will need (Ubuntu) Linux, Python and a working microphone.

The features to build, each answering a simple spoken command, are listed below:

·        Recognize spoken voice (Speech recognition)

·        Answer in spoken voice (Text to speech)

Complete Code
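The original listing is not reproduced here, so what follows is a sketch of how the two features could be wired together, not the author's program. It assumes the SpeechRecognition package, a microphone and the `espeak` binary. All function names and the command table are hypothetical. The listening and speaking steps are passed in as functions so each one can be swapped or tested without hardware.

```python
import subprocess

# Hypothetical command table: recognized phrase -> spoken answer.
ANSWERS = {
    "hello": "Hello, how can I help?",
    "what is your name": "I am a small voice assistant.",
}
FALLBACK = "Sorry, I did not understand that."


def answer_for(heard):
    """Pick the reply for a recognized phrase (None means nothing was understood)."""
    if heard is None:
        return FALLBACK
    return ANSWERS.get(heard.strip().lower(), FALLBACK)


def listen_with_google():
    """Speech recognition: record one phrase and return it as text, or None."""
    import speech_recognition as sr  # assumption: SpeechRecognition is installed

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None


def speak_with_espeak(text):
    """Text to speech: read the reply aloud (assumes espeak is on PATH)."""
    subprocess.run(["espeak", text], check=True)


def run_assistant(listen, speak, turns=1):
    """Listen, look up an answer and say it, for the given number of turns."""
    for _ in range(turns):
        speak(answer_for(listen()))
```

To try it with real hardware, call `run_assistant(listen_with_google, speak_with_espeak, turns=3)`. New commands are added by extending the `ANSWERS` table.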

