
How to do Speech Recognition

Swayam Mittal

Researcher in AI and NLP

Published: May 10, 2018

Speech recognition has intruded into our lives. It's built into our phones, smartwatches and other smart devices. It's even automating our homes. For just $50, you can get an Amazon Echo Dot, a magic box that lets you order pizza, get a weather report or even buy trash bags just by speaking out loud.

But speech recognition has been around for decades, so why is it just now hitting the mainstream? The reason is that deep learning finally made speech recognition accurate enough to be useful outside of carefully controlled environments.

The big problem is that speech varies in speed. One person might say “hello!” very quickly and another might say “heeeelllllllllllllooooo!” very slowly, producing a much longer sound file with much more data. Both sound files should be recognized as exactly the same text: “hello!” Automatically aligning audio files of various lengths to a fixed-length piece of text turns out to be pretty hard.
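To make the alignment problem concrete, here is a toy illustration using dynamic time warping (DTW), a classic technique for comparing sequences produced at different speeds. DTW is named here only as an illustration; it is not presented as the method this article or modern deep-learning recognizers use. Each letter stands in for one audio frame, which is an assumption for readability: real systems compare acoustic feature vectors, not characters.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences (cost 1 per mismatched pair)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0 if a[i - 1] == b[j - 1] else 1
            # The previous pair either shared b[j-1], shared a[i-1], or both advanced,
            # which lets one element stretch over several elements of the other sequence.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


fast = "hello"
slow = "heeeelllllllllllllooooo"
print(dtw_distance(fast, slow))      # 0: same word, different speed
print(dtw_distance(fast, "yellow"))  # 2: different word
```

A plain position-by-position comparison cannot even be applied here because the sequences have different lengths; warping the time axis is what lets the slow “hello” match the fast one.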

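Turning sound into numbers means measuring the wave only at evenly spaced points in time, which might seem to throw information away. But thanks to the Nyquist theorem, we know that we can use math to perfectly reconstruct the original sound wave from those spaced-out samples, as long as we sample at more than twice the highest frequency we want to record.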
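A small numeric sketch of the Nyquist condition, assuming a 16 kHz sample rate (a common choice for speech, used here only as an example). A tone below half the sample rate keeps its frequency; a tone above it "aliases" and becomes indistinguishable from a lower one. The helper names are illustrative, not from any library.

```python
import math


def nyquist_rate(max_freq_hz):
    """Sample rate that must be exceeded to capture tones up to max_freq_hz."""
    return 2 * max_freq_hz


def alias_frequency(signal_hz, sample_rate_hz):
    """Frequency a pure tone appears to have after sampling at sample_rate_hz."""
    f = signal_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)  # fold into the range [0, fs/2]


def cosine_samples(freq_hz, sample_rate_hz, n):
    """First n samples of a unit cosine at freq_hz."""
    return [math.cos(2 * math.pi * freq_hz * k / sample_rate_hz) for k in range(n)]


fs = 16_000
print(nyquist_rate(8_000))          # 16000
print(alias_frequency(3_000, fs))   # 3000: below fs/2, captured correctly
print(alias_frequency(10_000, fs))  # 6000: above fs/2, aliased
# At 16 kHz, a 10 kHz cosine and a 6 kHz cosine produce identical samples.
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(cosine_samples(10_000, fs, 32), cosine_samples(6_000, fs, 32))
)
print(same)  # True
```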

You can use Google's Speech Recognition API to convert spoken words (from a microphone) into written text (Python strings); in short, speech to text. You simply speak into a microphone and the Google API transcribes what you said. The API gives excellent results for English.
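If you want to try the API before setting up a microphone, the SpeechRecognition package can also transcribe recorded audio. A minimal sketch, assuming SpeechRecognition is installed and a WAV file named `hello.wav` (a hypothetical file name) is in the working directory; reading files this way does not need PyAudio.

```python
def transcribe_file(path, language="en-US"):
    """Transcribe a WAV/AIFF/FLAC file with Google's Web Speech API via SpeechRecognition."""
    # Imported inside the function so this file still loads where the package is missing.
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = r.record(source)  # read the whole file into an AudioData object
    # Raises sr.UnknownValueError if the speech is unintelligible,
    # sr.RequestError if the API cannot be reached.
    return r.recognize_google(audio, language=language)


if __name__ == "__main__":
    print(transcribe_file("hello.wav"))  # hypothetical input file
```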

Installation

Google Speech API v2 is limited to 50 queries per day. Make sure you have a good microphone. You will need to install a few packages: SpeechRecognition, PyAudio and PortAudio (the C audio library that PyAudio is built on). PyAudio 0.2.9 is required, and you may need to compile it manually.
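Before running the recognizer, a quick check that the Python packages can be found may save some debugging. A minimal sketch: it only confirms that the modules are importable by name (SpeechRecognition installs as `speech_recognition`, PyAudio as `pyaudio`); it does not load them, so a missing PortAudio library would still surface later. On Ubuntu the packages are typically installed with `pip install SpeechRecognition PyAudio`, after installing the PortAudio headers with `sudo apt-get install portaudio19-dev`.

```python
import importlib.util

# Import name -> package name on PyPI
REQUIRED_MODULES = {
    "speech_recognition": "SpeechRecognition",
    "pyaudio": "PyAudio",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the PyPI names of required modules that cannot be found."""
    return [pkg for module, pkg in required.items() if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All speech recognition packages found.")
```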

Code

The code records audio from your microphone, sends it to the speech API and returns a Python string. The audio is recorded with the speech_recognition module, which is imported at the top of the program. The recorded speech is then sent to the Google Speech Recognition API, which returns the transcription: r.recognize_google(audio) returns a string.
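The steps above can be sketched as follows, assuming SpeechRecognition and PyAudio are installed and a default microphone is available. The one-second ambient-noise calibration and the `en-US` language code are illustrative choices, not requirements.

```python
def recognize_from_microphone(language="en-US"):
    """Record one phrase from the default microphone and return Google's transcription."""
    # Imported inside the function so this file still loads where the package is missing.
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        # Listen to background noise for about 1 s to tune the energy threshold.
        r.adjust_for_ambient_noise(source, duration=1)
        print("Say something!")
        audio = r.listen(source)  # records until a pause is detected

    try:
        # Sends the audio to Google's Web Speech API and returns a str.
        return r.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand the audio")
    except sr.RequestError as e:
        print(f"Could not request results from Google Speech Recognition: {e}")
    return None


if __name__ == "__main__":
    text = recognize_from_microphone()
    if text:
        print("You said:", text)
```

Every call to `recognize_google` uses one API request, so the daily query limit applies to each phrase you speak.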

If you want to go the other way and build text to speech, the machine can also create artificial speech using a speech synthesizer. eSpeak is an open-source speech synthesizer for English and other languages, available for Linux and Windows.

Installation

Code
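One way to drive eSpeak from Python is to call its command-line program. A minimal sketch, assuming the `espeak` binary is on your PATH (on Ubuntu, `sudo apt-get install espeak`); `-v` selects the voice and `-s` sets the speaking rate in words per minute, and the default values below are illustrative.

```python
import shutil
import subprocess


def espeak_command(text, voice="en", speed=150):
    """Build the eSpeak command line: -v picks the voice, -s the words per minute."""
    return ["espeak", "-v", voice, "-s", str(speed), text]


def speak(text, voice="en", speed=150):
    """Say text aloud with eSpeak."""
    if shutil.which("espeak") is None:
        raise RuntimeError("espeak not found; install it first (e.g. sudo apt-get install espeak)")
    # Passing a list avoids shell quoting problems with the spoken text.
    subprocess.run(espeak_command(text, voice, speed), check=True)


if __name__ == "__main__":
    speak("Hello, I am your computer.")
```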

Application

Finally, connect all the blocks to build a personal voice assistant. For this experiment you will need Linux (Ubuntu), Python and a working microphone.

The assistant should have the following features, each driven by simple spoken commands:

· Recognize spoken voice (Speech recognition)

· Answer in spoken voice (Text to speech)

Complete Code
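A sketch of how the two blocks might be combined, assuming SpeechRecognition, PyAudio and eSpeak are installed. The command set in `answer` (greeting, time, name, and quit words) is hypothetical and only shows the structure: listen, recognize, pick a reply, speak it.

```python
import shutil
import subprocess
import time

QUIT_WORDS = ("stop", "exit", "goodbye")


def answer(command):
    """Return a reply for a recognized command, or None to quit (hypothetical command set)."""
    text = command.lower().strip()
    if text in QUIT_WORDS:
        return None
    if "hello" in text:
        return "Hello! How can I help you?"
    if "time" in text:
        return "It is " + time.strftime("%H:%M")
    if "your name" in text:
        return "I am your personal voice assistant."
    return "Sorry, I did not understand that."


def speak(text):
    """Answer in spoken voice with eSpeak, falling back to printing."""
    if shutil.which("espeak"):
        subprocess.run(["espeak", text], check=False)
    else:
        print(text)


def listen(recognizer, sr):
    """Recognize one spoken phrase; return '' if nothing intelligible was heard."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""
    # sr.RequestError is not caught, so a network or quota failure stops the loop.


def main():
    import speech_recognition as sr  # needs SpeechRecognition and PyAudio

    recognizer = sr.Recognizer()
    speak("Hi, I am listening.")
    while True:
        command = listen(recognizer, sr)
        if not command:
            continue
        print("You said:", command)
        reply = answer(command)
        if reply is None:
            speak("Goodbye.")
            break
        speak(reply)


if __name__ == "__main__":
    main()
```

Keeping `answer` free of audio code makes the command logic easy to extend and test without a microphone.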


