
Python Speech Recognition – Artificial Intelligence

Malini Shukla

Senior Data Scientist || Hiring || 6M+ impressions || Trainer || Top Data Scientist || Speaker || Top content creator on LinkedIn || Tech Evangelist

Published: January 15, 2019

What is Python Speech Recognition?

Speech recognition has come a long way, from systems that handled a single speaker and a vocabulary of around a dozen words to systems that recognize multiple speakers and possess huge vocabularies in various languages. The process works like this: a microphone converts speech from physical sound to an electrical signal, and an analogue-to-digital converter turns that signal into digital data. Finally, one or more models transcribe the audio to text. With a Hidden Markov Model (HMM), the speech signal is divided into 10-millisecond fragments.
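To make the 10-millisecond fragments concrete, here is a minimal sketch of cutting a digitized signal into fixed-length frames. The function name split_into_frames and the 16 kHz sample rate are assumptions chosen for illustration. This is not how any particular recognizer is implemented.

```python
def split_into_frames(samples, sample_rate, frame_ms=10):
    """Split a sequence of samples into consecutive frames of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    if frame_len <= 0:
        raise ValueError("sample_rate is too low for the requested frame length")
    # Drop the trailing partial frame so every frame has the same length.
    usable = len(samples) - len(samples) % frame_len
    return [samples[i:i + frame_len] for i in range(0, usable, frame_len)]


# One second of silence at an assumed 16 kHz sample rate.
frames = split_into_frames([0] * 16000, sample_rate=16000)
print(len(frames), len(frames[0]))  # 100 frames of 160 samples each
```

At 16,000 samples per second, each 10-millisecond frame holds 160 samples, so one second of audio yields 100 frames.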


a. Available APIs in Python Speech Recognition

With Python, we have several APIs available:

apiai
assemblyai
google-cloud-speech
pocketsphinx
SpeechRecognition
watson-developer-cloud
wit

Some Python packages, like wit and apiai, offer more than just basic speech recognition. Here, though, we will demonstrate SpeechRecognition, which is easier to use. It ships with a hard-coded default API key for the Google Web Speech API, so you can try it without obtaining a key of your own.

b. Supported File Types in Python Speech Recognition

WAV- PCM/LPCM format
AIFF
AIFF-C
FLAC
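If you are not sure whether a WAV file is plain PCM, Python's standard wave module offers a quick check, since it reads uncompressed PCM data and typically raises wave.Error for compressed WAV files. The sketch below is independent of SpeechRecognition. The helper names describe_wav and write_silent_wav are hypothetical, and the silent file exists only so the example can run on its own.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic PCM parameters of a WAV file using the standard library."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def write_silent_wav(path, seconds=1, sample_rate=16000):
    """Write a mono 16-bit PCM WAV of silence, for demonstration only."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)


demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
write_silent_wav(demo_path)
info = describe_wav(demo_path)
print(info)
```

Running describe_wav() on your own recording before transcription tells you its sample rate and length, which helps when choosing offsets and durations later.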

c. Prerequisites for Python Speech Recognition

You can use pip to install this-

pip install SpeechRecognition

To test the installation, you can import this in the interpreter and check the version-

>>> import speech_recognition as sr
>>> sr.__version__

'3.8.1'

We also download a sample audio file from here-

https://www.voiptroubleshooter.com/open_speech/american.html

Reading an Audio File in Python

a. The Recognizer class

First, we make an instance of the Recognizer class.

>>> r=sr.Recognizer()

With Recognizer, we have a method for each API-

recognize_bing()- Microsoft Bing Speech
recognize_google()- Google Web Speech API
recognize_google_cloud()- Google Cloud Speech
recognize_houndify()- Houndify
recognize_ibm()- IBM Speech to Text
recognize_sphinx()- CMU Sphinx
recognize_wit()- Wit.ai

Except for recognize_sphinx(), which works offline, every one of these methods needs an Internet connection.


b. Capturing data with record()

We can use AudioFile as a context manager to open the file and read its contents, then record them into an AudioData instance.

>>> demo=sr.AudioFile('demo.wav')
>>> with demo as source:
...     audio=r.record(source)

To confirm this, try:

>>> type(audio)

<class 'speech_recognition.AudioData'>

c. Recognizing Speech in the Audio

Finally, you can call recognize_google() to perform the transcription.

>>> r.recognize_google(audio)

"The Purge can use within The Smurfs the sheet without playback Mount delivery date habitat of a Vow these days it's okay microwave devices are installed in Windows to use of lemons next find the password on the site that the houses such hard core in a garbage for the study core exercises talking is hard disk"

You can also transcribe audio in a different language using the language parameter-

>>> r.recognize_google(audio,language='ro-RO') # for Romanian
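In a script, it helps to handle the two exceptions SpeechRecognition raises during transcription: UnknownValueError when the speech is unintelligible, and RequestError when the API cannot be reached. The following is a minimal sketch. The wrapper name transcribe_file and the choice to return None for unintelligible audio are assumptions, not part of the library.

```python
def transcribe_file(path, language="en-US"):
    """Transcribe an audio file with the Google Web Speech API, or return None."""
    # Imported here so that defining the function does not require the package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The API returned no usable transcription for this audio.
        return None
    except sr.RequestError as exc:
        # The API could not be reached, or it rejected the request.
        raise RuntimeError(f"Speech API request failed: {exc}") from exc
```

With the sample file in place, transcribe_file('demo.wav') would return the transcription, and transcribe_file('demo.wav', language='ro-RO') would request Romanian.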

Reading a Segment of Audio

When you only want to read part of an audio file, you can use two arguments: offset, which tells record() where to begin (in seconds), and duration, which tells it how long to listen (also in seconds).


>>> with demo as source:
...     audio=r.record(source,offset=4,duration=3)
>>> r.recognize_google(audio)

'clear the sheet without me back'

Note that this caused issues at the boundaries of the segment. It heard 'murfs', which it transcribed as 'clear'. It also heard 'me back' instead of 'playback' because of the noise in the audio.

If we set the offset to 3.3,

>>> with demo as source:
...     audio=r.record(source,offset=3.3,duration=3)
>>> r.recognize_google(audio)

'clear the sheet with Ok'

But check what happens when we set the offset to 2.5-

>>> with demo as source:
...     audio=r.record(source,offset=2.5,duration=3)
>>> r.recognize_google(audio)

'National thanks'
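The offset and duration values are given in seconds, but they ultimately select a range of audio frames, so a boundary can easily fall in the middle of a word. The sketch below shows only that arithmetic. The function segment_frame_range and the 10-second, 8000 Hz file are hypothetical, and this is not the library's internal code.

```python
def segment_frame_range(sample_rate, total_frames, offset=0.0, duration=None):
    """Return (start, stop) frame indices for a segment given in seconds."""
    start = min(int(offset * sample_rate), total_frames)
    if duration is None:
        stop = total_frames  # read to the end of the file
    else:
        stop = min(start + int(duration * sample_rate), total_frames)
    return start, stop


# A hypothetical 10-second file sampled at 8000 Hz (80000 frames in total).
print(segment_frame_range(8000, 80000, offset=4, duration=3))    # (32000, 56000)
print(segment_frame_range(8000, 80000, offset=2.5, duration=3))  # (20000, 44000)
```

Shifting the offset by a fraction of a second moves both boundaries by thousands of frames, which is why the transcriptions above change so much between runs.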

Python Speech Recognition – Dealing with Noise

Let's face it: there will always be noise, no matter how professional the equipment you record with. So we had better learn to deal with it. The method adjust_for_ambient_noise() reads the first second of the audio stream by default and calibrates the recognizer to the audio's noise level. That part of the audio is consumed by the calibration, so it does not make it into the transcription.


>>> with demo as source:
...     r.adjust_for_ambient_noise(source)
...     audio=r.record(source,offset=2.5,duration=3)
>>> r.recognize_google(audio)

'clear the sheet'

We can pass a duration argument to control how long it listens for noise while calibrating the recognizer. See how it produces two entirely different outputs for a difference as small as 0.005 seconds-

>>> with demo as source:
...     r.adjust_for_ambient_noise(source,duration=0.51)
...     audio=r.record(source,offset=2.5,duration=3)
>>> r.recognize_google(audio)

'National thanks'

>>> with demo as source:
...     r.adjust_for_ambient_noise(source,duration=0.515)
...     audio=r.record(source,offset=2.5,duration=3)
>>> r.recognize_google(audio)

'clear the sheet'

As you can see, adjust_for_ambient_noise() is definitely not a miracle worker. To get around this, you can preprocess the audio with audio-editing software such as Audacity.
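One way to see why the calibration window matters is to measure the loudness of the calibration segment yourself. The sketch below estimates a noise floor as the root-mean-square (RMS) amplitude of the first part of a signal. It is a simplified illustration with made-up sample values, and the library's actual calibration logic may differ.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of numeric samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_noise_floor(samples, sample_rate, calibration_s=1.0):
    """Estimate the background level from the first calibration_s seconds."""
    n = int(calibration_s * sample_rate)
    return rms(samples[:n])


# Hypothetical signal: one second of faint noise, then one second of louder speech.
sample_rate = 100
signal = [3, -3] * 50 + [40, -40] * 50
noise_floor = estimate_noise_floor(signal, sample_rate)
speech_level = rms(signal[sample_rate:])
print(noise_floor, speech_level)  # 3.0 40.0

# A window that runs into the speech inflates the estimate.
print(estimate_noise_floor(signal, sample_rate, calibration_s=1.5))
```

When the calibration window overlaps actual speech, the estimated noise level jumps, which is one plausible reason small changes in duration can change what gets transcribed.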

Working With Microphones

To run speech recognition on your own voice, you need the PyAudio package. You can install it with pip-

pip install PyAudio

Alternatively, you can download a prebuilt binary (wheel) and install it with pip. Download link-

https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio

Then:

pip install [file_name_for_binary]

For example:

pip install PyAudio-0.2.11-cp37-cp37m-win32.whl

a. The Microphone class

Just as AudioFile gives the recognizer audio from a file, Microphone gives it real-time speech data. Since we installed a new package, let's exit the interpreter and open another session.

>>> import speech_recognition as sr

>>> r=sr.Recognizer()

Now, let’s create an instance of Microphone.

>>> mic=sr.Microphone()

Microphone has a static method to list out all microphones available-

>>> sr.Microphone.list_microphone_names()
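The position of a name in the list returned by list_microphone_names() is the device index you can pass to Microphone(device_index=...). Below is a minimal sketch of picking a device by name and capturing one phrase. The helpers find_microphone and listen_once and the example device names are hypothetical, and listen_once() needs SpeechRecognition, PyAudio and a working microphone.

```python
def find_microphone(names, keyword):
    """Return the index of the first device name containing keyword, or None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if name is not None and keyword in name.lower():
            return index
    return None


def listen_once(device_index=None):
    """Capture one phrase from a microphone and transcribe it."""
    # Imported here because it needs SpeechRecognition and PyAudio installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone(device_index=device_index) as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate on live input
        audio = recognizer.listen(source)            # record until a pause
    return recognizer.recognize_google(audio)


# Hypothetical device names, in the shape list_microphone_names() returns.
example_names = ["Built-in Output", "USB Headset Microphone", "HDMI"]
print(find_microphone(example_names, "headset"))  # 1
```

On a real machine you would pass sr.Microphone.list_microphone_names() to find_microphone() and hand the result to listen_once(). Leaving device_index as None uses the system default microphone.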


See Also-

Python AI – NLP Tutorial
Python AI – Heuristic Search
Python AI – Genetic Algorithms
