Speech Recognition – It’s Not (Exactly) What You Think…
Pierce Buckley
CEO & Co-founder @babelforce. Making sense of AI and automation in CX. Passionate about sustainability in tech. Always learning (mainly about myself, why is that the hardest?)
Surely speech recognition is a simple topic? Even self-explanatory?
At the simplest level, it is: speech recognition is software that can ‘understand’ and respond to human speech.
It’s a piece of tech that’s often used in the inbound call center setting to solve simple queries.
So we’re on the same page then? Well… there are still some details to look at.
The problem is terminology – and that’s what we’re going to clear up first.
Let’s look at speech recognition terminology
Here are the key terms. Each is a *part* of speech recognition, but none of them are the full story.
Conversational AI / IVR
Speech recognition systems are often used to replace or augment IVR solutions.
Instead of (or as well as) ‘press 1 for service’, contact centers use systems that callers can talk to.
Natural Language Processing (NLP)
The role of NLP is to turn speech into data that software can work with. It ‘hears’ what humans say and records that input.
Natural Language Understanding (NLU)
This is a little more sophisticated. NLU takes the words you say and aims to understand what they mean. Based on that, it can usually figure out what the caller needs to happen.
(Interested? Read more here: ‘NLU and NLP – what’s the difference?’)
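To make the NLU idea a little more concrete, here is a minimal sketch in Python. It assumes the caller's speech has already been turned into text, and it guesses the caller's need by counting keyword matches. The intent names, the keyword lists and the `detect_intent` function are all hypothetical, invented for illustration. Real NLU engines rely on trained language models rather than hand-written word lists, so treat this as a toy and not as how any particular product works.

```python
import re

# Hypothetical intents and trigger keywords (assumptions for illustration).
# A production NLU model would be trained on real caller utterances instead.
INTENT_KEYWORDS = {
    "billing": {"bill", "invoice", "charge", "payment", "refund"},
    "delivery": {"delivery", "order", "package", "shipping", "tracking"},
    "cancel": {"cancel", "terminate", "unsubscribe"},
}


def detect_intent(transcript: str) -> str:
    """Return the intent whose keywords overlap most with the transcript."""
    # Lowercase the text and split it into a set of distinct words.
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    # Score each intent by how many of its keywords the caller used.
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best_intent = max(scores, key=scores.get)
    # No keyword matched: hand over to a human agent rather than guess.
    return best_intent if scores[best_intent] > 0 else "agent"


if __name__ == "__main__":
    print(detect_intent("I think there is a wrong charge on my last bill"))
    print(detect_intent("Where is my package?"))
    print(detect_intent("Umm, hello?"))
```

The fallback to a live agent is a deliberate choice in this sketch: when the system can't tell what the caller wants, routing to a person is usually safer than routing on a guess.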
Speech-to-Text (STT)
This is a core element of speech recognition. STT takes audio and turns it into text that can be processed and stored.
Text-to-Speech (TTS)
If you need a system to respond dynamically with speech that isn’t pre-recorded, it needs TTS. That enables it to construct entirely new phrases.
(Read all about it here: ‘What’s TTS doing for contact centers?’)
Voice recognition
Yes, in addition to speech recognition there’s also voice recognition.
And yes, it’s a totally different thing; voice recognition is security tech that recognizes a specific voice to do something like unlock a phone.
Terminology is causing all the confusion
So there’s the issue – you might informally call any of these functions ‘speech recognition’.
But the term is too broad to be especially helpful.
After all, there are a lot of speech recognition functions including Siri, conversational IVR, smart TVs, Alexa… and all of them are different under the hood.
Is ‘speech recognition’ an out-of-date term?
Speech recognition is a slightly old-fashioned phrase.
Let’s dip into the history…
Speech recognition has been influencing contact center services for decades. Bell Laboratories created the Audrey system back in the 1950s; that one could recognize digits spoken aloud.
Soon after, IBM created Shoebox, a system that could understand 16 words – about as many as a 2-year-old child! (I wonder which 16 words they chose?)
In the 1980s there were big advances, and even greater progress from Google’s voice search and Apple’s Siri in the new millennium.
So what am I saying?
Speech recognition was a target for technologists years ago, and a target they hit pretty impressively.
But objectives for modern voice-based systems include:
- Comprehension
- Machine learning
- Dynamic response
- Data integration
- Predictive analytics
- Agent guidance
When you look at it that way, speech recognition is just one part of what makes modern systems impressive. We can flip that idea – a system that recognizes speech but can’t do any of the above is pretty useless.
What’s the appeal of voice-based tools?
These tools are far more complicated than button-prompt IVRs, or… really any system that doesn’t use speech.
So why put in the effort?
In a nutshell, it’s because these tools make life easier for customers.
Speech may not be a natural fit for software (far from it in fact) but it is entirely natural for humans.
The strong preference customers have for voice-based systems is obvious when you compare the two kinds of IVR. Given the choice between navigating a maze of button prompts and simply stating their need, customers tend to choose the latter.
We’ve compiled some great use cases in How Delta saves $5 million a year with conversational IVR… here are the highlights:
- Call containment increased 5%
- Misrouted calls dropped by 15%
- Capture of caller intent reached 75%
- AHT decreased 10%
- Agent availability increased 25%
So it turns out that voice-based systems are in that rare Goldilocks zone. They’re something that customers are keen on, and they save businesses money.
And that’s not something you see every day…