Insight of the Week: Contact Center AI's iPhone Moment

By Kerry Robinson

Something big happened on Tuesday. Your contact center will never be the same.


At a closed-door, invite-only event with no livestream, OpenAI announced the availability of its Realtime API for GPT-4o.


This is the enterprise equivalent of ‘advanced voice mode’, which was finally rolled out to most paying ChatGPT subscribers last week.


Advanced voice mode gets rid of the three separate steps we’re used to: speech-to-text, LLM inference, and then text-to-speech.


Instead, a single AI model natively accepts audio and text as input and produces audio and text as output. This cuts the time between the moment a user finishes speaking and the moment the system responds to around 300 milliseconds, about the same as in human conversation. And with advanced voice mode, we get incredible voices that are nearly indistinguishable from real people.
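To make the latency point concrete: in a cascaded pipeline the three stages run one after another, so their delays add up before the caller hears anything. The sketch below is a hypothetical latency budget. The per-stage numbers are placeholders for illustration, not measurements, and it assumes the stages run strictly in sequence (real systems often stream and partly overlap them). The `Stage` class and `time_to_first_audio` helper are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    latency_ms: float  # delay this stage adds before the caller hears a reply


def time_to_first_audio(stages: list[Stage]) -> float:
    """Stages assumed to run strictly in sequence, so their delays simply add."""
    return sum(s.latency_ms for s in stages)


# HYPOTHETICAL per-stage figures, for illustration only; measure your own stack.
cascaded = [
    Stage("speech-to-text (final transcript)", 300.0),
    Stage("LLM inference (first token)", 400.0),
    Stage("text-to-speech (first audio chunk)", 200.0),
]

# A native speech-to-speech model collapses the three stages into one;
# ~300 ms is the figure quoted in this article.
native = [Stage("speech-to-speech model (first audio chunk)", 300.0)]

print(f"cascaded: {time_to_first_audio(cascaded):.0f} ms")
print(f"native:   {time_to_first_audio(native):.0f} ms")
```

Swap in figures measured on your own stack; the structural point is that a single speech-to-speech model has one delay to pay instead of three.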


But there’s more: advanced voice mode can both detect and express emotion. It can tell the difference between the same words said in different ways, and respond appropriately. It can laugh, whisper, and produce useful backchannel sounds, such as ‘aha’ and ‘mm-hmm’.


And finally, with yesterday’s announcement, this kind of power is available to you and your business to serve your customers.


The Realtime API currently accepts voice and text input and outputs voice and text. Image and video support is apparently in the pipeline.
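For readers wondering what ‘voice and text in, voice and text out’ looks like on the wire: the Realtime API is driven by JSON events sent over a WebSocket. The sketch below only builds the event payloads and makes no network connection. It assumes the beta-era event names `session.update`, `input_audio_buffer.append` and `response.create`, the `pcm16` audio format, the `alloy` voice and `server_vad` turn detection; field names may have changed since, so check OpenAI’s Realtime API reference before relying on them. The helper function names are hypothetical.

```python
import base64
import json


def session_update(instructions: str, voice: str = "alloy") -> str:
    """Configure the session: reply with text and audio, detect turns server-side."""
    event = {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],  # modalities the model may respond with
            "instructions": instructions,
            "voice": voice,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            # Let the server decide when the caller has finished speaking.
            "turn_detection": {"type": "server_vad"},
        },
    }
    return json.dumps(event)


def append_audio(pcm16_chunk: bytes) -> str:
    """Stream a chunk of caller audio; the raw bytes are sent base64-encoded."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16_chunk).decode("ascii"),
    })


def request_response() -> str:
    """Explicitly ask the model to respond (with server VAD this can happen automatically)."""
    return json.dumps({"type": "response.create"})


print(session_update("You are a friendly, concise contact-center agent."))
print(request_response())
```

In a real integration these strings would be sent over an authenticated WebSocket connection to the Realtime endpoint, and the model’s audio would come back as events to decode and play; that part is deliberately left out here.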


It’s hard to overestimate the impact. Until now, our IVRs, conversational AI, and even Gen AI-powered voice agents couldn’t respond fast enough to take full advantage of the cooperative nature of conversation. But now they can.


They couldn’t respond appropriately and empathetically. But now they can.


I think we might look back on this as a tipping point: the moment when clunky old IVRs and only-slightly-less-clunky natural language routing solutions became totally unacceptable, and chatbots lost their charm.


I might shoot a text message or WhatsApp to my bank or a retailer, or read one they send me… but to get something sorted, why would I rely on typing on my phone, tablet, or keyboard when I can have a fast, efficient, engaging conversation about it? Seriously, done right, AI agents built with the Realtime API are gonna solve your problem before you could even finish listening to the long, confusing list of options in a competitor’s IVR.


In last week’s article, I warned we may see a bifurcation into two types of B2C businesses:

  1. Those that provide commodity services that are purchased by AI, on behalf of consumers, at the lowest cost
  2. Those who manage to build and maintain customer relationships that are so deep, meaningful, and experiential that consumers choose to engage with them directly.

With the Realtime API, we are starting to get access to the kind of tech that may allow you to attain or retain your position as a B2C brand with direct access to your customers.


But it’s not cheap. A rough calculation suggests the API will cost around 18 cents per minute! You can get offshore agents for less. But AI gets better, faster, and cheaper all the time: GPT-4-class models have crashed in price at an annualized rate of 90%.
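One way to arrive at a per-minute figure like this is to blend separate audio input and output rates by how much of the call each side spends talking. This is not necessarily how the estimate above was derived. The sketch assumes roughly $0.06 per minute of audio input and $0.24 per minute of audio output (approximate launch pricing; treat both as assumptions and check current pricing) and a hypothetical split of talk time. It ignores text tokens, silence, and any extra context billing, so it is a rough sketch, not a quote. It also extrapolates the 90% annualized decline forward, which is a trend projection, not a forecast.

```python
# ASSUMED per-minute audio rates in USD (approximate launch pricing; verify before use).
AUDIO_IN_PER_MIN = 0.06
AUDIO_OUT_PER_MIN = 0.24


def blended_cost_per_minute(agent_talk_share: float) -> float:
    """Cost per call-minute when the AI agent talks for `agent_talk_share` of the
    time and the caller talks for the rest. Hypothetical simplification: ignores
    text tokens, silence, and context billing."""
    if not 0.0 <= agent_talk_share <= 1.0:
        raise ValueError("agent_talk_share must be between 0 and 1")
    caller_share = 1.0 - agent_talk_share
    return caller_share * AUDIO_IN_PER_MIN + agent_talk_share * AUDIO_OUT_PER_MIN


def projected_price(price_now: float, years: float, annual_decline: float = 0.90) -> float:
    """Extrapolate a price that falls by `annual_decline` per year (90% -> x0.1 per year)."""
    return price_now * (1.0 - annual_decline) ** years


# Hypothetical talk-time splits, for illustration only.
for share in (0.5, 2 / 3):
    print(f"agent talks {share:.0%} of the call: ${blended_cost_per_minute(share):.3f}/min")

cost = blended_cost_per_minute(2 / 3)
print(f"same call after 1 year at -90%/yr: ${projected_price(cost, 1):.3f}/min")
```

Under these assumptions, a call where the agent talks two-thirds of the time works out to about $0.18 per minute, and the same call would cost about $0.018 per minute a year later if the 90% trend held.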


And if you’re only focused on cost savings, you’re missing the point.


More to come next week once we’ve had some time to build with these new models.


Kerry Robinson is an Oxford physicist with a Master's in Artificial Intelligence. Kerry is a technologist, scientist, and lover of data with over 20 years of experience in conversational AI. He combines business, customer experience, and technical expertise to deliver IVR, voice, and chatbot strategy and keep Waterfield Tech buzzing.

Subscribe to Kerry's Weekly AI Insights
