
Why Is QSR-DT-AOT ESTD?

HUEX Labs

Unlocking the power of AI to transform your Human Experience.

Published: January 19, 2022


The title of this post is a mouthful of abbreviations.

Let’s take a moment to dissect the title of the post before delving deeper.

ESTD - Easier Said Than Done. Easy peasy.
QSR
DT
AOT

Quick Service Restaurants, aka QSRs, are the most popular segment of the food industry, where you can pick up a quick bite or your favorite beverage - this includes options to dine in, order on the go via drive-thru, or pick up.

A Drive-Thru is not a new feature; it is a 100-year-old innovation that grew alongside the explosive growth of automobiles. It started as a convenience on the go but became a lifeline during the recent pandemic.

Now comes the new-age spin on the QSR-DT: AOT.

AOT stands for “Automated Order Taking”. If anything, the last couple of years have brought a phenomenal amount of change in digital services, contactless interactions, online delivery, order ahead, Buy Online, Pick-up in Store (BOPIS), and many more advancements in the Retail & Hospitality industry. This has been possible due to significant improvements in a few underlying technologies - Natural Language Processing, Automatic Speech Recognition, Computer Vision, precision location-based services, and such.

Owing to other macro conditions - labor challenges, a continuous rise in operating costs, difficulty retaining existing employees, and rising retraining costs - QSRs are facing a question that didn’t exist a few years ago: do they have the necessary IT expertise to build the foundational components of Automated Order Taking? This has led a variety of providers, both big and small, to try to address this problem.

Let’s take a moment to evaluate the lay of the land as it relates to building software architecture that supports “Automated Order Taking” at the QSR-Drive-Thru.

Anyone with a reasonable knowledge of the Conversational AI domain would think this is a pretty straightforward, “no brainer” use case to go after.

WAIT A SEC.

Why is this not resolved yet? Let’s take a moment to think about how a typical QSR-Drive-Thru operates.

There are several dimensions to think about -

Environmental conditions or externalities

Wind
Rain
Rustling leaves/trees
Birds
Traffic
Highways

Customer side

Car noise
Music
Pets / passengers / children
How far is the customer from the microphone? (SUV vs. Sedan vs. Truck or other vehicles)
Is the customer facing the microphone, or are they looking at their phone to review the menu and order?
Customer’s language/accent
How clearly they state their order (consumer behavior changes at different times of the day)

Locations

Microphone system - Brands run a variety of QSR-DT microphone systems with different levels of audio quality and sophistication
Wi-Fi / network QoS - Internet connectivity is not very reliable
Staffing - Locations are short-staffed, so personnel multitask and hold internal hand-off conversations while trying to take the order coming from the drive-thru

Let’s add another important constraint to everything listed above.

Any application that is to automate the order-taking process should:

Accept input from a variety of customers who say the same thing very differently (ACCENTS & NOISE)
Parse it and understand the associated words, quantities, and modifiers that came in (“9 whole-grain” vs. “9 whole green”, and the 9 is not a quantity)
Match and map each item to a product SKU in the internal system (original vs. raspberry-filled are mapped differently internally)
Generate an appropriate response and ask the customer any clarifying questions (Do you want it warmed?)
Continuously check order status against inventory (Oops. We are out of Chai Tea Latte)
Remember all the items being ordered and correlate later requests back to them for any modifications (Can you change my 1st coffee to Almond milk instead of Soy?)
Handle multiple people in the car (cross talk with a partner to check whether they need additional sugar for their coffee)

Achieving all this in under a few seconds, without making the customer wait for the processing, is “something that other businesses do not face every day”.
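To make the “remember and modify” requirement concrete, here is a minimal sketch in Python. It assumes the speech and language steps have already produced a product name, a quantity, and modifiers; the SKU table, class names, and method names are all hypothetical and only illustrate the bookkeeping, not any vendor’s actual design.

```python
from dataclasses import dataclass, field

# Hypothetical menu: recognized product name -> internal SKU (values are made up).
SKU_BY_NAME = {
    "original donut": "SKU-1001",
    "raspberry-filled donut": "SKU-1002",
    "coffee": "SKU-2001",
}


@dataclass
class LineItem:
    sku: str
    name: str
    quantity: int = 1
    modifiers: dict = field(default_factory=dict)


@dataclass
class Order:
    items: list = field(default_factory=list)

    def add(self, name, quantity=1, **modifiers):
        """Map a recognized product name to its SKU and append a line item."""
        key = name.lower()
        if key not in SKU_BY_NAME:
            # Unknown product: the caller should ask a clarifying question.
            raise KeyError(f"no SKU for {name!r}")
        item = LineItem(SKU_BY_NAME[key], key, quantity, dict(modifiers))
        self.items.append(item)
        return item

    def modify(self, name, ordinal, **modifiers):
        """Update the ordinal-th (1-based) item with this name, e.g. 'my 1st coffee'."""
        matches = [i for i in self.items if i.name == name.lower()]
        if not 1 <= ordinal <= len(matches):
            raise IndexError(f"order has no {name!r} number {ordinal}")
        matches[ordinal - 1].modifiers.update(modifiers)
        return matches[ordinal - 1]


order = Order()
order.add("coffee", milk="soy")
order.add("raspberry-filled donut", quantity=2, warmed=True)
order.add("coffee", milk="whole")
# "Can you change my 1st coffee to almond milk instead of soy?"
order.modify("coffee", 1, milk="almond")
print([(i.sku, i.quantity, i.modifiers) for i in order.items])
```

Even this toy version shows why the order must be kept as structured state: the modification only works because each line item can be found again by product and position.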

How do you get started?

Now let’s take a step back to see how to get started on this journey. Any voice automation journey, like other Machine Learning projects, starts with “DATA”.

Each step of working with that data requires expertise, computing, time, and money before any progress is made. Synthetic data could be a good starting point; however, without real-life data there is no easy way to improve accuracy once the model is put to the test in a live environment.

The data labeling exercise needs a very in-depth understanding of the products and of how customers typically refer to them in that part of the country. A very famous example is “coke” vs. “soda” vs. “pop”: they are all the same product but are referred to differently across regions.
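One simple way to picture what the labels have to capture is an alias table that folds regional terms into a single canonical product. The sketch below is an illustration only: the table contents and the `soft_drink` label are assumptions, and a real table would have to come from labeled, region-specific order data.

```python
# Hypothetical alias table: regional term -> canonical product label.
ALIASES = {
    "coke": "soft_drink",
    "soda": "soft_drink",
    "pop": "soft_drink",
}


def normalize(tokens):
    """Lowercase each token and replace known regional aliases with the canonical label."""
    return [ALIASES.get(t.lower(), t.lower()) for t in tokens]


print(normalize("Large pop please".split()))
# → ['large', 'soft_drink', 'please']
```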

Data Management considerations

Gathering data from the locations today means there should be a way to get customer consent before acquiring their voice data - Privacy & Ethical challenges.
In today’s QSR world, the interaction between customer and kitchen personnel is human-to-human communication. As part of data pre-processing, there should be a way to segment the audio into kitchen personnel speech and customer speech. That is not very straightforward and requires complex diarization algorithms.
Variety and volume - to train a Machine Learning algorithm, the data needs to be robust and reflect the diversity of the users (accents across countries, native vs. bilingual speakers).
Go-To-Market depends on data pre-processing - labeling / annotating needs an enormous investment of time and money.
According to a recent WSJ article about Starbucks, the coffee giant has about 170K variants of products that could be made in a typical location - “less”, “more”, “warmed” on any product, etc. Just think about training a model that can understand a customer’s request and associate it with the action that needs to be entered into the POS system.

Architecture considerations

Think about streaming all the audio to the Cloud, getting it processed, checking inventory, and coming back with a follow-up question. What is the network latency?
If the decision is between a cloud and an edge solution, are there ways to achieve an equal amount of sophistication on the edge?
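A back-of-the-envelope latency budget is one way to frame the cloud vs. edge question. In the sketch below, every stage name and millisecond figure is a placeholder chosen purely for illustration, not a measurement from any real system; the point is only that each network hop adds to the time the customer waits.

```python
# Hypothetical per-stage latencies in milliseconds. Every number is a
# placeholder for illustration, not a measurement from any real system.
CLOUD_PATH_MS = {
    "audio_uplink": 150,
    "speech_to_text": 300,
    "understanding": 100,
    "inventory_check": 80,
    "response_downlink": 150,
}
EDGE_PATH_MS = {
    "speech_to_text": 400,
    "understanding": 150,
    "inventory_check": 80,
}


def total_latency_ms(stages):
    """Sum the latency of every stage on one request path."""
    return sum(stages.values())


def fits_budget(stages, budget_ms):
    """True if the whole path completes within the response-time budget."""
    return total_latency_ms(stages) <= budget_ms


BUDGET_MS = 700  # assumed response-time target, also a placeholder
for name, stages in (("cloud", CLOUD_PATH_MS), ("edge", EDGE_PATH_MS)):
    print(name, total_latency_ms(stages), fits_budget(stages, BUDGET_MS))
```

With real measurements substituted in, the same arithmetic shows which stages dominate and whether an unreliable network connection alone can break the budget.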

Infrastructure considerations

Is the system able to plug into the existing systems?
Does it have pre-built connectors to KDS, POS, etc.?
Audio quality / noise-canceling microphone systems with channel separation

Business considerations

How much does it cost to build this?
What changes do we need to make in our IT setup?
How accurate is the model going to be?
What happens when a new product or a variant needs to be introduced?
Is it going to shave off a few seconds as compared to today’s operations?
What are the Ethical and Privacy related ramifications the Brand has to adhere to?
Data strategy changes - retention, governance, compliance, etc.

Customer considerations

Do they have a choice to speak to a human if they prefer to do so?
Customer opt-in / opt-out
Ability to delete their food ordering voice requests for privacy reasons
What if they are bilingual, speak only partial English, and would like to order in their own language?

Closing comments

QSR Drive-Thru is a very complex environment in which to implement Voice AI automation for food ordering. An Automated Order Taking system needs clearly articulated metrics for evaluating system performance under various conditions, needs to achieve accuracy on par with or better than a human, and needs to accomplish this with sub-second performance. Just running some pilots at one location under ideal conditions is not enough evidence that the model’s performance will hold up under other conditions. There are players big and small trying to address this problem, but it is still an evolving field and will require a lot of fundamental research before Automated Order Taking goes mainstream and shows positive returns and results.
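As one example of a clearly articulated metric, speech recognition quality is commonly reported as word error rate (WER): the word-level edit distance between what was said and what was transcribed, divided by the number of words said. The sketch below is a plain-Python illustration of that standard calculation; it is offered as an assumption about what such a metric could look like, not as the metric any particular AOT system uses.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("nine whole grain bagels", "nine whole green bagels"))
# → 0.25
```

Computing the same number separately for windy days, loud vehicles, or different accents is what turns “performance under various conditions” into something that can be compared against a human order taker.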

About the Author

Kiran Kadekoppa is the CTO and Co-founder of HUEX Labs, a startup focused on building Edge-based Software services for the QSR, Hospitality, and Retail industries. He has worked in the Banking industry in various roles across Architecture, Software Delivery & Program Management for one of the Top 5 Banks in the country. He is a Voice AI enthusiast and volunteers at Open Voice Network, a Linux Foundation-based standards body for building an interoperable Voice ecosystem.

DISCLAIMER: The views expressed in this post are those of the author and don’t necessarily reflect the views of their organizations.

#voicetech #vui #voicetechnology #voiceai #voicefirst #voicetalent #voice #ai #conversationalai #conversationaldesign #ml #conversationalintelligence #SpeechRecognition #conversationintelligence #contextrecognition #speechai #qsr

