Why we need a no-code multimodal orchestration tool
Image source: DALL-E 2

During Ben's chat with Jason F. Gilbert, Jason expressed frustration at how hard it is to orchestrate robotic movements with voice, sound effects, lights and other modes of expression.

This matters, and not just for Jason, who is the Lead Character Designer at Intuition Robotics, where he works on ElliQ – ‘the sidekick for healthier, happier aging.’

There are many more designers working in robotics, and the same challenge faces conversation designers who work on digital humans too – they also use multiple modalities to express themselves.

How can we design characters that use all their modes of expression to convey meaning? That’s what humans do. Words are only one component of how we communicate – we use much more to express ourselves.

No tools for the job

Here’s what Jason said (edited for brevity):

“There’s not a single platform for designing multimodality experiences on robots. I've talked a lot about this. I've asked a lot of people about this - people from different robotics companies. No one has this. Ideally, we would have some kind of Voiceflow, or some kind of no-code tool, where you can just go ‘okay, this is where the lights come in’. And now when [ElliQ] says this line, she also has this sound effect, this gesture and this thing, and that would be amazing. But you don't have that [tool] right now.”

That’s a huge challenge for designers like Jason. Conversation designers should be able to orchestrate every facet of a bot’s expressive repertoire, because that’s how we communicate. We don’t use words with a little garnish of something else; sometimes the words are the garnish.

To design a bot’s expressive language, we surely have to consider not only what’s natural for a human to understand, but also what new possibilities for expression a bot has that humans don’t (such as lights, or the buzzes, whirs and clicks of a robot’s motors as it moves).

Where are the solutions?

It’s also a huge opportunity for anyone who wants to build such a tool. It would need to let designers orchestrate everything a bot can do in one place, combining its different modalities into a single, coherent expression.
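
To make the idea concrete, here’s a minimal sketch of the kind of data structure such a tool might produce behind its no-code interface. Everything in it – the Cue and Turn names, the modality labels, the timing model – is a hypothetical assumption for illustration, not how ElliQ, Voiceflow or any real platform actually works.

```python
# A hypothetical multimodal "turn": one line of dialogue plus every other
# expressive cue that should fire alongside it. All names and fields here
# are illustrative assumptions, not a real platform's API.
from dataclasses import dataclass, field


@dataclass
class Cue:
    """One expressive event, offset in seconds from the start of the turn."""
    modality: str       # e.g. "speech", "gesture", "lights", "sfx"
    content: str        # what to say, perform, show or play
    start: float = 0.0  # when the cue begins, relative to the turn


@dataclass
class Turn:
    """Everything the bot expresses for a single line of dialogue."""
    intent: str
    cues: list[Cue] = field(default_factory=list)


# "When she says this line, she also has this sound effect, this gesture..."
reminder = Turn(
    intent="medication_reminder",
    cues=[
        Cue("speech", "It's time for your afternoon medicine.", start=0.0),
        Cue("sfx", "gentle_chime.wav", start=0.0),
        Cue("lights", "soft_pulse_blue", start=0.0),
        Cue("gesture", "lean_forward_nod", start=0.4),
    ],
)
```

A designer would never type this by hand, of course – the whole point of a no-code tool would be to expose the same structure as blocks on a timeline that anyone can drag around.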

Our industry needs this. Bots and digital humans will be incredibly dull conversation partners if their facial expressions and body language express nothing. Without such a tool, the other modalities can easily become window dressing rather than expressing what the bot means.

It could be even worse if the modalities contradict each other. Imagine a user asks whether it’s time to take their daily medicine, and the bot nods its head (rather than shaking it) while saying "no". That would confuse users, and it could be dangerous.
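
A tool like that could even catch this kind of contradiction automatically. The toy check below assumes each cue has been annotated with a simple polarity (‘affirm’, ‘negate’ or ‘neutral’); the table, names and function are all hypothetical, and a real system would need something far richer.

```python
# A toy lint for contradictory modalities. The polarity table and the
# function below are illustrative assumptions, not part of any real platform.

POLARITY = {
    ("speech", "No, it isn't time for your medicine yet."): "negate",
    ("gesture", "nod"): "affirm",
    ("gesture", "head_shake"): "negate",
    ("lights", "soft_pulse_blue"): "neutral",
}


def is_contradictory(cues: list[tuple[str, str]]) -> bool:
    """True if cues in the same turn pull in opposite directions."""
    polarities = {POLARITY.get(cue, "neutral") for cue in cues}
    return {"affirm", "negate"} <= polarities


# The dangerous example above: nodding while saying "no".
turn = [
    ("speech", "No, it isn't time for your medicine yet."),
    ("gesture", "nod"),
]
print(is_contradictory(turn))  # True -> flag it for the designer to fix
```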

How do other industries do it?

It’s so easy to fall into the trap of thinking every challenge in conversational AI is brand new. That’s not always the case.

Animated films and videogames have workflows where every facet of a character is considered and brought to life, with varying results.

Check out this video on the making of Rango. Whereas actors are usually only brought in to replace temporary voices in animated films, on Rango the cast were filmed together on a stage, acting out every scene. Those performances were the source material for the animation: each actor’s body movements, facial expressions and voice all informed their CGI character. As you can see, it brings the characters to life – they’re so expressive!

Compare Rango to Fireman Sam. Both are CGI, but watching Fireman Sam is like watching wooden marionettes. The characters’ body language often adds nothing, their facial expressions say very little, and their potential for expression through their bodies and faces hasn’t been exploited at all.

Why can’t we make bots and digital humans that are as expressive as a character from Rango?

What’s suitable for our industry?

Of course, most of us aren’t making entertainment products. Our bots often have important roles to play. They have to empathise. They have to build trust. They have to sell things. They have to represent a brand and its values.

You could say the stakes are higher in those cases than in entertainment. When someone watches a film they don’t like, they can ask for a refund or moan on social media, and then the story is over. But if someone has an underwhelming conversation with an AI assistant, they might never talk to it again, or stop dealing with that brand altogether.

For a companion bot such as ElliQ, trust is paramount. The user and the bot communicate with each other, and a relationship grows from those exchanges.

So, where’s the tool to help conversation designers orchestrate a bot’s multimodal expressions? We’re not animators and we don’t have mo-cap studios or actors. We shouldn’t have to learn every trick a CGI animator knows to do this, and yet it's our job to create excellent communicators.

Someone get on it! We need this.

Here’s my full interview with Jason – he gives many great insights.

You can also check out Kane's interviews with Stefan Scherer and Danny Tomsett for more on designing for robots and digital humans.


This article was written by Benjamin McCulloch. Ben is a freelance conversation designer and an expert in audio production. He has a decade of experience crafting natural sounding dialogue: recording, editing and directing voice talent in the studio. Some of his work includes dialogue editing for Philips’ ‘Breathless Choir’ series of commercials, a Cannes Pharma Grand-Prix winner; leading teams in localizing voices for Fortune 100 clients like Microsoft, as well as sound design and music composition for video games and film.

---

About Kane Simms

Kane Simms is the front door to the world of AI-powered customer experience, helping business leaders and teams understand why voice, conversational AI and NLP technologies are revolutionising customer experience and business transformation.

He's a Harvard Business Review-published thought-leader, a top 'voice AI influencer' (Voicebot and SoundHound), who helps executives formulate the future of customer experience strategies, and guides teams in designing, building and implementing revolutionary products and services built on emerging AI and NLP technologies.

Richard Warzecha

Advancing Voice UX design one word at a time.

2y

Before a tool, don’t we actually need some sort of semantic representation that combines linguistic and spatial communication? So, instead of simply NLU, we need Natural Communication Understanding. If I point to something instead of saying “the brown chair” or nod instead of saying “yes” the robot needs to understand that. Once we have this, then we can start talking about the kind of tool that can deliver it. And, if we do this correctly for a real world “spatial” context, wouldn’t it also be used for what others are planning to do in the Metaverse? Now, there’s got to be people smarter than us (yeah, hard to believe) working on this sort of thing. Where are they? Anyone else thinking we need something like this?

Andrew Francis

To write software that users will adore.

2y

Kane Simms In order to get there I think you need to see some things such as: standards - tools have to work together; (multi-disciplinary) teams that understand both the creative side and can write code. It is difficult to write a good tool otherwise. *It's so easy to fall into the trap of thinking every challenge in conversational AI is brand new. That's not always the case* From what I notice: 1. folks in voice don't look at adjacent industries for insight. I remember talking to a veteran game script writer and they asked "Andrew, why don't you use a narrative engine?" 2. folks in voice, at least at the customer-facing end, really don't look at what has been done in the past.

Greg Bulmash

Founder and Chief Creative Officer

2y

The trick with conversational + multimodal is the same as the trick with the web. It took 5 years after NCSA Mosaic premiered for "web standards" to officially become a movement, and 10 years for Zeldman's seminal book. We're still about 18 months from the 10 year anniversary of the drop of the first Echo device, and more from the beginning of really popularizing multimodal voice experiences. Honestly, I've come to believe that Amazon's APL is like Microsoft's ActiveX. It got a good deal of adoption because of the wide use of the platform, but created a ton of apps that couldn't be run on anything other than devices using old versions of their software. This turned into a huge technical debt for Microsoft. They've spent TEN YEARS trying to migrate enterprise customers off of being IE dependent and to use web standards. Even with Edge now sharing a LOT of DNA with Chrome, Microsoft still struggles with the adoption of Edge, partially because of how IE haunts it. Not arguing for one-stop shopping, but a similar maturation of the field with open standards.

Martyn Redstone

genAssess - AI Skills Assessments | Conversational AI & Automation Specialist | Founder, Speaker, Educator & Problem Solver in Recruitment and Talent Tech

2y

Couldn't agree more here. It's about time we think about the amalgamation of multi-modal experience design. Great article Kane Simms, Benjamin McCulloch & Jason F. Gilbert

Amy Stapleton

Conversational AI | Generative AI | Opus Research Senior Analyst

2y

There’s Rapport and there’s Cocohub, as two examples of doing pretty much what you describe.
