Innovation Speaks: Kyutai's Moshi AI Debuts in Paris, Challenging the Future of Human-AI Interaction

Trish Uhl, PMP

AI Trailblazer | Keynote Speaker & Strategic Advisor | Empowering Execs to Drive Human Capital Transformation & Boost Productivity through Generative AI, Exponential Tech, and Advanced Analytics

Published: July 4, 2024

"I'm going to start with introducing you to a new friend of mine. I've never met her. She essentially doesn't exist, but let's go and meet her. I think you're gonna like her." - Dr. Alan D. Thompson introducing Leta AI, July 2021

Today, Kyutai (meaning "sphere" in Japanese), Europe's first independent research lab dedicated to AI open science, dropped a bombshell on the AI world. They've unleashed Moshi, a voice-first AI that's set to revolutionize the field.

Kyutai Labs isn't your typical tech company. As a nonprofit research lab, their mission is inspiring: open research on AI for the benefit of all. This commitment to democratizing technology aligns perfectly with the growing need for accessible, ethical AI.

Moshi ("hello"), their inaugural voice-first AI innovation, is a powerhouse developed in just 6 months. Don't let its youth fool you - this AI can chat, emote, and even crack jokes with astonishing speed and fluency. (Just don't ask it to sing!)

I'm excited to share a recording of my interaction with Moshi(!)

My conversation with Moshi pays homage to Alan D. Thompson's groundbreaking work with Leta AI. I've referenced snippets from their first public chat in "Episode 0," which aired nearly three years ago on July 30th, 2021. Alan's generosity in sharing his work has been (and continues to be!) instrumental in showcasing the evolution of AI capabilities. Thank you, Alan, for paving the way!

As we approach the 3-year anniversary of Alan's first public discussion with Leta, it's fascinating to see how far AI has advanced.

In my recording, you'll hear Moshi demonstrating not just responses, but a deep understanding of context, emotion, and nuance.

This showcase of Moshi's capabilities provides an intriguing comparison point, highlighting the rapid pace of innovation in AI over just a few years.

In fact, my conversation with Moshi felt at times almost unnervingly real!

Listen to the recording! She (it!) even catches me by surprise!

Can you pick up on those moments by the sound of my voice?!

What makes Moshi unique?

Key features of Moshi:

1. Rapid Development: Created in just 6 months by a small team. (8 people!)

2. Natural Conversation: Engages in fluid dialogue with minimal latency.

3. Local Processing*: Runs on standard laptops or smartphones without internet.

4. Emotional Intelligence: Understands and expresses various emotions and speaking styles.

5. Open Science: Kyutai plans to share research, including technical papers and code.

The local processing capability is truly transformative!

Local processing* capability offers:

- Enhanced Privacy: Your conversations stay on your device.

- Improved Accessibility: Works in areas with limited connectivity.

- Lightning-fast Responses: Minimal latency for natural interactions.

Imagine the implications: AI assistance becoming as personal and portable as your smartphone, accessible even in remote areas, and ensuring privacy for sensitive conversations.

How is Moshi's voice-first AI model built differently from other AI voice technologies?

The key difference in Moshi's development is that it uses an "audio language model" rather than a traditional text-based language model.

Here are the relevant details from today's launch:

1. Audio Language Model: Instead of training on text, Moshi was trained on speech without text, using just annotated speech of people speaking.

2. Compression Technique: The audio data was heavily compressed to create "pseudo words" that could be fed into a language model.

3. Prediction Task: The model was trained to predict the next segment of audio, similar to how text models predict the next word.

4. Multimodality: Moshi processes and generates both text and audio simultaneously, allowing for more dynamic interactions.

5. Multistream Capability: The model can listen and speak at the same time, enabling more natural conversations with interruptions and overlaps.

6. Transfer Learning: The team first trained a text-only large language model called "Helium", then performed joint pre-training on a mix of textual and audio data to create a common representation between text and audio.

7. Synthetic Dialogues: Due to the scarcity of conversational audio data, the team generated synthetic dialogues using their text model and a text-to-speech engine.

8. Voice Artist: They worked with a voice artist named Alice to record various monologues and dialogues in different situations and styles, which were used to train their text-to-speech engine.

This approach allows Moshi to capture not just the content of speech, but also its acoustic properties, emotions, and nuances, leading to more natural and human-like interactions.
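To make the "pseudo words" and "predict the next segment" ideas a little more concrete, here is a deliberately tiny sketch in Python. It is not Kyutai's implementation, and nothing in it comes from their code: the frame size, the number of levels, the sine-wave "audio", and every function name are assumptions made up for illustration. It only shows the general shape of the technique: squeeze audio into a short sequence of discrete tokens, then learn which token tends to follow which.

```python
import math
from collections import Counter, defaultdict

FRAME_SIZE = 4   # samples per frame (hypothetical toy value)
NUM_LEVELS = 4   # size of the toy "pseudo-word" vocabulary (hypothetical)


def make_waveform(num_samples):
    """Generate a synthetic sine wave as stand-in audio (amplitudes in [-1, 1])."""
    return [math.sin(2 * math.pi * i / 16) for i in range(num_samples)]


def audio_to_tokens(samples):
    """Compress audio: average each frame, then bucket the mean into a token id."""
    tokens = []
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        mean = sum(frame) / FRAME_SIZE
        # Map the range [-1, 1] onto integer ids 0..NUM_LEVELS-1.
        level = int((mean + 1) / 2 * NUM_LEVELS)
        tokens.append(min(level, NUM_LEVELS - 1))
    return tokens


def train_bigram(tokens):
    """Count which token follows which: the simplest possible next-token model."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent successor of `token`, or None if it was never seen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]


if __name__ == "__main__":
    tokens = audio_to_tokens(make_waveform(64))
    model = train_bigram(tokens)
    print("pseudo-words:", tokens)
    print("after the last token, the model expects:", predict_next(model, tokens[-1]))
```

A real system would swap the frame averaging for a learned audio compressor and the bigram counts for a large neural network, but the loop is the same: audio in, tokens out, guess the next token.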

Voice First AI Breakthrough?!

Kyutai isn't just pushing boundaries; they're obliterating them. And the best part? They're committed to open-sourcing this tech. That's right, they're handing us the keys to the kingdom and saying, "Here, make something amazing."

So, my innovation-hungry friends, I've got two questions for you:

1. How do you see open-source, locally run, voice-first AI changing your industry? What about your life?

2. What would you create if you had Moshi's capabilities at your fingertips?

Drop your thoughts in the comments below!

Yay cake!

~ trish

Wanna give Moshi a go?

Head on over to Kyutai Co-founder and CTO Laurent Mazare's LinkedIn profile for links to the U.S. and EU versions of Moshi chat.

References:

Kyutai Labs. (2024, July 3). Moshi Keynote [Video]. YouTube. https://www.youtube.com/live/hm2IJSKcYvo?si=5ETXq_RRz9yNgqdG

Thompson, A. (2021, July 30). Leta, GPT-3 AI - Episode 0 - World Gifted Conference 2021 - The New Irrelevance of Intelligence [Video]. YouTube. https://youtu.be/EAZSyAPkWzE?si=29TsWmm9FjVbth6S

~ Trish Uhl, PMP

woman + machine

About the Learning AI Newsletter

Hey there, Trish Uhl here from Owl's Ledge!

In my weekly'ish Learning AI newsletter, I guide business leaders and practitioners on a journey to harness the power of AI augmentation and automation. We dive deep into the critical skills and capabilities you need to succeed, while exploring practical applications of AI tools that can transform your business and revolutionize your practice.

By embracing this approach, you'll be empowered to deliver accelerated and elevated solutions at the speed and need of work and business. It's all about staying ahead of the curve and leveraging AI to create a world that works better.

#LearningAI #artificialintelligence #AI #VoiceFirst #Kyutai #MoshiAI #OwlsLedge

Jean Marrapodi, PhD, CPTD

eLearning Thought Leader | Pioneering Problem Solver | People Builder | Innovative Instructional Designer

4 months ago

Wow. This leaves Alexa in the dust.

MalDsAi Laboratory

4 months ago

Great work

Karl Kapp

4 months ago

Trish Uhl, PMP, loved your conversation with Moshi. It reminds me of a keynote presentation I did a few weeks ago with AI Jane as my co-presenter, where she provided some witty commentary and insights. What an exciting future ahead. She wouldn't sing either, LOL!!

Josh Cavalier

Generative AI for Learning & Development | Host of Brainpower - Your Weekly AI Training Show | Educator, Speaker and Author

4 months ago

LOL!! I think you and Moshi are now best friends. That was wild!
