Text to speech is getting quite a bit better

Remember in the movie "2001: A Space Odyssey," where HAL 9000 sings the song "Daisy, Daisy" in a robotic voice as it is being taken apart?

That was a reference to a real event! In 1961, Bell Labs developed software for an IBM 704 computer to sing the same song. This wasn't just a playback of a recording -- the computer used a vocoder to synthesize the vocals, syllable by syllable. The demo made headlines around the world as the first "talking computer" and still lives on in Kubrick's celebrated movie.

Since then, you've probably gotten used to computers being able to speak, at least a little. Robotic voices on call center lines or from home assistants have become pretty commonplace. They're understandable and have gotten gradually better over time. But no one would mistake them for real people.

But now, the same transformer-based generative AI techniques that led to ChatGPT and Midjourney are producing realistic human voices. They can match a human's intonation, emotion and expression. They can speak any language and pronounce uncommon words and names. They can mimic a voice after hearing only a short sample. Perhaps because AI is blowing up everywhere else, these new text-to-speech systems aren't getting much attention.

Some of the new text-to-speech systems include:

Just for fun, I've been using Coqui to convert some long books into audiobooks. (I can't share them, because they're copyrighted.) It's a clunky process right now and takes a few days on my Mac for a decent-sized book, but it works.
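To give a flavor of what that clunky process looks like, here's a minimal sketch of a book-to-audiobook pipeline. It assumes Coqui's Python API (`TTS.api.TTS` and `tts_to_file` from the `TTS` package); the model name, the `chunk_text` helper, and the file layout are my own illustrative choices, not anything Coqui prescribes. Chunking matters because TTS models handle short passages much better than whole chapters.

```python
import re

def chunk_text(text, max_chars=500):
    """Split text into sentence-aligned chunks of at most max_chars characters."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

def book_to_audiobook(book_path, out_dir):
    # Requires: pip install TTS   (Coqui TTS; downloads the model on first run)
    from TTS.api import TTS
    tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
    with open(book_path, encoding="utf-8") as f:
        text = f.read()
    # Synthesize one WAV file per chunk; a real pipeline would then
    # concatenate them with a tool like ffmpeg.
    for i, chunk in enumerate(chunk_text(text)):
        tts.tts_to_file(text=chunk, file_path=f"{out_dir}/part_{i:04d}.wav")
```

Synthesizing chunk by chunk is also what makes the process slow on a laptop: each call runs a neural model, so a full-length book means thousands of invocations.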

The opportunities to make life better for vision-impaired people, in particular, are tremendous.

What else can we build now that computers can talk almost as well as humans?
