I cloned my own voice

Claudio Fantinuoli

CTO at KUDO Inc. | Researcher at Uni Mainz | Founder of InterpretBank | Speech Technology | PhD

Published: October 20, 2022

Most building blocks of Artificial Intelligence are increasingly plug-and-play: they are accessible to anyone with a basic knowledge of programming (mainly in Python). This is one of the recent revolutions in the field that I will never stop emphasizing. Once a company makes a product out of one of these building blocks, and you can bet one will sooner or later, you don't even need these skills.

This also applies to voice cloning, i.e. the process of using a computer to generate the voice of a real individual. What is amazing (and potentially scary, see below) is that the machine learning technology upon which voice cloning relies is becoming trivial and accessible to everyone. You can install user-friendly libraries to use the technology on your machine or to integrate it into your products. If you do not have basic coding skills, you can look for a start-up that offers the service as a paid product. It's not quite as fun as owning and controlling the technology yourself, but it's okay too.

Not only is the technology available to everyone, it also requires only a small amount of data to work its magic. Why? Today, for almost any NLP task, it is common to reuse a general model (trained on lots of data) and fine-tune it for a specific task. In this case, you take a general text-to-speech model and refine it so that it learns the features of your voice. You don't want to reinvent the wheel every time!
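To make the "reuse a general model, then fine-tune" idea concrete, here is a minimal toy sketch in Python. It is not a text-to-speech system and not the tool used for this experiment. As an assumption made purely for illustration, a two-parameter linear model stands in for the speech model, and all names (`make_data`, `train`, `general_model`, and so on) are hypothetical. It only shows why a few samples can suffice when training starts from a model that is already close.

```python
import random


def make_data(rng, w, b, n):
    """Sample n points (x, y) from the line y = w*x + b, with x in [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, w * x + b) for x in xs]


def mse(params, data):
    """Mean squared error of the linear model `params` on `data`."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, steps, lr=0.1):
    """Plain gradient descent on the mean squared error, starting from `params`."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


rng = random.Random(0)

# Toy assumption: the "general" data and "my" data follow similar, not identical, lines.
general_data = make_data(rng, w=2.0, b=1.0, n=1000)  # lots of generic data
my_data = make_data(rng, w=2.2, b=1.3, n=25)         # a handful of target samples
held_out = make_data(rng, w=2.2, b=1.3, n=200)       # unseen target samples

# Step 1: train the general model once, on the large data set.
general_model = train((0.0, 0.0), general_data, steps=200)

# Step 2: fine-tune it briefly on the small target set.
fine_tuned = train(general_model, my_data, steps=20)

# Baseline: the same small budget, but starting from nothing.
from_scratch = train((0.0, 0.0), my_data, steps=20)

fine_tuned_error = mse(fine_tuned, held_out)
from_scratch_error = mse(from_scratch, held_out)
print("fine-tuned error:  ", fine_tuned_error)
print("from-scratch error:", from_scratch_error)
```

With the same 25 samples and the same 20 training steps, the fine-tuned model ends up with a lower error on unseen target data than the model trained from scratch, because most of the work was already done in the general training phase.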

So I wanted to try this process firsthand. It took only 20-30 sentences read aloud to produce a clone of my voice. I recorded them with a conventional built-in microphone. The results could have been improved by using higher-quality recordings and increasing the number of sentences (although I haven't had the time to experiment and figure out what quantity would maximize quality). The whole process took me less than 15 minutes.

After fine-tuning the model, I asked it to read aloud an old tweet of mine and recorded the result. My English pronunciation is poor, and the model did a good job of recreating this feature as well. You can listen to the generated voice here.

A company recently experimented with combining voice cloning and automatic text generation. The result is a funny podcast between Joe Rogan and Steve Jobs.

You can do fantastic things with this technology, such as easily creating synthesized voices for a brand, a game, etc., but it can also be used to create deepfakes (see info here about audio deepfakes). The malicious use of deepfakes may become an issue in the near future, and we need to raise awareness of this phenomenon among the general public.

Uroš PETERC

Advisor to CEO and Chief Interpreter

2 years ago

Coming back to the substance of that tweet of yours: love it, this is the real issue here, robots asking us to prove to them that we are indeed human. Is that a way of keeping robots in check? Or is it a way of teaching robots how to learn to recognize humans and keep us out of the equation? Somewhat of a gloomy scenario, but that’s actually the philosophical essence behind the paradigm of a productive and constructive “conviviality” between man and machine. Add to that the age-old adage of the student outperforming the teacher (where the robot is the student and the human is the teacher) and we’re up against quite a dilemma!

Andrea Caniato

2 years ago

technically impressive, but other than that: would you say this represents you? would you let this be the first impression you make on other people?

Giovanna Lester

Interpreter || Translator of Brazilian Portuguese x English - I help entrepreneurs, individuals, lawyers, and LSPS communicate.

2 years ago

The potential is scary, yes, but I could still tell it was not a human voice. Let's hope they do not improve it much more.

Andy Gillies

Conference Interpreter - French, German & Polish into English

2 years ago

Monika Kokoszycka... something for you?
