I cloned my own voice

Most building blocks of artificial intelligence are increasingly plug-and-play: they are accessible to anyone with a basic knowledge of programming (mainly in Python). This is one of the recent revolutions in the field that I will never stop emphasizing. If a company makes a product out of one of these blocks, and you can bet one will sooner or later, you don't even need these skills.

This also applies to voice cloning, i.e. the process of using a computer to generate the voice of a real individual. What is amazing (and potentially scary, see below) is that the machine learning technology on which voice cloning relies is becoming trivial to use and accessible to everyone. You can install user-friendly libraries to run the technology on your own machine or to integrate it into your products. If you do not have basic coding skills, you can look for a start-up that offers the service as a paid product. It's not quite as fun as owning and controlling the technology yourself, but it's okay too.

Not only is the technology available to everyone, it also needs only a small amount of data to work its magic. Why? Today, with almost any NLP task, it's common to reuse a general model (trained on lots of data) and fine-tune it for a specific task. In this case, you take a general text-to-speech model and refine it so that it learns the features of your voice. You don't want to reinvent the wheel every time!
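To make the idea of fine-tuning concrete, here is a toy sketch in plain Python. It is not a text-to-speech model and not the tool I used: it fits a simple line with gradient descent, and all the data, names, and numbers are made up for illustration. It only shows why starting from a model trained on lots of data lets a handful of new samples go a long way.

```python
def train(w, b, data, steps, lr=0.05):
    """Gradient descent on mean squared error for the model y = w*x + b."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(w, b, data):
    """Mean squared error of the model y = w*x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Hypothetical "general" task with plenty of data: y = 2.0*x + 1.0
general_data = [(k / 10, 2.0 * (k / 10) + 1.0) for k in range(-20, 21)]

# Hypothetical "specific" task with only five samples: y = 2.2*x + 1.3
specific_data = [(x, 2.2 * x + 1.3) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

# 1. Train the general model once, with many steps on lots of data.
pre_w, pre_b = train(0.0, 0.0, general_data, steps=500)

# 2. Fine-tune it briefly on the small, specific dataset.
ft_w, ft_b = train(pre_w, pre_b, specific_data, steps=10)

# 3. For comparison, spend the same small budget starting from scratch.
sc_w, sc_b = train(0.0, 0.0, specific_data, steps=10)

finetuned_error = mse(ft_w, ft_b, specific_data)
scratch_error = mse(sc_w, sc_b, specific_data)
print(f"fine-tuned error:   {finetuned_error:.4f}")
print(f"from-scratch error: {scratch_error:.4f}")
```

With the same ten training steps on the same five samples, the fine-tuned model ends up much closer to the target than the one trained from scratch, because most of the work was already done on the general data.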

So I wanted to try this process firsthand. It only took reading 20-30 sentences aloud to produce a clone of my voice. I recorded them with a conventional built-in microphone. The results could probably have been improved with higher-quality recordings and more sentences (although I haven't had the time to experiment and figure out how many sentences would maximize quality). The whole process took me less than 15 minutes.

After fine-tuning the model, I asked it to read aloud an old tweet of mine and recorded the result. My English pronunciation is poor, and the model did a good job of recreating this feature as well. You can listen to the generated voice here.

A company recently experimented with combining voice cloning and automatic text generation. The result is a funny podcast between Joe Rogan and Steve Jobs.

You can do fantastic things with this technology, such as easily creating synthesized voices for a brand, a game, etc., but it can also be used to create deepfakes (see info here about audio deepfakes). The malicious use of deepfakes may become an issue in the near future, and we need to raise awareness of this phenomenon among the general public.

Uroš PETERC

Advisor to CEO and Chief Interpreter

2 years ago

Coming back to the substance of that tweet of yours: love it, this is the real issue here, robots asking us to prove to them that we are indeed human. Is that a way of keeping robots in check? Or is it a way of teaching robots how to learn to recognize humans and keep us out of the equation? Somewhat of a gloom scenario, but that’s actually the philosophical essence behind the paradigm of a productive and constructive “conviviality” between man and machine. Add to that the age-old adage of the student outperforming the teacher (where the robot is the student and the human is the teacher) and we’re up against quite a dilemma!

technically impressive, but other than that: would you say this represents you? would you let this be the first impression you make on other people?

Giovanna Lester

Interpreter || Translator of Brazilian Portuguese x English - I help entrepreneurs, individuals, lawyers, and LSPS communicate.

2 years ago

The potential is scary, yes, but I could still tell it was not a human voice. Let's hope they do not improve it much more.

Andy Gillies

Conference Interpreter - French, German & Polish into English

2 years ago

Monika Kokoszycka... something for you?
