Fear & Loathing in Voice Authentication - the Complete Story on Deepfakes

And now, finally, the full list of 10 Key Ideas on (audio) Deepfakes, through the prism of voice authentication, appears in one place.

The final arrangement of ideas is intended to read as a kind of story.

Here goes -

1. Deepfake voices totally fool the human ear

We've finally lost the battle; the human ear cannot tell a deepfake voice from a real one, especially when you consider a call center scenario.

It might be difficult to pinpoint exactly when the shift away from the "uncanny valley" actually happened, but I will again reference a recent study published in Nature Communications to pin 2022 as the year when AI finally won. A 13% deepfake detection rate by humans, with an 8.6% false positive rate? And this was in 2022, when study participants were already somewhat suspicious of the audio? Yeah, I've seen enough.

2. Fraudsters aren't using deepfakes for financial crimes in the call center (yet)

If fraudsters aren't using deepfakes yet, why not? While tools to create realistic AI voices of real people are indeed widely available, it is still too complicated and effort-intensive for fraudsters to stage such attacks at scale.

And that is because it is still far too easy for fraudsters to commit financial crimes by simply targeting unsecured channels. There is such an abundance of totally exposed businesses that fraudsters can't take advantage quickly enough!

What we continue to see, rather, is that the very presence of voice authentication in the call center is a major deterrent to fraudsters. They will simply hang up the call and try the next victim on the list.

At this organization's call center, most fraud attacks targeted accounts that were not secured by voice authentication

Regardless, it's clear that there is a plausible future in which voice AI will be maliciously exploited by fraudsters seeking to impersonate all of us.

The next three Key Ideas outline the technological tools available to us today to help combat deepfakes, as well as guidelines on how to develop such tools effectively and safely.

3. Voice authentication is the first line of defense against deepfakes

When OpenAI announced their new Voice Engine, they also recommended "[p]hasing out voice based authentication as a security measure for accessing bank accounts and other sensitive information".

It's an unfortunate take, however well-intentioned, because the opposite is true: given that the human ear can't tell deepfakes from real voices, the data shows that an up-to-date voice authentication system can be quite resilient to the majority of spoof attacks. (Reminder: "spoof" refers to the general category of impersonation attacks via technology, which includes deepfakes but also many other hardware/software tools.)

In this real-world experiment, voice authentication technology was able to deter over 80% of attempted spoof attacks

Based on the above result, it's clear that we should keep this voice authentication thing around as long as it continues to perform.

It takes an AI to fight an AI, right?

An 80% resilience to spoofing is far better than what the human ear can manage, but an error rate of nearly 20% is still scary enough to keep us worried.

4. Voice authentication and deepfake countermeasures must be built concurrently; you cannot build one effectively without also building the other

The following table shows us that the best recipe for deepfake defense is combining spoofing countermeasures with voice authentication. Even on its own, however, each technology isn't doing too badly, right? So why insist on the combination?

What the table isn't showing is that:

  • The spoof and voice authentication datasets have been cross-pollinated, allowing us to develop and choose the best technologies across our use cases; this improves the individual performance of each technology. Because we tested voice authentication with deepfake data, we naturally developed technology with strong resilience to deepfakes.

  • By developing voice authentication and spoofing countermeasures together, you choose each candidate technology based on how well it combines with the other. For both voice authentication and deepfake detection, we chose the technology that performs best in combination, not the one that performs best individually.

And by extension, this is how you want to deploy these technologies in practice: all voice-based systems should be running both voice authentication and spoofing countermeasures concurrently.
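To make that concrete, here is a minimal sketch in Python of what running both systems on every call could look like. Everything in it is a hypothetical stand-in (the function name, the thresholds, the convention that scores fall in [0, 1]); real products often use trained score fusion rather than a simple AND rule, but the principle is the same.

```python
def fused_decision(asv_score: float, cm_score: float,
                   asv_threshold: float = 0.7,
                   cm_threshold: float = 0.5) -> bool:
    """Accept a caller only if BOTH subsystems agree.

    asv_score: speaker verification score in [0, 1]
               (does the voice match the claimed identity?)
    cm_score:  spoofing countermeasure score in [0, 1]
               (does the audio look like live human speech,
               rather than a replay or a deepfake?)
    """
    # The simplest fusion is an AND rule: an attacker must now
    # defeat two independent systems instead of one.
    return asv_score >= asv_threshold and cm_score >= cm_threshold

# A convincing voice clone may match the claimed speaker (high asv_score)
# yet still carry synthesis artifacts (low cm_score), so it is rejected:
print(fused_decision(asv_score=0.91, cm_score=0.22))  # False
```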

And, unsurprisingly, the astute reader will have noticed that the data needed to run such a research and technology development program, one that takes a holistic view of voice authentication and deepfake detection, would have to be richer than any dataset used for either technology independently.

5. The datasets used to develop deepfake detection must be realistic, rich and abundant (probably in that order)

In our fifth article, I shared some guidelines on how deepfake detection researchers and technologists should approach dataset collection and creation. Does this actually work? Well, all of the results I've shared with you so far were based on a real-world dataset featuring realistic deepfake scenarios.

And if you were paying attention, these lessons aren't just for folks working on deepfake detection, but for those working on voice authentication as well. The same realistic datasets should be used to develop both technologies. Everything is connected.

The 5th Key Idea states that datasets need to be realistic, rich and abundant, and "probably in that order."

That last bit reflects how difficult it is to achieve realism in a simulated or lab environment; I therefore recommend spending the most effort there.

Realism (or lack thereof) is also the single largest gap in the public datasets available today.

In fact, realism can be more important than the quantity of data, or more important than having all deepfake technologies represented in your data.

Relative improvement in deepfake detection accuracy when the datasets used to train models are augmented by increasing the amount of data (salmon), adding realistic elements such as telephony characteristics (grey), or doing both (gold)

Of course, we want to have both richness and abundance of data, covering as many key variables as possible. We want our AI models to be prepared for anything.
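To give a flavor of what "adding realistic elements such as telephony characteristics" can mean in practice, here is a toy augmentation sketch using numpy and scipy. It is a rough approximation under stated assumptions (16 kHz mono input, a simple band-pass filter plus additive noise); real pipelines typically also simulate codecs, packet loss, and device and channel effects.

```python
import numpy as np
from scipy import signal

def telephony_augment(audio: np.ndarray, sample_rate: int = 16000,
                      snr_db: float = 25.0) -> np.ndarray:
    """Roughly simulate a narrowband telephone channel on clean speech."""
    # 1. Band-limit to the classic telephony passband (~300-3400 Hz).
    nyquist = sample_rate / 2
    b, a = signal.butter(4, [300 / nyquist, 3400 / nyquist], btype="band")
    narrowband = signal.lfilter(b, a, audio)

    # 2. Downsample to the 8 kHz rate used by most telephone networks.
    narrowband = signal.resample_poly(narrowband, up=1,
                                      down=sample_rate // 8000)

    # 3. Add background noise at a fixed signal-to-noise ratio.
    signal_power = np.mean(narrowband ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.default_rng().normal(0.0, np.sqrt(noise_power),
                                           narrowband.shape)
    return narrowband + noise
```

Applied to a clean training set, a function like this turns wideband, studio-quality recordings into something closer to what a call center actually hears.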

Reaching the halfway point of the 10 Key Ideas, we closed the chapter on the technologies that can defend us against malicious use of voice AI.

But what about everyone who actually works on Generative AI technology, and the technologies that are used to create lifelike voice clones of real people? What could they do to minimize the potential harm of misuse of their work?

6. Build voice AI (deepfake) technology responsibly

Earlier we saw that, contrary to popular belief, voice authentication is actually a good deterrent for deepfakes. But why is that?

One of the reasons is that, when it comes to voice AI, what sounds good or real to the human ear is not necessarily enough to pass voice authentication scrutiny. The two form a Venn diagram, of course, but the overlap is not complete.

We posited that this meant that a path exists where the work to create realistic voice AI can be pursued, and technologies made available for legitimate use (with proper checks and controls, of course), without necessarily being in opposition to voice authentication.

From the point of view of voice authentication, we came up with three guidelines for researchers and technologists to develop voice AI more safely:

  1. Don't use speaker embeddings as part of synthetic speech technology
  2. Speaker similarity is not a helpful metric when developing voice AI; in fact it should be minimized
  3. Implement a robust watermark for all synthesized speech generated by your technology

More details are available in the original article.
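As a small illustration of guideline 3, below is a toy spread-spectrum-style watermark in numpy: a low-amplitude pseudorandom signature, keyed by a secret seed, is added to the waveform and later detected by correlating against that same keyed noise. This is deliberately naive (it would not survive compression or resampling, and the parameters are made up for the example); production watermarks are far more robust, but the embed-then-correlate principle is the same.

```python
import numpy as np

def embed_watermark(audio: np.ndarray, key: int,
                    strength: float = 0.01) -> np.ndarray:
    """Add a low-amplitude pseudorandom signature keyed by `key`."""
    mark = np.random.default_rng(key).standard_normal(audio.shape)
    return audio + strength * mark

def detect_watermark(audio: np.ndarray, key: int,
                     threshold: float = 0.005) -> bool:
    """Look for the signature by correlating against the keyed noise."""
    mark = np.random.default_rng(key).standard_normal(audio.shape)
    # Near zero for unmarked audio; close to `strength` when marked.
    score = float(np.dot(audio, mark)) / len(audio)
    return score > threshold

rng = np.random.default_rng(0)
clean = 0.1 * rng.standard_normal(16000)   # stand-in for 1 s of speech
marked = embed_watermark(clean, key=42)
print(detect_watermark(marked, key=42))    # True
print(detect_watermark(clean, key=42))     # False
```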

7. Organizations seeking to test voice authentication and deepfake detection must adopt scientific methods

At this point we start to move away from discussing strictly the technologies involved, and begin to explore questions around how voice authentication and deepfake detection are used.

How does an organization, or even an individual, know that the voice-based defense systems are working? While we continue to see articles where journalists demonstrate how they've fooled legacy voice systems with the latest voice AI technologies, such examples are only slightly helpful at best, and dangerous at worst.

Remember - fraudsters would love nothing more than to have voice authentication systems discredited

Instead, we advocate for systematic, standardized methodologies for evaluating any security process or system, including voice authentication. And for technologies such as voice authentication and deepfake detection, which are probabilistic by nature, there are no shortcuts: you will need to adopt the experimental methodologies of the scientific method.

This means larger-scale testing, emphasizing conditions that match real world scenarios, and encompassing a wide array of technologies and techniques across your use cases.
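To see why anecdotes are no substitute for measurement, consider the uncertainty around an observed error rate. The sketch below uses the standard Wilson score interval (the trial counts are illustrative, not from any real test): a journalist fooling a system a couple of times tells us almost nothing, while thousands of scripted attack calls yield a usable estimate.

```python
import math

def wilson_interval(errors: int, trials: int, z: float = 1.96):
    """95% Wilson score confidence interval for an observed error rate."""
    if trials == 0:
        return (0.0, 1.0)
    p = errors / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z ** 2 / (4 * trials ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

# Fooling a system 2 times out of 3 attempts:
print(wilson_interval(2, 3))       # ~(0.21, 0.94): enormous uncertainty
# 60 false accepts across 10,000 scripted attack calls:
print(wilson_interval(60, 10000))  # ~(0.005, 0.008): a real measurement
```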

Use cases? What "use cases"? All this time, we've been discussing technologies and threats that have to do with people making phone calls.

But in the digital age, won't phone calls, and the use of our voices, become totally obsolete any day now?

8. The voice will remain a critical communication method between businesses and their customers

As we revealed, phone conversations, and the use of the human voice, have not gone anywhere. They have simply transformed as communication options have grown over the past 20+ years.

What we've seen during this transformation is that the voice channel has become increasingly used when people need help with complex requests, or need human support for sensitive tasks.

We argue that, as such, the human connection between businesses and their customers is something any business should value highly. A human connection is good for brand recognition, customer satisfaction and customer stickiness; consequently, it's good for business.

And if the voice channel is used to handle the most delicate and high-value tasks, then the need to authenticate users is more important now than ever before.

But- But- Agentic A.I.?!

9. Your agentic AI will need to authenticate you when you talk to them

Phone calls, email, chat, text messaging, social media, Discord, etc. - these are all just interfaces for communication.

If the promise of agentic AI is realized, it will become another major interface, perhaps often layered on top of any of the aforementioned ones. While debates raged about digital making telephony obsolete, agentic AI will make digital and every other interface "obsolete" (I don't actually believe this; agentic AI will just become a new row added to the chart in idea 8).

And we predict that the voice will remain a key modality for individuals to talk to their agentic AI personal assistants.

Using our voice will allow us to have our AI assistant perform tasks on our behalf, while we're occupied with something else (chopping vegetables, folding laundry).

The voice interface is also handy if we're at our workstation, focused on that presentation deck or spreadsheet, and we have a question about a particular feature. How much better would your workflow and focus be if you could just point at the thing on your screen, and talk to the AI that will help you solve the problem?

Image source: Artem Podrez, pexels.com

And yep - if we're going to be talking to an AI personal assistant, and asking it to do things like pay our bills, book doctor's appointments, and handle any variety of personal and sensitive tasks on our behalf, it should be able to tell that you are, in fact, you.

We set out three months ago with this series to demystify, perhaps reassure, and to raise and comment on important questions about audio deepfakes and where voice authentication fits in.

The landscape in a post-Generative AI world has been frantically changing, and trying to keep up or even wrap our heads around the implications can have a dizzying effect.

Audio deepfakes and voice authentication make up just one more item on a list of AI topics that has probably grown into the thousands.

But if you, as an individual, are concerned about this particular problem, should you go along with using things like voice authentication? I've certainly spent a lot of time in this series trying to convince the reader of the merits of doing so.

Putting all that aside, the last, perhaps most important idea, is that the choice should be yours.

10. You, the people, must have control over how your personal data may be used by AI technologies


I haven't yet figured out what subsequent posts in this newsletter will be about. Probably responding to new developments in voice security and/or general topics in the AI space. In any case, it was a pretty big milestone to reach this point, completing the full series of 10 ideas on audio deepfakes. So I must emphasize that the series would not have been possible without the contributions from coauthors and guest authors throughout, namely: Tim Savage, Luis Buera, Héctor Delgado, Simone Onizzi, and Cliff Mann. Thank you, colleagues, for your valuable contributions and generosity of ideas.
