Key Idea 3 – voice authentication is the first line of defense against audio deepfakes

In the world of voice biometric authentication (I will use "voice authentication" interchangeably to mean the same thing), the hypothetical threat of deepfake/synthetic-voice fraud has been well known for over a decade. But fear and loathing of deepfakes broke through to the public in 2023, with coverage jumping quickly from demonstrations of hyperreal artificial voices to doubts about the reliability of voice authentication. We can choose this article published online by Vice UK as the symbolic beginning.

Based on the Vice article and several copycats that followed (Guardian Australia, Wall Street Journal), one might conclude that the existence of deepfakes has made voice authentication obsolete. For now, we will set aside the problematic and biased nature of such demonstrations, and instead spend our time explaining why the opposite conclusion is true.

As we take a more systematic approach to evaluating the vulnerability of a voice authentication system, we will come to realize that if we are indeed worried about the threat of audio deepfakes, then voice authentication is a powerful tool to shield us from them.

To set the context, a modern voice authentication system will typically produce a score evaluating whether the voice in a given audio clip matches the voice of the person the speaker claims to be. When people call regarding their own bank accounts, mobile service, etc., the scores will generally be high. And when fraudsters (using their own, undisguised voices) phone a call center to access the accounts of their victims, a high-quality voice authentication system will return low scores.

In this example, Jane is speaking with her bank’s call center representative, who is able to verify that this is indeed the real Jane using voice authentication
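For intuition, here is a minimal sketch of how such a match score might be produced, assuming the common embedding-comparison approach; the embedding size, the random stand-in data, and the function name are illustrative assumptions, not any particular vendor's system:

```python
import numpy as np

def cosine_score(enrolled: np.ndarray, test: np.ndarray) -> float:
    """Compare two fixed-length voice embeddings; higher means more similar."""
    a = enrolled / np.linalg.norm(enrolled)
    b = test / np.linalg.norm(test)
    return float(np.dot(a, b))

# In a real system, both embeddings would come from a speaker-embedding
# network (e.g. a ResNet) applied to enrollment audio and to the new call.
rng = np.random.default_rng(0)
jane_profile = rng.standard_normal(256)                 # stand-in for Jane's enrolled voiceprint
caller = jane_profile + 0.3 * rng.standard_normal(256)  # stand-in for a new call from Jane
print(f"match score: {cosine_score(jane_profile, caller):.3f}")
```

In practice, raw similarity scores are usually calibrated before thresholding, which is why the scores in the graphs below are not confined to the [-1, 1] range of a cosine.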

For a given system, you would normally choose a cutoff point, or threshold, for these scores, on which to base automatic decisions about whether each caller is, in fact, who they claim to be (and to refuse access to anyone else). The graphic below illustrates this concept.

The range of voice authentication scores for different types of individuals. Fraudsters (purple) will tend to score lower against the voice profiles of legitimate account holders (green). All cases appearing to the left of the threshold (dotted line) are rejected, whereas everything to the right will be deemed authentic

One can see that each type of person (legitimate account holder or fraudster) has its own color-coded set of scores, which are not quite Gaussian/bell curves, but close. This graph shows the voice authentication performance, on a real-world dataset, of a modern system based on a common flavor of Deep Neural Network known as a ResNet. In this case, we see a very clean separation between fraudsters and legitimate speakers, so choosing a threshold between the two (at a score of around 3.9) is straightforward enough; this allows us to authenticate 98% of legitimate persons while rejecting fraud attempts 99.9% of the time.
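As a toy illustration of how those two percentages fall out of a threshold, here is a short sketch using simulated scores; the means and spreads are made up to loosely mimic the graph, not taken from the real dataset:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated score distributions (illustrative values only).
legit_scores = rng.normal(loc=7.0, scale=1.5, size=10_000)  # green curve
fraud_scores = rng.normal(loc=0.0, scale=1.3, size=10_000)  # purple curve

threshold = 3.9
legit_accepted = np.mean(legit_scores >= threshold)  # legitimate callers authenticated
fraud_rejected = np.mean(fraud_scores < threshold)   # fraudsters refused

print(f"legitimate accepted: {legit_accepted:.1%}")
print(f"fraud rejected:      {fraud_rejected:.1%}")
```

With these made-up distributions, the printed rates land in the same ballpark as the 98% / 99.9% figures above; the exact numbers depend entirely on how well separated the two score distributions are.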

Recalling the internet headlines that would have us believe deepfakes could now fully circumvent voice authentication, let’s take a closer look at what’s happening behind the scenes. If we add the scores generated by deepfakes and other spoof attacks (e.g. recordings of the victim’s voice), it could look like the following graph.

The voice authentication scores for spoof audio (includes deepfakes and other technology-based impersonation methods) are now added, in orange, to the previously shown graph

We now see an example of how a fairly modern voice authentication system performs against spoof attacks. The spoof attacks of this particular experiment spanned several deepfake technologies, some of higher quality than others, in a real-world setting.

This voice authentication system was able to deter over 80% of attempted spoof attacks. On the one hand, the result shown here does not guarantee this level of performance for all systems, and it is likely that the highest-quality deepfake attacks would have more success than lower-quality ones. On the other hand, by moving our threshold further to the right, we could increase the system's robustness to deepfakes (at the cost of inconveniencing a greater portion of the legitimate population).
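That trade-off is easy to see with a small threshold sweep. Continuing the simulated-score sketch from above (the assumption that spoof scores sit between fraudsters and legitimate speakers is made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
legit_scores = rng.normal(7.0, 1.5, 10_000)  # legitimate account holders
spoof_scores = rng.normal(2.0, 2.0, 10_000)  # deepfakes/replays: above plain fraud, below legit

for threshold in (3.0, 3.9, 5.0, 6.0):
    legit_accepted = np.mean(legit_scores >= threshold)
    spoofs_deterred = np.mean(spoof_scores < threshold)
    print(f"threshold {threshold:3.1f}: legit accepted {legit_accepted:5.1%}, "
          f"spoofs deterred {spoofs_deterred:5.1%}")
```

Raising the threshold pushes spoof deterrence up and legitimate acceptance down; where to sit on that curve is a business decision as much as a technical one.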

Regardless, the message is clear: voice authentication's resistance to deepfakes is far better than what humans are able to achieve, even when they are told to expect deepfakes in the audio clips (see Key Idea 1). A recent paper published at Interspeech draws similar conclusions.

Evolution of voice authentication error over time. Authentication accuracy (purple) shown with respect to key voice authentication technologies, ordered from oldest to newest (left to right). Also shown is the error with respect to deepfakes (orange) for the same technologies. Source: Jung et al., Interspeech 2024

As shown in the above graphic, authentication error improves over time as new types of systems are developed. Note the term EER (equal error rate), which is a common metric to assess biometric system accuracy (some general info here); intuitively, we want to minimize this type of error (zero would be nice).
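For readers who want to compute it, EER is simply the operating point where the false acceptance rate (impostors accepted) equals the false rejection rate (genuine speakers rejected). A minimal sketch, using simulated scores as before:

```python
import numpy as np

def equal_error_rate(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """Return the error rate at the threshold where FAR and FRR cross."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([np.mean(impostor >= t) for t in thresholds])  # false accepts
    frr = np.array([np.mean(genuine < t) for t in thresholds])    # false rejects
    i = np.argmin(np.abs(far - frr))  # closest crossing point
    return float((far[i] + frr[i]) / 2)

rng = np.random.default_rng(3)
genuine = rng.normal(7.0, 1.5, 5_000)
impostor = rng.normal(0.0, 1.3, 5_000)
print(f"EER: {equal_error_rate(genuine, impostor):.2%}")
```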

While older voice authentication systems show some resilience to deepfakes, the error rates are still quite high, whereas their modern counterparts showcase much improved robustness to such attacks.

To situate ourselves in time: the i-vector was introduced in 2011, the x-vector in 2018, and WavLM-ECAPA is a contemporary (2024) system.

Voice authentication clearly provides a substantial level of defense against deepfakes, but it must be kept up to date. Most importantly, research into new voice authentication models must continue, with robustness to audio deepfakes treated as an explicit objective alongside authentication accuracy.

Though we must maintain a frantic pace to keep voice authentication systems up to date, advances in lifelike voice synthesis have so far posed only a gradually rising threat to these systems. You can actually read this from the graph above, by following the orange dotted line from right to left.

As we walk backwards in time, we see that the effectiveness of modern deepfake technology against older authentication technology trends upward, but only gradually until we reach the x-vector; from there, the error leaps dramatically as we move back to i-vector technology.

Fun fact: the Deep Neural Network (DNN) revolution first came to the world of voice biometrics in the form of x-vector technology (seminal paper here), and had a massive, positive impact on voice biometrics accuracy (and resilience to deepfakes).

So, with the exception of when DNNs first emerged, benefitting both voice biometrics and deepfakes in equal measure, subsequent deepfake advances have only gradually improved in efficacy against older voice authentication systems.

Returning to the first results we shared in this article, now expanding the view to how older systems perform against modern-day deepfake attacks, we notice a similar pattern to that of the Interspeech paper.

Our own analysis of top voice authentication technology of the past few years, on our own real-world dataset. We notice a similar pattern to that of the Interspeech paper: newer systems outperform older ones when it comes to deepfakes

While slightly older systems can provide a strong baseline resistance to deepfakes (and other spoof attacks), newer systems are far more resilient (compare 70% resilience in 2019 vs 80%+ in 2024).

From a fraudster's perspective, however, based on what we've now seen, the likelihood of circumventing a voice authentication system does appear to have increased: from around 0.1% using their own voice, to around 20% if they were to start using deepfakes.

But this also means that voice authentication alone can help prevent 80% of audio deepfake attacks, whereas humans are largely unable to detect deepfakes (see Key Idea 1).

Consequently, voice authentication becomes one of the few key tools to help us defend against audio deepfakes. It's clear that deepfakes pose a challenge to voice authentication systems, but for the time being, and through several AI revolutions over the past few years, the technology has held relatively strong.

So if we are truly concerned about how much audio deepfakes could be exploited by fraudsters to commit identity theft, financial crimes or other types of crimes, then we have to seriously consider, if not outright rush to, using voice authentication in more areas to secure sensitive interactions.

As long as the quality of deepfakes and other spoof-related technologies continues to improve, it could be dangerous in the long-term to rely solely on voice authentication technology as a deterrent. Therefore, next week, we will take a closer look at the deepfake countermeasures available to us today; we will see how these technologies complement voice authentication’s ability to resist (and even detect) deepfake attacks.

tl;dr: perhaps counterintuitively, a modern voice authentication system can help mount a powerful defense against deepfake (and other spoof) attacks, deterring 80% of them.

I've attempted to share some important ideas on voice authentication's role in an increasingly deepfake world, ideas which run contrary to some headlines of the past couple years. But do you agree with the analysis and conclusions? Please let me know in the comments below!


I would again like to thank some folks on my team for their feedback through the course of writing today's article, Luis Buera and Simone Onizzi. And a special thanks to Héctor Delgado for also creating some of the graphs.
