tl;dr Five Ideas on Deepfakes in Five Minutes

If it's true that we should be concerned about the rise of deepfakes, then I have some bad news to share -

1. The human ear cannot tell a deepfake voice from a real one

If deepfakes are going to be weaponized against us, we don't stand a chance on our own. We saw that, even optimistically, the human ear detects only about 1 in 8 deepfakes, while flagging about 1 in 12 real voices as deepfakes!
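To see why those two rates are so damning, here is a quick back-of-the-envelope calculation in Python. The 1% deepfake prevalence is my own illustrative assumption, not a figure from the series:

```python
# Back-of-the-envelope: what do "1 in 8 detected" and "1 in 12
# false alarms" imply for a human screening calls for deepfakes?

tpr = 1 / 8   # true positive rate: share of deepfakes correctly flagged
fpr = 1 / 12  # false positive rate: share of real voices wrongly flagged

# Illustrative assumption: 1% of incoming calls are deepfakes.
prevalence = 0.01

# Probability that a call a human flags as "deepfake" really is one.
flagged_true = tpr * prevalence
flagged_false = fpr * (1 - prevalence)
precision = flagged_true / (flagged_true + flagged_false)

print(f"Deepfakes caught:          {tpr:.1%}")        # 12.5%
print(f"Real voices mis-flagged:   {fpr:.1%}")        # 8.3%
print(f"Precision of a human flag: {precision:.1%}")  # roughly 1.5%
```

Under that assumed prevalence, fewer than 2 in 100 human "that's a deepfake" calls would actually be deepfakes, while 7 in 8 deepfakes sail through.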

On the other hand -

2. Fraudsters aren't using deepfakes for financial crimes

I invited my colleague, fraud expert Tim Savage, to give us an overview of how fraud is actually committed in the real world, and how (if at all) deepfakes factor in.

The news is that it's much easier for fraudsters to attack institutions using traditional techniques, such as social engineering and answering knowledge-based questions, than to resort to anything as complicated as deepfakes. This is good news because it means deepfakes remain a largely hypothetical threat; it's bad news because it means we aren't well protected in the first place, and that is the problem we should really be tackling.

Perhaps my favorite part of Tim's article is that fraudsters hate voice authentication technology, and do everything they can to stay away from it!

[Figure: most fraud attacks at this organization targeted accounts that were not secured by voice authentication. Voice authentication is fraudster repellent.]

Since we started this series a few weeks ago, articles and warnings have continued to alert the public to the possible dangers of deepfakes, most recently from the FBI and the BBC. There is a fine line between being aware of a subject and being seriously worried (panicked) about it.

For now the topic of deepfakes remains in the former category: let's all try to learn about it and refresh our habits as they relate to potential fraud scams, but no need to lose sleep over it.

But if fraudsters started using these true-to-life Generative AI deepfakes, then voice authentication wouldn't be any use, right? Wrong -

3. Contrary to what some might think, voice authentication is in fact your first line of defense against audio deepfakes

In one graphic, we saw that a modern voice authentication system naturally rejects over 80% of attempted spoofing attacks (deepfakes and other impersonation techniques).

But we need to emphasize that a modern, up-to-date voice authentication system is required.
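To build intuition for why this happens, here is a minimal sketch. The score distributions below are synthetic numbers I chose purely to illustrate the mechanism; they are not measurements from the article. A voice authentication system accepts a caller only if the similarity to the enrolled voiceprint clears a threshold, and most spoofs simply never score that high:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic similarity scores against an enrolled voiceprint.
# These distributions are illustrative assumptions, not real measurements.
genuine = rng.normal(loc=0.75, scale=0.08, size=10_000)  # true speaker
spoofs = rng.normal(loc=0.45, scale=0.12, size=10_000)   # deepfakes/impersonations

threshold = 0.60  # operating point tuned to keep genuine users happy

genuine_accept = np.mean(genuine >= threshold)
spoof_reject = np.mean(spoofs < threshold)

print(f"Genuine callers accepted: {genuine_accept:.1%}")
print(f"Spoof attempts rejected:  {spoof_reject:.1%}")  # well above 80% here
```

Notice that the rejection happens "for free": the system in this sketch was never explicitly trained to detect spoofs; the spoofs just fail to sound enough like the enrolled speaker.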

Furthermore, your voice authentication needs to be built with the deepfake/spoof use case in mind, which is why -

4. Voice authentication and deepfake detection should be built together

Joined by Héctor Delgado, I loaded this article with new information, beginning to share some key behind-the-scenes ideas on how to approach building technology in the era of Generative AI.

Anyone doing work, or offering a product, developed in a siloed fashion (exclusively voice authentication or exclusively deepfake detection) is likely not ready to have their technology function in the real world.

  • If you're building a voice authentication system, you need to also build spoofing countermeasures

  • Similarly, if you're building spoofing countermeasures, you need to also build voice authentication technology

The following table shows us that the best recipe for deepfake defense is combining spoofing countermeasures with voice authentication. Even independently, however, each technology isn't doing too badly on its own, right? So why insist on the combination?

What the table isn't showing is that -

  • The spoof and voice authentication datasets have been cross-pollinated, allowing us to develop and choose the best technologies across our use cases - this improves the individual performance of the technologies

Because we tested voice authentication with deepfake data, we naturally developed technology with strong resilience to deepfakes.

  • By developing voice authentication and spoofing countermeasures together, you would choose a candidate technology based on how well it combines with the others

For either voice authentication or deepfake detection, we chose the technology that performs best in combination with the other, not based on its individual performance.
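Here is a minimal sketch of that selection rule. The candidate names and error numbers are invented purely for illustration; the point is the criterion, choosing the pair that minimizes the combined error rather than the individually best models:

```python
from itertools import product

# Toy combined error rates for each (voice-auth, spoof-detector) pair.
# The numbers are invented purely to illustrate the selection rule:
# auth_v2 need not be the best authenticator on its own, but the
# pair containing it has the lowest *combined* error, so it wins.
combined_error = {
    ("auth_v1", "cm_v1"): 0.031,
    ("auth_v1", "cm_v2"): 0.027,
    ("auth_v2", "cm_v1"): 0.019,  # best combination wins selection
    ("auth_v2", "cm_v2"): 0.024,
    ("auth_v3", "cm_v1"): 0.033,
    ("auth_v3", "cm_v2"): 0.028,
}

auth_candidates = ["auth_v1", "auth_v2", "auth_v3"]
spoof_candidates = ["cm_v1", "cm_v2"]

best_pair = min(
    product(auth_candidates, spoof_candidates),
    key=lambda pair: combined_error[pair],
)
print("Selected pair:", best_pair)  # ('auth_v2', 'cm_v1')
```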

And by extension, this is how you want to deploy these technologies in practice: all voice-based systems should be running both voice authentication and spoofing countermeasures concurrently.
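A minimal sketch of that deployment pattern follows, with hypothetical function names and thresholds (in a real system the two score functions would call the deployed models):

```python
from dataclasses import dataclass

@dataclass
class Decision:
    accepted: bool
    reason: str

# Hypothetical score functions: stand-ins for the deployed
# voice-authentication and spoof-detection models.
def speaker_score(audio: bytes, enrolled_voiceprint: bytes) -> float:
    """Similarity between the caller and the enrolled voiceprint (0..1)."""
    return 0.82  # stubbed for illustration

def genuineness_score(audio: bytes) -> float:
    """Likelihood the audio is live human speech, not a spoof (0..1)."""
    return 0.91  # stubbed for illustration

AUTH_THRESHOLD = 0.70
SPOOF_THRESHOLD = 0.50

def authenticate(audio: bytes, enrolled_voiceprint: bytes) -> Decision:
    # Run BOTH checks on every call; a pass on one never bypasses the other.
    if genuineness_score(audio) < SPOOF_THRESHOLD:
        return Decision(False, "rejected: likely spoof/deepfake")
    if speaker_score(audio, enrolled_voiceprint) < AUTH_THRESHOLD:
        return Decision(False, "rejected: voice does not match enrollment")
    return Decision(True, "accepted: genuine audio and matching speaker")

print(authenticate(b"...caller audio...", b"...voiceprint..."))
```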

5. The datasets used to develop deepfake detection must be realistic, rich and abundant (probably in that order)

Our fifth article delivered perhaps the most controversial message (among others): a scathing review of the state of deepfake detection technologies in academic research (and, most likely, industry as well).

I highlighted how a massive gap between academic datasets and the real world created a blind spot: for years, spoofing countermeasures boasted 99% accuracy in deepfake detection, only for it all to come apart catastrophically in the real world.

I also shared some guidelines on how deepfake detection researchers and technologists should approach dataset collection and creation. Does this actually work? In fact, all of the results I've shared with you over the past five articles were based on a real-world dataset featuring realistic deepfake scenarios.

And if you were paying attention, these lessons aren't just for folks working on deepfake detection, but also for those working on voice authentication. Everything is connected.

I wrote above that datasets need to be realistic, rich and abundant, and "probably in that order."

That last bit is mostly due to the level of difficulty, so I recommend spending the most effort on realism: it is super hard to achieve, and it is the single largest gap in the public datasets available today.

In fact, I will leave you with one last data-related statement and its supporting graphic (not seen before!).

Realism can be more important than the quantity of data, or more important than having all deepfake technologies represented in your data.

[Figure: relative improvement in deepfake detection accuracy when the datasets used to train models are augmented by increasing the amount of data (salmon), by adding realistic elements such as telephony characteristics (grey), or by doing both (gold).]

The correct answer is that you want to do both.

But if we want to judge the relative contribution of adding a lot more data versus adding a little highly realistic data, the latter clearly wins out in this case; not necessarily because realism is objectively more important, but because it has been sorely missing from the world of deepfake detection.
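As one concrete example of what "adding realistic elements such as telephony characteristics" can mean in practice, here is a hedged sketch; the specific steps are my own illustrative choices, not the article's actual pipeline. It band-limits a 16 kHz waveform to the classic 300-3400 Hz telephone band, downsamples it to 8 kHz, and applies a mu-law companding round-trip to mimic codec distortion:

```python
import numpy as np
from scipy.signal import butter, sosfilt, resample_poly

def telephonify(wav_16k: np.ndarray) -> np.ndarray:
    """Illustrative telephony-channel simulation for a 16 kHz waveform.
    These steps are one plausible recipe, not the article's exact method."""
    # 1. Band-limit to the classic telephone band (300-3400 Hz).
    sos = butter(4, [300, 3400], btype="bandpass", fs=16_000, output="sos")
    band = sosfilt(sos, wav_16k)

    # 2. Downsample 16 kHz -> 8 kHz, the narrowband telephony rate.
    narrow = resample_poly(band, up=1, down=2)

    # 3. Mu-law companding round-trip to mimic G.711-style codec distortion.
    mu = 255.0
    compressed = np.sign(narrow) * np.log1p(mu * np.abs(narrow)) / np.log1p(mu)
    quantized = np.round(compressed * 127) / 127  # coarse 8-bit-ish quantization
    restored = np.sign(quantized) * ((1 + mu) ** np.abs(quantized) - 1) / mu
    return restored

# Demo on a synthetic one-second 440 Hz tone at 16 kHz.
t = np.linspace(0, 1, 16_000, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
phone = telephonify(clean)
print(phone.shape)  # (8000,) -- narrowband version for augmentation
```

A deepfake detector trained only on clean, full-bandwidth studio audio never sees artifacts like these, which is one plausible reason lab accuracy fails to transfer to real phone channels.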
