Protecting Against Audio Deepfakes: Innovative Solutions and Ongoing Challenges

New Techniques Emerge to Stop Audio Deepfakes

Voice cloning, a technology that uses AI to create realistic-sounding speech, can be very beneficial. For instance, it can help people with speech impairments by generating synthetic voices for them. However, it also has a dark side. Scammers can use AI to imitate someone's voice and trick people or companies into giving them money. Additionally, voice cloning can be used to create fake audio recordings that spread false information during elections.

To tackle the growing threat of audio deepfakes, the U.S. Federal Trade Commission (FTC) started a Voice Cloning Challenge. Participants from universities and companies were asked to come up with ways to prevent, detect, and assess the misuse of voice cloning. In April, the FTC announced the three winning teams, each of which had a unique approach to solving the problem. This shows that dealing with the dangers of audio deepfakes requires diverse and collaborative efforts.

3 Ways to Tackle Audio Deepfakes

One of the winning teams, OriginStory, focuses on verifying voices right from the source. "We've created a new type of microphone that can confirm if recorded speech is genuinely human as soon as it's produced," explains Visar Berisha, an electrical engineering professor at Arizona State University, who leads the development team with fellow ASU faculty members Daniel Bliss and Julie Liss.

OriginStory's specialized microphone records sound like a regular microphone, but it also includes sensors that detect and measure body signals produced when a person speaks, such as heartbeats, lung movements, vocal-cord vibrations, and movements of the lips, jaw, and tongue. "This verification is added to the audio as a watermark during recording, giving listeners proof that the speech is human-generated," says Berisha.
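The exact hardware and watermark format aren't public; the snippet below is only a toy sketch of the underlying idea, declaring a recording human-made when its energy envelope tracks a co-recorded body signal such as vocal-cord vibration. The function names, threshold, and signal channels are illustrative assumptions, not OriginStory's implementation.

```python
import numpy as np

def envelope(signal: np.ndarray, frame: int = 160) -> np.ndarray:
    """Frame-level amplitude envelope (RMS per frame)."""
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    return np.sqrt((frames ** 2).mean(axis=1))

def is_human_speech(audio: np.ndarray, biosignal: np.ndarray, threshold: float = 0.6) -> bool:
    """Declare the audio human-generated if its energy envelope tracks the
    co-recorded biosignal (e.g., vocal-cord vibration) closely enough.
    The 0.6 correlation threshold is an arbitrary placeholder."""
    a, b = envelope(audio), envelope(biosignal)
    m = min(len(a), len(b))
    corr = np.corrcoef(a[:m], b[:m])[0, 1]
    return bool(corr >= threshold)

# Hypothetical usage: both channels captured by the same microphone assembly.
# verified = is_human_speech(audio_channel, vibration_channel)
# If verified, a "human-generated" watermark would be attached to the recording.
```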

Another winner, AI Detect, aims to use AI to identify fake AI-generated voices. Developed by OmniSpeech, a company specializing in AI-powered speech-processing software, AI Detect plans to integrate machine learning algorithms into devices like phones and earbuds. This would allow these devices to recognize AI-generated voices in real time, even with limited processing power. "Our goal is to have a feature on your phone or headset that can alert you if the voice on the other end is not real," says OmniSpeech CEO David Przygoda.
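OmniSpeech hasn't published AI Detect's design, but a real-time, on-device detector of this kind would roughly amount to a small classifier scoring short audio frames as they arrive. The sketch below is a minimal illustration in PyTorch and torchaudio; the architecture, feature choice, and the score_frame helper are assumptions for illustration only, not OmniSpeech's system.

```python
import torch
import torch.nn as nn
import torchaudio

# A deliberately tiny model, so it could plausibly run on a phone or earbud DSP.
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=40)

detector = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 2),   # logits: [human, synthetic]
)

def score_frame(waveform: torch.Tensor) -> float:
    """Return the probability that a short audio frame is AI-generated."""
    feats = mel(waveform).log1p().unsqueeze(0).unsqueeze(0)  # (1, 1, n_mels, time)
    with torch.no_grad():
        logits = detector(feats)
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Usage sketch: score a quarter second of incoming call audio at a time and
# alert the user when the synthetic probability stays high across frames.
# prob = score_frame(torch.randn(16000 // 4))
```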

The third winning entry, DeFake, is another AI tool that makes it harder to precisely clone a human voice. DeFake works by adding tiny disruptions to a voice recording. "Think of these disruptions as small scrambling noises added to a human voice recording," explains Ning Zhang, an assistant professor of computer science and engineering at Washington University in St. Louis. "These noises disrupt the characteristics AI relies on to recognize a human voice, so when AI tries to learn from the recorded sample, it gets confused and learns something incorrect."

Zhang explains that DeFake uses a method called adversarial AI, which is a defensive strategy that disrupts how an AI model functions. "We're adding small bits of interference to disrupt the AI used by those trying to clone our voices," he says.
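DeFake's actual code isn't reproduced here; the sketch below only illustrates the general adversarial-perturbation idea Zhang describes, using a single FGSM-style gradient step in PyTorch against a stand-in speaker-embedding model. The embed_model placeholder, epsilon value, and loss choice are assumptions for illustration.

```python
import torch

def adversarial_perturbation(audio: torch.Tensor,
                             embed_model: torch.nn.Module,
                             epsilon: float = 0.002) -> torch.Tensor:
    """One FGSM-style step: nudge the waveform in the direction that most
    degrades the speaker embedding a cloning model would learn from,
    while keeping the change small enough to be barely audible."""
    audio = audio.clone().requires_grad_(True)
    clean_embedding = embed_model(audio.detach())
    # Maximize the distance between the perturbed audio's embedding and the
    # original voice's embedding (i.e., minimize their cosine similarity).
    loss = -torch.nn.functional.cosine_similarity(
        embed_model(audio), clean_embedding, dim=-1).mean()
    loss.backward()
    perturbed = audio + epsilon * audio.grad.sign()
    return perturbed.detach().clamp(-1.0, 1.0)

# Usage sketch (embed_model stands in for whatever encoder a cloning system
# uses; in practice the defender doesn't know it exactly):
# protected = adversarial_perturbation(waveform, embed_model)
```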

Implementing Audio Deepfake Defenses

Both AI Detect and DeFake are still in the early stages of research and development. AI Detect is currently just an idea, while DeFake needs improvements to become more efficient. Przygoda and Zhang acknowledge the challenges of using AI for defense.

Przygoda says, "We'll need to continuously update our datasets and technology to keep up with new deepfake models and hardware. This will require constant monitoring."

Zhang agrees, adding, "AI is advancing rapidly, so we must constantly adjust our techniques to stay ahead. As defenders, we don't always know what AI models attackers are using, so we need a general defense against all attacks while maintaining voice quality, which is very challenging."

Meanwhile, OriginStory is in the testing phase, working to make the technology foolproof. "We're conducting a validation study with many users trying to trick the system into thinking there's a human behind the microphone when there isn't. This will help us understand how robust our technology is. It's crucial to be certain that the person on the other end is truly human," says Berisha.

Nauman Dawalatabad, a postdoctoral associate at MIT’s Computer Science and Artificial Intelligence Laboratory, finds AI Detect’s approach promising because it operates on the device, ensuring privacy by not sending personal data to company servers.

He also sees DeFake's method, similar to watermarking, as a good way to protect consumers from fraud if their speech data is intercepted. However, he points out that this approach requires knowing all the source speakers and careful implementation, as rerecording the watermarked speech with another device can remove the watermark.

For OriginStory, Dawalatabad believes its method of stamping at the source using biosignals is more robust than software-based watermarking, as biosignals are hard to replicate.

Dawalatabad suggests that the best way to combat audio deepfakes is with a four-pronged approach combining multiple strategies. First, watermark new audio recordings to make them traceable. Second, develop better detection models, which are essential for securing current data that isn’t watermarked. Third, deploy these detection models directly on devices to enhance security and privacy, including better model compression for resource-limited devices and integrating these models at the system level by manufacturers. Finally, he emphasizes the importance of engaging policymakers to ensure consumer protection and promote solutions.
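As a small illustration of the third prong, shrinking detection models so they fit on resource-limited devices, post-training quantization is one common compression step. The sketch below uses PyTorch's dynamic quantization on a placeholder model; it is not tied to any of the challenge entries.

```python
import torch
import torch.nn as nn

# Placeholder detector; any trained audio-deepfake classifier would stand in here.
detector = nn.Sequential(nn.Linear(640, 128), nn.ReLU(), nn.Linear(128, 2))

# Post-training dynamic quantization: the Linear layers' weights are stored in
# int8, shrinking the model and speeding up CPU inference on-device.
quantized = torch.quantization.quantize_dynamic(
    detector, {nn.Linear}, dtype=torch.qint8)

print(quantized)  # the Linear layers are replaced by dynamically quantized versions
```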

The three winners of the FTC’s Voice Cloning Challenge will share a total prize of $35,000. Additionally, Pindrop, an information security company, received a recognition award for its solution that detects audio deepfakes in real time by analyzing speech in 2-second intervals and flagging potentially suspicious audio.
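Pindrop hasn't detailed its detector's internals, but the interval-based flagging it describes can be illustrated with a simple sliding-window loop. In the sketch below, the detector callable and the threshold are placeholders, not Pindrop's method.

```python
import numpy as np

SAMPLE_RATE = 16000
WINDOW_SECONDS = 2

def flag_suspicious_segments(audio: np.ndarray, detector, threshold: float = 0.8):
    """Score fixed 2-second windows and return the time ranges that a
    (hypothetical) deepfake detector rates as likely synthetic."""
    window = SAMPLE_RATE * WINDOW_SECONDS
    flagged = []
    for start in range(0, len(audio) - window + 1, window):
        score = detector(audio[start:start + window])  # probability of being synthetic
        if score >= threshold:
            flagged.append((start / SAMPLE_RATE, (start + window) / SAMPLE_RATE, score))
    return flagged

# Usage sketch: flag_suspicious_segments(call_audio, detector=score_fn)
# returns [(start_s, end_s, score), ...] for intervals worth alerting on.
```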

Reference: IEEE Spectrum by Rina Diane Caballar
