This is how your ML models are getting hacked...

When Christopher Sestito was heading threat research at Cylance, he woke up to a troubling alert. “Our core intellectual property - our machine learning (ML) model - could be compromised. That was the last thing we could have imagined,” he says.

Cylance was in the business of protecting others - a pioneer and one of the first companies to deploy ML models in antivirus software. Started by charismatic serial founder Stu McClure after a near-fatal plane crash, the company had scaled rapidly. It earned accolades from Gartner, Inc. Magazine and Forbes, and was named the best endpoint product by SANS. At its peak, it employed over 700 people and had raised capital from leading investors such as Khosla Ventures, Insight, Blackstone, KKR and the CIA’s investment arm, In-Q-Tel. It was an unstoppable rocket ship, crossing $100 million in revenue before being acquired by BlackBerry for $1.4 billion.

As Christopher, “Tito” to his friends, started to dig into the ML challenge, his busy days turned into sleepless nights. Working with a team of security Navy SEALs, he uncovered something devastating, and potentially staggering in scope. And it could impact every major sector.

To understand how ML models are being hacked, let us step back and look at the way we build these fantastically capable machines. (Image credit: MITRE)


We start with training data - say, lots and lots of pictures of a banana or a cat. The model is developed with such data and then starts to figure things out quickly. It “infers” - observes and draws a conclusion. You could show the ML model a picture of a banana and voila, it correctly says, “That's a banana.” Such models have become a mainstay in every major sector you could imagine. While autonomous vehicles are leading the charge in AI revenue, patient data, financial services and others are not far behind. The top five categories generate over $30 billion in revenue, so the market opportunity is of meaningful size and scale. Put differently, hackers are gonna hack these ripe targets. As Tito told me, “These ML attacks can destroy multi-million dollar investments, delay product releases, and leave victim organizations legally and financially liable.”
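To make the train-then-infer loop concrete, here is a minimal sketch using scikit-learn's built-in digits dataset as a stand-in for those banana pictures. The dataset and model choices are illustrative assumptions, not anyone's production pipeline.

```python
# A toy train-then-infer loop: learn from labeled examples, then ask the
# model to classify an example it has never seen before.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)          # "lots and lots of pictures"
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)                  # training: the model "figures things out"

prediction = model.predict(X_test[:1])       # inference: "That's a banana"
print("model says:", prediction[0], "| true label:", y_test[0])
```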


This is how they hack it….

From what we know today - thanks largely to Tito and his team for their yeoman's work - hackers have figured out quite a few ways to screw up ML models. If you are like me, wondering why on earth they would even want to attack the models, read on. If we start with the “why”, we find these major motivations:

  • Bypass the model - slipping past your security ML models to steal data is a new art form.
  • Steal your model - a hedge fund's trading algorithm would be a great prize, you know?
  • Screw up your model - create disruption, hurt your revenues and your business.

And the ways ML models are attacked include:

Bypass / evade the model: Attackers are masters - to break into Fort Knox, you need the layout plans: maps, sensors, guards and such. Once they know the patterns, they can figure out the evasion techniques. To bypass an ML model, attackers feed it crafted false data and learn how the model responds to those samples. Using such a tactic, researchers at Palo Alto Networks showed how its ML detectors could be bypassed. In another example, researchers dissected Cylance's model - its scoring, all 7,000 feature vectors and their weights - and bypassed it. Evil malware became friendly files.
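To see the shape of an evasion attack without any vendor specifics, here is a minimal, hedged sketch: train a toy classifier, then nudge an input along the model's decision-boundary weights until the predicted label flips. This is a generic gradient-style evasion on a toy model - an assumption-laden illustration, not the Palo Alto or Cylance bypass.

```python
# Toy evasion: perturb a correctly classified input just enough to flip
# the model's prediction, while leaving it nearly unchanged numerically.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X, y = X[y < 2], y[y < 2]                    # binary task: digit 0 vs digit 1

model = LogisticRegression(max_iter=5000).fit(X, y)

x = X[0]                                     # a sample the model calls "0"
w = model.coef_[0]                           # weights pointing toward class 1

# Step the input in the direction that raises the class-1 score (an
# FGSM-style sign step), growing epsilon until the label flips.
eps, x_adv = 0.0, x.copy()
while model.predict([x_adv])[0] == 0 and eps < 5.0:
    eps += 0.1
    x_adv = x + eps * np.sign(w)

print(f"label flipped at eps={eps:.1f}:",
      model.predict([x])[0], "->", model.predict([x_adv])[0])
```

Real-world bypasses of malware classifiers work at the feature level (strings, imports, section names) rather than pixels, but the principle - probe, learn the scoring, perturb - is the same.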

Steal the model: By thinking in “inversion”, hackers can replicate the model itself. To steal your ML model, the hacker goes into a meta mode: they study your ML model, gain access to training data and output APIs, and recreate their own copy. In one example, researchers were able to replicate OpenAI's GPT-2, a 1.5-billion-parameter model (including all the weights), using only $50K of TensorFlow cloud resources. This fascinating article tells you more. Nerd out.
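Here is a minimal sketch of the extraction pattern, under toy assumptions (a small scikit-learn "victim" standing in for a real prediction API - nothing like GPT-2's scale): the attacker never sees the training labels, only the answers the API returns, and trains a surrogate on those.

```python
# Toy model extraction: query a victim model's prediction API and train
# a look-alike surrogate purely on the (query, returned-label) pairs.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

victim = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                       random_state=0).fit(X, y)  # the model worth stealing

rng = np.random.default_rng(0)
queries = X + rng.normal(0, 1.0, X.shape)         # attacker-chosen inputs
stolen_labels = victim.predict(queries)           # what the "API" returns

surrogate = LogisticRegression(max_iter=5000).fit(queries, stolen_labels)

agreement = (surrogate.predict(X) == victim.predict(X)).mean()
print(f"surrogate agrees with the victim on {agreement:.0%} of inputs")
```

The GPT-2 replication operated at a vastly larger scale, of course, but the loop is the same: query, record, retrain.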

Poisoning: Is that a banana or a toaster? Well, by slipping a poisoned image into the training set, my ML model thinks a banana is a toaster. Crazy indeed. This is all fun and games till things get serious. If your autonomous vehicle's vision systems start making such confusions, can people's lives be at risk? How about facial recognition systems, or your Apple Face ID?
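Here is a hedged, minimal sketch of the simplest flavor of this attack - flipping the labels on a slice of the training data - using the same toy digits setup as above. Real poisoning campaigns are subtler (targeted triggers, backdoors), but even this crude version shows why a corrupted data pipeline is dangerous.

```python
# Toy label-flipping poisoning: corrupt a third of the training labels
# and compare accuracy on clean test data against an unpoisoned model.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clean = LogisticRegression(max_iter=5000).fit(X_train, y_train)

rng = np.random.default_rng(0)
y_poisoned = y_train.copy()
idx = rng.choice(len(y_poisoned), size=len(y_poisoned) // 3, replace=False)
y_poisoned[idx] = rng.integers(0, 10, size=len(idx))  # banana -> toaster

poisoned = LogisticRegression(max_iter=5000).fit(X_train, y_poisoned)

print(f"clean model accuracy:    {clean.score(X_test, y_test):.2%}")
print(f"poisoned model accuracy: {poisoned.score(X_test, y_test):.2%}")
```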


In China, a group of scammers hacked government-run identity systems, created shell companies, faked tax invoices and collected upwards of $70 million. Not bad for a few months of work - man, these ML hacks can be lucrative.

Protecting yourself from ML attacks…

There are no silver bullets and no automagical solutions. It begins with a blended approach of the human mind and the machine.

Step one - understand the adversary's tactics: Researchers at Microsoft, Palo Alto Networks and even Kaspersky Labs are leading the charge in sharing tactics. MITRE has developed the Adversarial Threat Landscape for AI Systems (ATLAS), a collaboratively developed knowledge base of adversary tactics and techniques, along with over a dozen case studies. Tito and his team at HiddenLayer are actively helping build these knowledge frameworks. In this eye-opening video, MITRE's Dr. Christina Liaghati and Jonathan Broadbent talk about AI threats, vulnerabilities and real-world observations. I have drawn much inspiration for this post from their fantastic work.

Step two - empower your data team: Your data scientists will be forced to study the basics of security. This is no fun, but we don't have a choice. Should your ML inputs be publicly exposed? How do they detect and track poisoning of data sets? When models go haywire, it is too late - game over. One basic safeguard is sketched below.
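As one hedged example of what "detect and track poisoning" can mean in practice, here is a minimal sketch that compares each incoming training batch against a trusted baseline and flags features whose distribution has drifted. The statistical test and threshold are illustrative assumptions, not an industry standard, and real defenses layer many such checks.

```python
# Flag training batches whose feature distributions drift away from a
# trusted baseline - a common (if basic) tell of a poisoned data feed.
import numpy as np
from scipy import stats

def flag_drifted_features(baseline: np.ndarray, batch: np.ndarray,
                          p_threshold: float = 0.01) -> list[int]:
    """Return indices of features whose batch distribution differs from
    the trusted baseline (two-sample Kolmogorov-Smirnov test)."""
    flagged = []
    for j in range(baseline.shape[1]):
        _, p_value = stats.ks_2samp(baseline[:, j], batch[:, j])
        if p_value < p_threshold:
            flagged.append(j)
    return flagged

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, size=(1000, 8))   # vetted historical data
batch = rng.normal(0, 1, size=(200, 8))       # today's incoming batch
batch[:, 3] += 2.0                            # simulate one poisoned feature

print("suspicious feature columns:", flag_drifted_features(baseline, batch))
```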

Step three - come help shape the solution: How can a startup serve you better? Tito and the HiddenLayer team are building a new category called ML Detection and Response (MLDR). Connect with the founding team - Tito, Tanner Burns, James Ballard - share your wish list and your feedback, prioritize what is important for you, and become a design partner to gain early access to MLDR. The company announced its funding today, led by Ten Eleven Ventures, and is primed for growth. I'm delighted to be a small part of this fun journey.

When we band together, our knowledge and feedback help, and our cyber immunity becomes stronger. As an investor in this company, this may sound self-serving - but sure, I am a part of the solution. Are you?
