Dan's Data Notes: Can an ML Model be stolen?
Overview
When you first read the post's title, you may picture someone physically stealing a robot or another smart device. However, in a world where most devices are always connected to a network, knowing the code running inside them is often more valuable than having the device itself. Software gives life to our smart devices, and algorithms do the same for artificial intelligence systems.
Before we get too far into the future, let's look at an example of how knowing a system's inner workings can help bypass it. Most countries have toll roads that drivers pay a premium to use for faster access to certain areas. When the technology evolved and toll booths started charging customers automatically, some people reverse-engineered the system's rules to their advantage. Since most automatic toll booths flash the passing cars to capture good-quality images, people soon realized that interfering with how the pictures were captured could bypass the established control.
Sure enough, over time people developed multiple techniques to make their license plates less visible when passing the toll. Similarly, it has been proven possible to steal the model configurations and training data behind the machine learning algorithms that power business applications.
Why steal the model in the first place?
There are at least two big reasons why someone would like to steal a predictive model:
Why is it possible?
Machine learning algorithms today are exposed via APIs. Companies can monetize those algorithms and their valuable training data by allowing third parties and external developers to embed them in their own applications. For example, Google, Amazon, and Microsoft expose translation APIs that let anyone add translation capabilities to an existing application. However, the openness of these APIs, and in some instances their misconfiguration, can leave them vulnerable to exposing critical model details.
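To make that attack surface concrete, here is a minimal sketch of how a client typically calls a hosted prediction API. The endpoint URL, API key, and response format below are hypothetical rather than any specific provider's real interface, but the overall pattern, sending a feature vector and receiving per-class confidence scores, is common across MLaaS offerings, and those detailed scores are exactly the information that extraction attacks feed on.

```python
import requests

# Hypothetical MLaaS prediction endpoint (URL and payload format are assumptions,
# not any specific provider's real API).
API_URL = "https://ml.example.com/v1/models/churn-classifier:predict"
API_KEY = "YOUR_API_KEY"

def query_model(features):
    """Send one feature vector and return the per-class confidence scores."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"instances": [features]},
        timeout=10,
    )
    response.raise_for_status()
    # Many prediction APIs return rich probabilities, not just the top label.
    # Those extra decimal places are what make model extraction cheap.
    return response.json()["predictions"][0]  # e.g. {"stay": 0.93, "churn": 0.07}

if __name__ == "__main__":
    print(query_model({"tenure_months": 14, "monthly_spend": 52.3}))
```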
In the research paper linked at the end of this post, the researchers perform attacks that reveal enough detail about a model to replicate its prediction quality using nothing but its public prediction API. The following prediction API providers were found to be vulnerable:
The following model types are susceptible to model extraction attacks:
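To give a feel for how extraction works on one of the simpler model classes covered in the paper, logistic regression, here is a simplified reconstruction of the equation-solving idea (my own sketch, not the authors' code). Because a logistic regression's confidence score is a sigmoid of a linear function, each query's logit gives one linear equation in the unknown weights, so d + 1 well-chosen queries are enough to solve for them exactly. The snippet trains a local stand-in model to play the role of the remote API:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Train a "victim" model locally to stand in for a remote prediction API.
# (In a real attack the attacker only sees the API's confidence scores.)
rng = np.random.default_rng(0)
d = 5                                   # number of input features
X = rng.normal(size=(200, d))
y = (X @ rng.normal(size=d) + 0.5 > 0).astype(int)
victim = LogisticRegression().fit(X, y)

def api_predict_proba(x):
    """Stand-in for the remote API: returns P(class=1) for one query."""
    return victim.predict_proba(x.reshape(1, -1))[0, 1]

# Equation-solving extraction: for logistic regression,
#   logit(p) = w . x + b,
# so d + 1 linearly independent queries give a solvable linear system.
queries = rng.normal(size=(d + 1, d))
probs = np.array([api_predict_proba(q) for q in queries])
logits = np.log(probs / (1 - probs))

A = np.hstack([queries, np.ones((d + 1, 1))])   # unknowns: w (d values) and b
solution = np.linalg.solve(A, logits)
w_stolen, b_stolen = solution[:-1], solution[-1]

print("true w:  ", victim.coef_[0])
print("stolen w:", w_stolen)
print("true b:  ", victim.intercept_[0], " stolen b:", b_stolen)
```

Other model types require different query strategies, but the paper describes analogous attacks against them; the underlying principle is the same, in that rich prediction outputs leak the model a little at a time.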
What are the risks of this trend? Can it be mitigated?
Cybercriminals who reverse-engineer critical algorithms, such as those used for payments or law enforcement, can pose a serious threat to global commerce.
The researchers could reverse-engineer model characteristics and sensitive aspects of the training data.
It's important to secure prediction APIs, and some recommendations include the following:
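One widely discussed defense is simply to leak less information per query: round or truncate confidence scores, or return only the predicted label, and rate-limit clients that issue suspiciously systematic queries. Below is a rough server-side sketch of the first two ideas (my own illustration, not code from the paper):

```python
def harden_prediction(probabilities, decimals=2, labels_only=False):
    """Reduce the information a prediction API leaks per query.

    probabilities: dict mapping class label -> confidence score.
    decimals:      round scores to this many decimal places (coarser scores
                   make equation-solving extraction far less precise).
    labels_only:   if True, return just the top label and no scores at all.
    """
    if labels_only:
        return {"label": max(probabilities, key=probabilities.get)}
    rounded = {k: round(v, decimals) for k, v in probabilities.items()}
    return {"label": max(rounded, key=rounded.get), "scores": rounded}

# Example: the raw scores below would let an attacker recover exact logits;
# the hardened responses leak far less.
raw = {"stay": 0.93177201, "churn": 0.06822799}
print(harden_prediction(raw))                    # rounded scores
print(harden_prediction(raw, labels_only=True))  # label only
```

There is a trade-off here: legitimate customers often want calibrated probabilities, so defenses like rounding need to be balanced against the usefulness of the API.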
You can find additional details on the research paper below: