Exploiting Azure AI DocIntel for ID spoofing

Executing sensitive transactions often requires showing proof of identity and proof of ownership: this requirement is regulated in many industries and is part of a broader process called Know Your Customer (KYC).

Agents (customs officers, bank clerks, accountants, ...) must painstakingly type the identification fields into their back-end systems, causing both processing delays and a significant human workload.

Meet Azure AI Document Intelligence

To expedite this task, and to save their customers money, Microsoft offers an Azure AI service called Document Intelligence ("DocIntel"), where customers can rely on a range of pre-built data extraction templates (invoices, passports, tax forms, ...) or create their own.

When a template is selected, the customer sends images of legal, commercial or other formal documents to the DocIntel API endpoint, where they are analyzed with computer vision.

The relevant fields (which depend on the model) are extracted and assigned a confidence score by the service, then returned to the customer in JSON format.
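
For illustration, here is a minimal sketch of what such a call could look like from a client application (Python with the requests library; the endpoint, key, route and api-version below are assumptions to adapt to your own resource):

    import time
    import requests

    # Hypothetical values: replace with your own DocIntel resource endpoint and key.
    ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
    KEY = "<your-key>"

    # Assumption: the prebuilt ID-document model over the REST API; the exact
    # route and api-version may differ depending on your service version.
    ANALYZE_URL = (
        f"{ENDPOINT}/formrecognizer/documentModels/prebuilt-idDocument:analyze"
        "?api-version=2023-07-31"
    )

    def analyze_id_document(path: str) -> dict:
        """Submit an image and poll the service until the analysis result is ready."""
        with open(path, "rb") as f:
            resp = requests.post(
                ANALYZE_URL,
                headers={
                    "Ocp-Apim-Subscription-Key": KEY,
                    "Content-Type": "application/octet-stream",
                },
                data=f.read(),
            )
        resp.raise_for_status()                        # the service answers HTTP 202
        poll_url = resp.headers["Operation-Location"]  # URL to poll for the JSON result
        while True:
            result = requests.get(
                poll_url, headers={"Ocp-Apim-Subscription-Key": KEY}
            ).json()
            if result.get("status") in ("succeeded", "failed"):
                return result
            time.sleep(1)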

To ease the template-modeling experience, a Studio is provided as part of DocIntel. Once an image is uploaded in the Azure portal, the left-hand pane displays the image itself, and the right-hand pane displays the extracted data and confidence scores:


Testing Malaysian passport data extraction using the Azure AI Doc Intel Studio.



Staging a WYSIWYG attack against Doc Intelligence

What if we could craft a special image that breaks the all-important "What You See Is What You Get" (WYSIWYG) paradigm?

  • What You See: the image of a legal document featuring a set of identification and validity data.
  • What You Get: a JSON containing another set of identification and validity data, with a very high confidence score.

If we could stage such an attack, here are a few real-world implications it could entail:

1. Modifying the destination country of a money transfer, from "Spain" to a blacklisted country like "North Korea";

2. Swapping the origin of a cattle shipment, from "Venezuela" to "Poland";

3. Abusing an expired country residence document, by modifying its expiration date;

4. Forging an identity to obtain a SIM card for SMS phishing campaigns;

5. ...

We could perform these many variants of ID spoofing without even using the Studio: as long as our KYC application relies on the Azure DocIntel API, the trick could work.

How could we achieve that?

Looking at the documentation, the DocIntel API supports various image formats such as JPEG and PNG. Their intended use is to upload a static scan of a document, but the PNG format can also embed multiple frames; such files are called Animated PNGs (APNG).

Imagine one embeds two slightly different versions of a driver's license into the same APNG: one for Chris Smith, and another one for Bob Smart. Only the name changes. How will the DocIntel API react?


Chris' driver license


Chris' edited license, now showing Bob Small


If the endpoint is hardened, it should block attempts to upload an APNG instead of a PNG: there is no legitimate reason to submit anything other than a static scan, and multiple frames are a source of confusion. Such uploads should be answered with an HTTP client error (4xx) or server error (5xx), not with the typical HTTP 200/202 response codes indicating that the data was accepted for processing.

In Procreate (or any other image-editing tool), let's build an APNG made of two frames: Chris' and Bob's licenses. We set the loop duration as low as possible.

To optimize the rendering, we can use an APNG assembler tool that removes the loop and makes one of the frames invisible. Let's call one license frame00.png and the other frame01.png, then use apngasm to forge an APNG called spoofed.png that meets these criteria:


apngasm, an animated PNG assembler

The -l1 option sets the loop count to 1, and the -f option makes one of the frames invisible.
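
As a sketch, this assembly step could be scripted as follows (Python shelling out to the apngasm binary; the argument order and flag spelling may vary between apngasm versions, so treat it as an illustration of the settings described above):

    import subprocess

    # Assemble the two license frames into a single animated PNG.
    # -l1 sets the loop count to 1 and -f hides one of the frames,
    # as described above; flag syntax may differ across apngasm versions.
    subprocess.run(
        ["apngasm", "spoofed.png", "frame00.png", "frame01.png", "-l1", "-f"],
        check=True,
    )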

Now our APNG looks exactly like a normal PNG, but it is not!


Proof-Of-Concept

Let's put our experiment to the ultimate test. We submit spoofed.png to the API endpoint (a sketch of this call follows the results below), and...

1. Bingo! We get an HTTP 202 response. DocIntel supports animated PNGs :-)

2. We can control which identity we want the API to process, by setting the proper frame in the APNG.

3. Azure's extraction confidence for the faked identity is extremely high.


The API endpoint accepts animated PNG!
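
Reusing the analyze_id_document helper sketched earlier, the whole proof of concept boils down to a few lines. Field names such as FirstName come from the prebuilt ID model and are an assumption here; adapt them to the model you actually use.

    # Submit the forged APNG exactly as if it were a static scan.
    result = analyze_id_document("spoofed.png")

    # Inspect the extracted identity and its confidence score.
    doc = result["analyzeResult"]["documents"][0]
    first_name = doc["fields"]["FirstName"]        # assumed field name
    print(first_name.get("content"), first_name.get("confidence"))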

We can get a visual confirmation of this exploit in the Studio:


Getting Chris' identifiers when we set Bob as the visible frame!

Bob's ID shows up on the left, but Chris' ID shows up on the right... with 97%+ confidence.


Outcome

Interfacing a KYC application with the Azure DocIntel API opens up an opportunity for hackers to inject spoofed IDs into customers' back ends.

This proof-of-concept highlights an emergent risk with multimedia AIs: What You See Is NOT What You Get.

The reason is that computer vision processes files, whereas human vision processes retina imprints.


If you are using OCR as part of your KYC process, I urge you to perform an image format validation step to block animated images.
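
As a concrete example of such a validation step, here is a minimal sketch that rejects a PNG if it contains the acTL chunk marking an animated PNG. It walks the standard PNG chunk layout; upload.png is a placeholder for the file submitted by your user.

    import struct

    PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

    def is_animated_png(data: bytes) -> bool:
        """Return True if the PNG byte stream contains an APNG acTL chunk."""
        if not data.startswith(PNG_SIGNATURE):
            return False                        # not a PNG at all
        offset = len(PNG_SIGNATURE)
        while offset + 8 <= len(data):
            length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
            if chunk_type == b"acTL":           # animation control chunk => APNG
                return True
            if chunk_type == b"IDAT":           # image data reached; acTL must come before it
                return False
            offset += 8 + length + 4            # chunk header + payload + CRC
        return False

    # Example: block the file before it ever reaches the OCR service.
    with open("upload.png", "rb") as f:
        if is_animated_png(f.read()):
            raise ValueError("Animated PNGs are not accepted for KYC documents")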


Responsible disclosure timeline

08/Nov/2024: The vulnerability was successfully exploited and Microsoft Security Response Center (MSRC) was notified

15/Nov/2024: Issue was resubmitted to MSRC with full details

09/Dec/2024: After reviewing the finding, MSRC decided to close the case because the "reported behavior is not a vulnerability as the correct values are displayed accurately".

10/Dec/2024: Disclosure report (this document) was shared with MSRC

16/Dec/2024: Public disclosure


Kévin KISOKA

Cybersecurity Architect | Ex-Microsoft IR

2 months ago

Sometimes I understand why they decline a submission, sometimes not… some clarity would help full-time or occasional researchers be more effective in their findings & submissions. Sometimes I have the feeling it's obscurantism and they are gatekeeping, playing on semantics, to fix the issue for free without dropping the bounty?? Anyway, thanks for this research, well documented and with transparency, Christophe


Christophe Parisel, as you said to me earlier this year about a different vulnerability, the bigger issue here is not the vulnerability but Microsoft's interpretation of it as not being a vulnerability. As these types of things aren't vulnerabilities, is it possible for Microsoft to publish all the internal and external determinations for similar "line calls"?

David O.

Multi-Cloud Security | CloudSec Author | Azure MVP | AWS Community Builder

2 months ago

Nice!

Mauricio Ortiz, CISA

Great dad | Inspired Risk Management and Security | Cybersecurity | AI Governance | Data Science & Analytics My posts and comments are my personal views and perspectives but not those of my employer

2 months ago

Christophe Parisel, these findings turn what we watch in MI movies into reality. The risks are real and scarier every time we find more useful or productive capabilities. I hope that before enabling these capabilities for broader adoption, the software vendors will spend enough time thinking about “what could possibly go wrong” and then implement enough security guardrails. No system will ever be 100% secure, but some areas are critical to protect and secure so they do not become an easy point of entry for criminals.

Marjan Sterjev

IT Engineer | CISSP | CCSP | CEH (Master): research | learn | do | MENTOR

2 months ago

Is it possible that they are not seeing this vulnerability as a threat?
