Exploiting Azure AI DocIntel for ID spoofing

Executing sensitive transactions often requires showing proof of identity and proof of ownership: this requirement is regulated in many industries and is part of a broader process called Know Your Customer (KYC).

Agents (customs officers, bank clerks, accountants, ...) must painstakingly type the identification fields into their back-end systems, causing both processing delays and a significant human workload.

Meet Azure AI Document Intelligence

To expedite this task, and to save their customers money, Microsoft offers an Azure AI service called Document Intelligence ("DocIntel"), where customers can rely on a range of pre-built data extraction templates (invoices, passports, tax forms, ...) or create their own.

When a template is selected, the customer sends images of legal, commercial or other formal documents to the DocIntel API endpoint, where they are analyzed with computer vision.

The relevant fields (which depend on the model) are extracted and assigned a confidence score by the service, then returned to the customer in JSON format.
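
For illustration, here is a minimal sketch of what such a call could look like from a client application (Python with the requests library; the endpoint, key, route and api-version below are assumptions to adapt to your own resource):

    import time
    import requests

    # Hypothetical values: replace with your own DocIntel resource endpoint and key.
    ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
    KEY = "<your-key>"

    # Assumption: the prebuilt ID-document model over the REST API; the exact
    # route and api-version may differ depending on your service version.
    ANALYZE_URL = (
        f"{ENDPOINT}/formrecognizer/documentModels/prebuilt-idDocument:analyze"
        "?api-version=2023-07-31"
    )

    def analyze_id_document(path: str) -> dict:
        """Submit an image and poll the service until the analysis result is ready."""
        with open(path, "rb") as f:
            resp = requests.post(
                ANALYZE_URL,
                headers={
                    "Ocp-Apim-Subscription-Key": KEY,
                    "Content-Type": "application/octet-stream",
                },
                data=f.read(),
            )
        resp.raise_for_status()                        # the service answers HTTP 202
        poll_url = resp.headers["Operation-Location"]  # URL to poll for the JSON result
        while True:
            result = requests.get(
                poll_url, headers={"Ocp-Apim-Subscription-Key": KEY}
            ).json()
            if result.get("status") in ("succeeded", "failed"):
                return result
            time.sleep(1)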

To ease the template-modeling experience, a Studio is provided as part of DocIntel. Once an image is uploaded in the Azure portal, the left-hand pane displays the image itself, and the right-hand pane displays the extracted data and confidence scores:


Testing Malaysian passport data extraction using the Azure AI Doc Intel Studio.



Staging a WYSIWYG attack against Doc Intelligence

What if we could craft a special image that breaks the all-important "What You See Is What You Get" (WYSIWYG) paradigm?

  • What You See: the image of a legal document featuring a set of identification and validity data.
  • What You Get: a JSON containing another set of identification and validity data, with a very high confidence score.

If we could stage such an attack, here are a few real-world implications it could entail:

1. Modifying the destination country of a money transfer, from "Spain" to a blacklisted country like "North Korea";

2. Swapping the origin of a cattle shipment, from "Venezuela" to "Poland";

3. Abusing an expired country residence document, by modifying its expiration date;

4. Forging an identity to obtain a SIM card for SMS phishing campaigns;

5. ...

We could perform these many variants of ID spoofing without even using the Studio: as long as our KYC application relies on the Azure DocIntel API, the trick could work.

How could we achieve that?

Looking at the documentation, the DocIntel API supports various image formats such as JPEG and PNG. Their intended use is to upload a static scan of a document, but the PNG format can also embed multiple frames; such files are called Animated PNGs (APNG).

Imagine one embeds two slightly different versions of a driver's license into the same APNG: one for Chris Smith, and another one for Bob Smart. Only the name changes. How will the DocIntel API react?


Chris' driver license


Chris' edited license, now showing Bob Small


If the endpoint is hardened, it should block attempts to upload an APNG instead of a PNG: there is no legitimate reason to submit anything other than a static scan, and multiple frames are a source of confusion. Such uploads should be answered with an HTTP client error (4xx) or server error (5xx), not with the typical HTTP 200/202 response codes indicating that the data was accepted for processing.

In Procreate (or any other image-editing tool), let's build an APNG made of two frames: Chris' and Bob's licenses. We set the loop duration as low as possible.

To optimize the rendering, we can use an APNG assembler tool that removes the loop and makes one of the frames invisible. Let's call one license frame00.png and the other frame01.png, then use apngasm to forge an APNG called spoofed.png that meets these criteria:


apngasm, an animated PNG assembler

The -l1 option sets the loop count to 1, and the -f option makes one of the frames invisible.
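
As a sketch, this assembly step could be scripted as follows (Python shelling out to the apngasm binary; the argument order and flag spelling may vary between apngasm versions, so treat it as an illustration of the settings described above):

    import subprocess

    # Assemble the two license frames into a single animated PNG.
    # -l1 sets the loop count to 1 and -f hides one of the frames,
    # as described above; flag syntax may differ across apngasm versions.
    subprocess.run(
        ["apngasm", "spoofed.png", "frame00.png", "frame01.png", "-l1", "-f"],
        check=True,
    )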

Now our APNG looks exactly like a normal PNG, but it is not!


Proof-Of-Concept

Let's put our experiment to the ultimate test. We submit spoofed.png to the API endpoint (a sketch of this call follows the results below), and...

1. Bingo! We get an HTTP 202 response. DocIntel supports animated PNGs :-)

2. We can control which identity we want the API to process, by setting the proper frame in the APNG.

3. Azure's extraction confidence for the faked identity is extremely high.


The API endpoint accepts animated PNG!
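
Reusing the analyze_id_document helper sketched earlier, the whole proof of concept boils down to a few lines. Field names such as FirstName come from the prebuilt ID model and are an assumption here; adapt them to the model you actually use.

    # Submit the forged APNG exactly as if it were a static scan.
    result = analyze_id_document("spoofed.png")

    # Inspect the extracted identity and its confidence score.
    doc = result["analyzeResult"]["documents"][0]
    first_name = doc["fields"]["FirstName"]        # assumed field name
    print(first_name.get("content"), first_name.get("confidence"))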

We can get a visual confirmation of this exploit in the Studio:


Getting Chris' identifiers when we set Bob as the visible frame!

Bob's ID shows up on the left, but Chris' ID shows up on the right... with 97%+ confidence.


Outcome

Interfacing a KYC application with the Azure DocIntel API opens up an opportunity for hackers to inject spoofed IDs into customers' back ends.

This proof-of-concept highlights an emergent risk with multimedia AIs: What You See Is NOT What You Get.

The reason is that computer vision processes files, whereas human vision processes retina imprints.


If you are using OCR as part of your KYC process, I urge you to perform an image format validation step to block animated images.
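
As a concrete example of such a validation step, here is a minimal sketch that rejects a PNG if it contains the acTL chunk marking an animated PNG. It walks the standard PNG chunk layout; upload.png is a placeholder for the file submitted by your user.

    import struct

    PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

    def is_animated_png(data: bytes) -> bool:
        """Return True if the PNG byte stream contains an APNG acTL chunk."""
        if not data.startswith(PNG_SIGNATURE):
            return False                        # not a PNG at all
        offset = len(PNG_SIGNATURE)
        while offset + 8 <= len(data):
            length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
            if chunk_type == b"acTL":           # animation control chunk => APNG
                return True
            if chunk_type == b"IDAT":           # image data reached; acTL must come before it
                return False
            offset += 8 + length + 4            # chunk header + payload + CRC
        return False

    # Example: block the file before it ever reaches the OCR service.
    with open("upload.png", "rb") as f:
        if is_animated_png(f.read()):
            raise ValueError("Animated PNGs are not accepted for KYC documents")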


Responsible disclosure timeline

08/Nov/2024: The vulnerability was successfully exploited and Microsoft Security Response Center (MSRC) was notified

15/Nov/2024: Issue was resubmitted to MSRC with full details

09/Dec/2024: After reviewing the finding, MSRC decided to close the case because the "reported behavior is not a vulnerability as the correct values are displayed accurately".

10/Dec/2024: Disclosure report (this document) was shared with MSRC

16/Dec/2024: Public disclosure


Kévin KISOKA

Cybersecurity Architect | Ex-Microsoft IR

2 months ago

Sometimes I understand why they decline a submission, sometimes not… some clarity would help full-time or occasional researchers be more effective in their findings & submissions. Sometimes I have the feeling it's obscurantism and they are gatekeeping, playing on semantics, to fix the issue for free without dropping the bounty?? Anyway, thanks for this research, well documented and with transparency, Christophe


Christophe Parisel, as you said to me earlier this year about a different vulnerability, the bigger issue here is not the vulnerability but Microsoft's interpretation of it as not being a vulnerability. As these types of things aren't vulnerabilities, is it possible for Microsoft to publish all the internal and external determinations for similar "line calls"?

David O.

Multi-Cloud Security | CloudSec Author | Azure MVP | AWS Community Builder

2 months ago

Nice!

Mauricio Ortiz, CISA

Great dad | Inspired Risk Management and Security | Cybersecurity | AI Governance | Data Science & Analytics My posts and comments are my personal views and perspectives but not those of my employer

2 months ago

Christophe Parisel, these findings turn what we watch in MI movies into reality. The risks are real and scarier every time we find more useful or productive capabilities. I hope that before enabling these capabilities for broader adoption, the software vendors will spend enough time thinking about “what could possibly go wrong” and then implement enough security guardrails. No system will ever be 100% secure, but some areas are critical to protect and secure so they do not become an easy point of entry for criminals.

Marjan Sterjev

IT Engineer | CISSP | CCSP | CEH (Master): research | learn | do | MENTOR

2 months ago

Is it possible that they are not seeing this vulnerability as a threat?
