Expanding on UUIDv1 Security Issues
Every week, almost without fail, I come across one thing that confuses, entertains, or most commonly infuriates me. I’ve decided to keep a record of my adventures.
Daniel Thatcher published an article earlier this week about abusing UUIDv1 tokens leveraged in security applications. The takeaway of that blog is the same takeaway as this post: Only use UUIDv4 for security applications. We're gonna take a more in depth look at UUIDv1 (and by extension UUIDv2) tokens both this week and next week.
The Background
Admittedly, it took me a detailed read through of the post and RFC to really understand that UUIDv1 is an encoding mechanism and not a hashing mechanism per say. As Daniel showed in his tool, which is mostly just leveraging the Python UUID library (why recreate functional code), the breakdown of the UUIDv1 token is as follows:
As you can see, it's completely reversible. While it's 'widely' known that usage of UUIDv1 has privacy ramifications leaking MAC addresses (hell, even stackoverflow warns this), most documentation doesn't highlight its potential lack of randomness inherent in generating UUIDs in a multi-user environment. Really all UUID docs should just have a warning indicating if you're looking for random -- UUIDv4 is where you wanna be.
But is there ever a situation in which UUIDv1 can still, reasonably, be used securely? Okay, reasonable might not be the right wording - I'll call it 'business acceptable'. This term contrasts with 'academic acceptable' where researchers who don't have 1000's of devs to wrangle often bemoan why all hypothetical issues haven't been immediately fixed.
The Problem
In our scenario imagine you found a service using UUIDv1 in a security context, but the developers couldn't prioritize the fix, is there any situation where that wouldn't matter much?
First, we should keep in mind that the main issue described in Daniel's blog was a service that is used by multiple users. This configuration meant that the same server was generating the UUIDv1 token and effectively led to the MAC portion and clock sequence (62 bits) of the token being leaked. This reduced the security window of UUIDv1 to time as a factor, which, as we saw, could be an issue.
I'd like to highlight three alternative scenarios and investigate them:
Would such implementations be 'secure'?
The Solution
In the first two situations, the question is going to come down to guessing a MAC address. MAC addresses are not exactly random and have a fixed population but as most of you know MAC addresses are composed of 'only' 6 couplets of hex digits. The most na?ve translation of this is 16^12 possible combinations (just over 281 trillion combinations). On its face, this would make an exhaustive brute force of the MAC search space, even if the exact timing and clock sequence were known, pretty unreasonable.
领英推荐
This however doesn't account for OUIs. The first three octets of MAC addresses are reserved for identifying the manufacturer of the network interface (Author's Note: In global MACs primarily, more on this next week). Looking at the OUI listing maintained by the IEEE, one would be immediately struck by the variety of manufacturers that are likely NOT making client hardware. So, figuring out roughly what type of hardware your target uses can greatly reduce your search space. For instance, if you know your target uses an Apple device, rather than 281 trillion MACs you're left with just over 16 million possible options.
Scenario 1
So, in scenario 1, an attacker would likely have to send ~16 million requests for each possible timestamp (plus some extra for sequence). The blog points out one of the most important pieces of information about this timestamp and that is that, in spite of being able to go down to the 100 nanosecond intervals (10 million per second) "It's uncommon for systems to use this level of precision when generating GUIDs" instead using millisecond precision, this means that every second has 1000 possible timestamp values (Author's Note: I wasn't actually able, on cursory inspection, to find any implementations outside of node that did this, certainly not Python - hence scenario 3) - For one second this increases the search space from 16 million to 16 billion, but would on average finish after 8 billion requests.
The main issue with scenario 1 is that a central service, even if its on the same network, is validating this request and therefore has network and processing latency, assuming even 160 requests per second it'd still take 2 years on average to get a collision. Therefore, most network attacks similar to Scenario 1 are unrealistic.
Scenario 2
But what about scenario 2, In this scenario the attacker has a hashed UUIDv1 and once they collide with the hash, they will know the server's MAC address. By comparison this has some distinct advantages, timing is somewhat known and, more importantly, the attack can be done offline. Additionally, once completed we have the same information as the initial blog and the attack is roughly the same (just very slightly slower due to the added hashing operation).
Server hardware can, sometimes, be easy to guess. There is only a limited subset of networks cards typically used within data center products. Likewise, physical hardware is often standardized within a company. This means that any UUIDv1 from any organization is likely (maybe) to reflect the same MAC OUI as the rest of their servers. Additionally, orchestrators and VMs in use can often allow for a restricted search space. For instance, Docker seems to always assign the same MACs (Roughly 65K):
The MAC address is generated using the IP address allocated to the container to avoid ARP collisions, using a range from 02:42:ac:11:00:00 to 02:42:ac:11:ff:ff
Likewise, VMWare uses '00:50:56' as the prefix for VSphere hosts, VirtualBox hosts use '08:00:27', etc. Unlike the aforementioned hosts EC2 and GCP compute instances seem to have random MAC addresses. Of course, these MACs are not truly random and are likely generated based on network/IP address of the host, but ˉ\_(ツ)_/ˉ.
Given that the MAC range used can sometimes be reasonably be guessed, scenario 2 then comes down to guessing, at worst, all the MACs for that provider (16 million) and if we did that for one second (1000 milliseconds) we'd have generated roughly 16 billion hashes. Even though this is similar complexity to scenario 1, scenario 2 is offline. Turns out modern GPUs can do that in a second or two. This also only has to ever be done once per server so the payoff is pretty good. As a result, this attack is COMPLETELY FEASIBLE.
I wanted to test how feasible this attack was given an has correctly guessed the first 6 bytes of the MAC address (Author's Note: This took me on an adventure of how the MAC address is determined that we'll talk about in next week's blog). The resolution of this effort took around 10 seconds to generate a valid UUID for all sequences and MACs on my Apple M1. Doing this for a 3 seconds interval takes 20,000,000 iterations which would take about 9 years on my computer. Of course this is without parallelization, while using inefficient python, and not using any GPU enhancement.
Scenario 3
Earlier I made an additional statement that bears repeating. While the initial author sorta glossed over this fact, if a uuidv1 implementation DOES follow the nanosecond/100 implementation that the RFC requests, turns out that a proper UUIDv1 implementation is actually rather safe against casual attacks. Even considering the weakness described in the initial blog would require 10 million requests (5 mill on average) per second interval with a proper implementation. This gets even worse in our hypotehtical scenarios. For scenario 1, a one second interval search of all MAC addresses would take on the order of 160 trillion requests. Even in the second scenario where the search is done offline with a known OUI an attacker would need to generate 16 trillion hashes to collide this comes out to about 6 hours of computation for every collision, and that's just to get the server's MAC and sequence.
In summary, services still should really not be relying on UUIDv1 to be random, use UUIDv4. We have shown that in some networked environments when UUIDv1s are generated by clients rather than a single server, weaknesses exists but are not feasible to exploit. However, we have been able to expand the scope of the attack to cover even when the MAC address of a server isn't directly exposed (for instance by hashing a UUIDv1), we have shown that it is well within an attacker's capability to abuse this use case.