Security untruthfulness and accountability
[In]Security has been top of mind recently, with record ransom payments, and non-canary deployments.
We can pretend insecurity is a "technology" issue ( and maybe Rust will save us ;) ), however to really make progress, as an industry we need to address truthfulness and accountability.
Additionally, customers need to be more involved in conversations about risk and timeliness of fixes with respect to impact on the customer's environments.
Introduction
In the following example, I detail a really nasty network security bug, and how the vendor has not been truthful about the impact or root cause. Customers were at great risk, and the vendor did not tell them about the issue, leaving them exposed.
My fear is that this behavior is representative of the widespread lack of security related accountability in the tech industry. It seems we do not have well established norms around transparency and accountability, which is unlikely to be in our collective best interests. Vendors should be encouraged to have more transparency and not be scared to announce issues.
Ultimately, the tech industry needs to recognize the terrible security situation we are in and start being more honest and accountable. We are unlikely to make significant progress until we do.
Agenda
Background - IPv6 issue
A few months ago I tried to configure IPv6 on my home network. It was a disaster!
Thankfully my poor wife showed a lot of patience despite conference calls dropping out as I struggled to resolve the issue.
After setting up a new IPv4 only SSID+VLAN to allow her to work without impact, I set about trying to work out what was wrong.
Steps
1 - Initial investigation
The challenge was that sometimes IPv6 would work fine for a few hours, even overnight sometimes, but then it would fail. Restarting wifi on a device would quickly fix the issue. However, I didn't know how to reliably reproduce the issue.
( It was also pretty amusing to see the way various apps handled the IPv6 failures differently. Let's just suggest that the Google apps are not consistent. <cough> Nest <cough> )
Initially, I thought the issue must be a misconfiguration somehow. After rereading SLAAC related RFCs, and very carefully reviewing the router's active state, everything seemed ok.
I also reached out to a bunch of my friends for any advice, and was dismayed to learn that most of them don't run IPv6, so that was largely a dead end. If people working in tech aren't bothered with IPv6, it's little wonder such little progress has been made in the last ~20 years.
2 - Vendor assistance
Reaching out to the vendor, the tech support team confirmed the configuration looked reasonable to them.
Tech support did, however, help me to understand that somehow my laptop was getting multiple IPv6 addresses from different subnets on the same interface. ( Normally you get multiple IPv6 addresses from the same subnet. ) ??! But why? This was the clue I needed.
Most of my experience with IPv6 is within router infrastructure, where you have fixed IPv6 addresses. IPv6 via SLAAC on end devices was not something I'd spent a lot of time looking at before.
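The symptom tech support spotted (global addresses from more than one subnet on a single interface) can be checked mechanically. The following is a minimal sketch, assuming /64 subnets as used by SLAAC; the addresses are made up from the 2001:db8::/32 documentation range, and the function name is purely illustrative.

```python
import ipaddress
from collections import defaultdict

def prefixes_in_use(addresses, prefix_len=64):
    """Group non-link-local IPv6 addresses by covering prefix (a /64 is assumed)."""
    groups = defaultdict(list)
    for text in addresses:
        addr = ipaddress.IPv6Address(text)
        if addr.is_link_local:
            continue  # fe80::/10 addresses exist on every interface and are not a symptom
        network = ipaddress.IPv6Network((addr, prefix_len), strict=False)
        groups[network].append(addr)
    return dict(groups)

# Hypothetical addresses observed on one wifi interface
seen = [
    "fe80::1c2d:3eff:fe4f:5a6b",
    "2001:db8:0:10:1c2d:3eff:fe4f:5a6b",
    "2001:db8:0:10:a1b2:c3d4:e5f6:789",   # second address in the same subnet: normal
    "2001:db8:0:20:1c2d:3eff:fe4f:5a6b",  # different subnet on the same interface: suspicious
]

groups = prefixes_in_use(seen)
for network, addrs in groups.items():
    print(network, [str(a) for a in addrs])
if len(groups) > 1:
    print("WARNING: addresses from", len(groups), "different subnets on one interface")
```

Several addresses within one /64 are expected; more than one /64 on the same interface is the clue worth chasing.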
3 - Packet captures identify the issue
After spending most of one Saturday packet capturing (tcpdump-ing) all over the network, to my amazement and horror, it was clear that traffic from multiple VLANs was all being mixed together. This was only happening on the wifi network.
The result was that on the wifi network, the IPv6 neighbor discovery messages for multiple subnets were all being flooded into the same layer 2 domain. This completely screws up IPv6, but IPv4 actually handles this, because the IPv4 end points simply ignore the traffic for subnets they don't know about. ( Of course there will be a race for DHCP allocation, so results are unlikely to be deterministic. )
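To illustrate why the same leak hurts IPv6 far more than IPv4, here is a deliberately simplified toy model, not a protocol implementation. It assumes an IPv4 host only treats sources inside its configured subnet as on-link, while a SLAAC host forms an address from every /64 prefix it hears in a router advertisement. All prefixes, addresses, and the interface identifier are hypothetical.

```python
import ipaddress

def ipv4_on_link(host_interface, source):
    """True if `source` falls inside the host's configured IPv4 subnet."""
    return ipaddress.IPv4Address(source) in host_interface.network

def slaac_addresses(advertised_prefixes, interface_id):
    """Form one address per advertised /64, regardless of which VLAN the RA leaked from."""
    addresses = []
    for prefix in advertised_prefixes:
        network = ipaddress.IPv6Network(prefix)
        addresses.append(ipaddress.IPv6Address(int(network.network_address) | interface_id))
    return addresses

# IPv4: the host keeps its one configured subnet and disregards the foreign one
host_v4 = ipaddress.IPv4Interface("192.168.10.5/24")
print(ipv4_on_link(host_v4, "192.168.10.9"))   # neighbor in the host's own subnet
print(ipv4_on_link(host_v4, "192.168.20.7"))   # leaked traffic from another VLAN

# IPv6: router advertisements from two VLANs reach the same host
heard = ["2001:db8:0:10::/64", "2001:db8:0:20::/64"]
configured = slaac_addresses(heard, interface_id=0x1C2D3EFFFE4F5A6B)
for addr in configured:
    print(addr)
```

In this model the IPv6 host ends up with an address in a subnet it should never have seen, which matches the multiple-subnet symptom described earlier.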
Off topic - but to be honest, this is probably an IPv6 design issue. While you can argue traffic for multiple IPv6 subnets shouldn't exist on the same layer 2 domain, the legacy IPv4 can operate without this constraint, and so we arguably went backwards from IPv4 to IPv6. IPv6 is dramatically more fragile. ( There are actually tools for monitoring neighbor discoveries to try to catch these types of issues, e.g. https://github.com/fln/addrwatch )
4 - Root cause and fix!
Armed with the undeniable evidence from the packet captures, the vendor was able to replicate the issue, and was quick to identify the root cause. It was a bug in their software!
( NOT a configuration issue. )
Within a few days, they provided some beta code with the fix! Yay!
It then took them some months to actually carefully deploy the fix. The cautious approach was important because I'm sure a bunch of people out there were impacted when the code was upgraded. For example, some networks were working only because the VLANs were not being enforced, so correcting network segmentation was definitely going to impact them.
Overall, the tech support team was excellent throughout, and I can't thank them enough!
5 - Response and Disclosure = Not good
Once the major issue was understood, and a fix was available, how the vendor handled the situation wasn't great.
The bug was understood on the 8th of May 2024, and the software fix was released on the 24th of June 2024: 48 days, or ~6.9 weeks, later. This means customers were exposed for nearly seven weeks.
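The exposure window can be checked with a couple of lines of date arithmetic. This small sketch uses only the two dates given above; the count comes out one day apart depending on whether both endpoint dates are included, and it is just under seven weeks either way.

```python
from datetime import date

understood = date(2024, 5, 8)      # bug understood
fix_released = date(2024, 6, 24)   # software fix released

elapsed = (fix_released - understood).days  # days between the two dates
inclusive = elapsed + 1                     # counting both endpoint dates

print(f"elapsed: {elapsed} days ({elapsed / 7:.2f} weeks)")
print(f"inclusive: {inclusive} days ({inclusive / 7:.2f} weeks)")
```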
This article is primarily about how the vendor could do dramatically better with respect to truthfulness and accountability. It is this type of behavior that means that as an industry we are not addressing fundamental process and testing issues.
What is network segmentation?
This section provides some background on the issue.
This issue was a major bug in the network equipment, where network segmentation was completely broken. It rendered VLANs useless ( traffic was "jumping" VLANs ). From a security perspective this is extremely bad.
For example, this bug meant that guest traffic which should have been completely isolated from perhaps medical/financial traffic was NOT protected.
Anyone on the guest network could immediately access the medical/financial network.
Malware on machines could now flow freely across all network segments.
This is a complete isolation failure with no user intervention or "attack" required.
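The effect can be pictured with a toy model of a layer 2 broadcast domain. This is only a sketch of the outcome, not of the vendor's actual bug mechanism; the port names and VLAN IDs are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    name: str
    vlan: int

def flood(ports, ingress, vlan_enforced=True):
    """Return the names of ports that receive a broadcast frame arriving on `ingress`."""
    return [
        port.name
        for port in ports
        if port != ingress and (not vlan_enforced or port.vlan == ingress.vlan)
    ]

# Hypothetical devices and VLAN assignments
ports = [
    Port("guest-laptop", 30),
    Port("guest-phone", 30),
    Port("medical-records", 10),
    Port("finance-db", 20),
]
attacker = ports[0]

print("segmentation working:", flood(ports, attacker))
print("segmentation broken: ", flood(ports, attacker, vlan_enforced=False))
```

With enforcement on, the guest laptop's broadcast reaches only the other guest device. With enforcement off, it reaches every device, with no special action needed from the sender.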
The following diagram depicts how an attacker on one network could directly access other network segments, bypassing firewalls, or other security features.
The ISC2 CISSP exam describes physical, logical, and micro segmentation within the secure network design principles:
All organizations rely on these fundamental segmentation features. Many report their organization's security compliance, including as it relates to standards like ISO 27001, based on the understanding that these features are in place.
If/when the vendor knows a fundamental feature is broken, how should the vendor handle this? Should they immediately inform the customers of the heightened risk? I would suggest that they should.
The vendor is not the one who is unknowingly:
In this case, the vendor, without consulting the customers, made the "risk assessment" and decided NOT to inform the customers of the increased risks. This isn't great for customers. This is particularly alarming, because many customers buy equipment from this vendor because of the product positioning as a "security product". An example of this is that the vendor doesn't sell "firewalls", but instead sells "security appliances", which is trying to emphasize how they are "secure".
Customers were NOT given the chance to decide how the lack of segmentation impacted them. Some of the customers may have decided the risk was too high, and they could have taken steps to reduce the exposure.
Sadly, the vendor did not give the customers the opportunity to make informed decisions.
It raises the question: What else aren't they telling us about?
Misleading Disclosure
The broken software has now been fixed for some time.
Reading the vendor release notes, however, you could be forgiven for not realizing how seriously broken the software was. Equally, you could be forgiven for not realizing how the vendor clearly failed to perform basic security validation tests ( and/or get a 3rd party security vendor to do so ).
The security advisory bulletin reads:
"A misconfiguration on UniFi U6+ Access Point could cause an incorrect VLAN traffic forwarding to APs meshed to UniFi U6+ Access Point."
This makes it sound like customers could accidentally misconfigure the equipment, but this is untrue. The only "misconfiguration" was the vendor's software incorrectly configuring the switching chips. Poor customers couldn't do anything about this.
The software release notes read:
"Fixed incorrect VLAN traffic forwarding to APs meshed to U6+."
This is closer to the truth, although more accurate would have been "Fixed incorrect VLAN stripping for APs meshed to U6+".
The CVE contains the same misleading "misconfiguration" claim, which is untrue:
Notes on CVSS risk assessments
It's also worth mentioning that the Common Vulnerability Scoring System (CVSS) ( https://www.first.org/cvss/v3.0/specification-document ) attack vector definition with respect to network segmentation isn't entirely clear and is enabling the vendors to get away with murder.
"A vulnerability exploitable with network access means the vulnerable component is bound to the network stack and the attacker's path is through OSI layer 3 (the network layer). Such a vulnerability is often termed 'remotely exploitable' and can be thought of as an attack being exploitable one or more network hops away (e.g. across layer 3 boundaries from routers). An example of a network attack is an attacker causing a denial of service (DoS) by sending a specially crafted TCP packet from across the public Internet (e.g. CVE-2004-0230)."
Specifically, for this issue, the isolation was broken at the OSI layer 2, and therefore rendered the OSI layer 3 "network" layer completely broken also. The bug allows an "attacker" to reach the network layer of a system that should be unreachable. This bug removed routing, and I suspect that the authors of the CVSS definition may not have considered the pathological cases of VLANs not working at all.
Another analogy for this situation is to think about a hop as a fence
Did the attacker jump/hop over the fence?
No hops required because the fence had fallen over
However, because of the wording of the CVSS Attack Vector (AV) scoring, the vendor claims it isn't a "network" attack vector, but is a less serious "adjacent" attack vector. Technically, you could argue this isn't "exploitable one or more network hops away", but that's only because the need for hops was removed! Certainly, the bug allows an OSI layer 3 (network layer) exploit.
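To see how much the Attack Vector choice alone can move a score, here is a sketch of the CVSS v3.0 base score equations for a scope-unchanged vector. The metric weights and formulas are taken from the specification; the rounding helper uses integer arithmetic to avoid floating-point artifacts. The vector itself (low complexity, no privileges, no user interaction, high confidentiality impact only) is a made-up example, and is NOT the vector published for this CVE.

```python
import math

# Metric weights from the CVSS v3.0 specification (scope unchanged)
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}

def roundup(value):
    """Round up to one decimal place using integers to sidestep float artifacts."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

def base_score(av, ac="L", pr="N", ui="N", conf="H", integ="N", avail="N"):
    """CVSS v3.0 base score for a scope-unchanged vector (hypothetical defaults)."""
    iss = 1 - (1 - CIA[conf]) * (1 - CIA[integ]) * (1 - CIA[avail])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))

network_score = base_score("N")    # Attack Vector: Network
adjacent_score = base_score("A")   # Attack Vector: Adjacent
print("AV:N ->", network_score)
print("AV:A ->", adjacent_score)
```

For this hypothetical vector, switching from Network to Adjacent lowers the base score from 7.5 to 6.5, which moves it from the High band to the Medium band. The real numbers for this CVE depend on the other metrics the vendor selected.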
Sadly, this is a case of the fox being in charge of the chicken house. The person assessing the security vulnerability is likely the same person, or at least within the same team, as those who should be ensuring security features work. Clearly they had not taken due care, and so of course they are not going to accept that they had failed in a spectacular way.
I was so confused by the attack vector definition that I even reached out to our friends at cisa.dhs.gov, posing some hypothetical questions about what happens when VLANs are stripped. However, I now have doubts about the Cybersecurity and Infrastructure Security Agency, because the "Coordinated Vulnerability Disclosure (CVD) Lead" suggested "APs typically do not support VLANS." Possibly this was a misunderstanding, but it's certainly curious. I've definitely never worked on a corporate wifi network that did not rely on VLANs. Sigh. ??!
Impact of untruthfulness
Impact on the customers
There was without doubt a significant increase in risk for all the customers using VLANs and meshing. How large the impact was is difficult to assess.
Can we attribute any security breach to the lack of segmentation?
We don't know and we probably will never know. Any customers who had security incidents assumed they had segmentation and so when investigating any breaches, the customers probably didn't consider lack of segmentation as an attack vector.
Attackers who were taking advantage of this "exploit" are unlikely to come forward to let us know they used this vector of attack.
Impact on the vendor
Sadly, the vendor is trying to get away without being truthful to avoid accountability.
By doing so, they are doing a disservice to their customers and their company:
Ultimately, the lack of thorough security testing is probably a funding issue, and so by not being honest they are not getting the additional funding that they clearly require.
Conclusion
Great write-up Dave Seddon & thanks for sharing! As a consumer of this vendor's products it's certainly concerning to hear that things of this magnitude are omitted from their communications.
Nice write-up of a very scary sounding vulnerability Dave Seddon. Thanks for sharing.