"lol, distributed server networks? You can scan the entire internet with a single VPS!"

"lol, distributed server networks? You can scan the entire internet with a single VPS!"

Welcome to today's lesson on some common pitfalls of internet scanning, and why counting IP addresses can often be misleading.

Our bad take of the day is: "why scan the internet with a distributed server network when you can use a dual core VPS, clearly you don't know how to write scanners!"

It's true. Nowadays you can realistically scan the entire IPv4 space from a cheap VPS (though probably not across more than a couple of destination ports). It's easy to rent a virtual server with a 1 Gbps, or even 10 Gbps, network connection. So why would someone like me use multiple servers if one is all you need? A couple of reasons.
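
Before getting into those reasons, here's a rough back-of-the-envelope sketch of why the single-VPS claim is credible in the first place: how long a single-port SYN sweep of IPv4 takes at a given line rate. The ~3.7 billion routable-address figure and the minimum-frame assumption are my own illustrative guesses, not measurements.

```python
# Back-of-the-envelope: how long does a single-port SYN sweep of IPv4 take at a
# given line rate? The address count and minimum-frame assumption below are
# illustrative guesses, not measurements.

ADDRESSES = 3_700_000_000   # roughly the routable IPv4 space after exclusions (assumption)
FRAME_BITS = 84 * 8         # minimum Ethernet frame on the wire, incl. preamble + inter-frame gap

def sweep_minutes(link_bps: float, ports: int = 1) -> float:
    """Time to send one probe to every address on `ports` destination ports."""
    packets_per_second = link_bps / FRAME_BITS
    return ADDRESSES * ports / packets_per_second / 60

print(f"1 Gbps,  1 port : {sweep_minutes(1e9):6.1f} min")    # ~41 min
print(f"10 Gbps, 1 port : {sweep_minutes(10e9):6.1f} min")   # ~4 min
print(f"1 Gbps,  4 ports: {sweep_minutes(1e9, 4):6.1f} min") # ~166 min
```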

Introducing: The IP Blocklist

Scanning the entire internet is widely considered to be malicious activity, regardless of your good intentions. Launching an internet scan WILL land you on a lot of IP blocklists. You'll end up in people's iptables rules, you'll end up in firewall companies' malicious-actor blocklists, and you might even end up getting blocked by your own ISP (trust me, nothing is more awkward than getting cut off by your own service provider because they thought you were doing crime).

From the second you send that first packet, it's basically a race to complete your internet scan before your IP address starts making its way into shared blocklists. Your biggest issue is going to be DNSBLs (DNS Blocklists): DNS-based IP blocklists which let firewall products automatically look up the IP address of an incoming packet and decide whether to block it. Since these blocklists are shared across many different networks, a scan of one system can get you blocked on millions of endpoints.
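
For context, a DNSBL lookup really is just ordinary DNS. A minimal sketch of the mechanism, using a placeholder zone name rather than any real blocklist:

```python
# Minimal sketch of a DNSBL lookup: reverse the IP's octets, prepend them to the
# blocklist zone, and do an ordinary A-record query. An answer (typically in
# 127.0.0.0/8) means "listed"; NXDOMAIN means "not listed". The zone name here
# is a placeholder, not a real blocklist.

import socket

def is_listed(ip: str, zone: str = "dnsbl.example.org") -> bool:
    query = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        socket.gethostbyname(query)   # any answer at all means the IP is on the list
        return True
    except socket.gaierror:           # no record -> not listed (or the list is unreachable)
        return False

print(is_listed("203.0.113.7"))
```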

The easiest and cheapest way to avoid this is to simply outrun the blocklists. The more servers you use and the more bandwidth you have, the more likely you are to complete a full internet scan before automated blocklists even realize you're a scanner. More servers also means more IP addresses, and the more IPs you use, the less it matters when one gets blocked here and there. You can bind multiple IPs to a single server, but that slows down your scans and comes with additional complexities, like most providers limiting the number of IPs you can assign to a VPS to as few as 2.
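
As an illustration of what "more servers" buys you, here's one way to shard a target range across scan boxes so each server only spends bandwidth (and reputation) on its own slice. The server count, target range, and modulo scheme are made up for the example; dedicated scanners such as zmap have sharding built in.

```python
# Illustrative sketch of sharding a target range across several scan servers.

import ipaddress

NUM_SERVERS = 8

def shard_for(ip: ipaddress.IPv4Address, num_shards: int) -> int:
    # Interleaving by address modulo spreads each shard across the whole range,
    # so no single server hammers one contiguous network.
    return int(ip) % num_shards

network = ipaddress.ip_network("198.51.100.0/24")   # stand-in for the real target space
assignments: dict[int, list[str]] = {i: [] for i in range(NUM_SERVERS)}
for host in network.hosts():
    assignments[shard_for(host, NUM_SERVERS)].append(str(host))

for server, targets in assignments.items():
    print(f"server {server}: {len(targets)} targets, starting at {targets[0]}")
```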


Getting Rolled In The DHCP Churn

Well first, what even is DHCP Churn?

It's no secret that IPv4 addresses are in short supply, but your biggest problem isn't obtaining one. In response to the ever-shrinking pool of available IPv4 addresses, many ISPs have stopped assigning static IP addresses to customers. This is especially common with consumer ISPs, but even small-to-medium business ISPs do it too.

When a customer connects to the internet, their ISP will typically use something like DHCP to grab a free IP address from its pool and assign it to the customer temporarily. The more constrained an ISP is for available IPs, the shorter its DHCP lease times, and the more aggressively it re-assigns dynamic IP addresses.

A dynamic IP address may be bound to a customer's MAC address, their current internet session, or just vibes. The downstream effect is that dynamic IP addresses change... a lot. This problem is referred to as "DHCP churn". One study on the effects of DHCP churn concluded that the IP address of the average internet user in Brazil changes 1.3 times per day. Sounds like a lot, right? Well, some internet users' IP addresses can change upwards of 8 times over the course of a single day.

So, how does that impact scanning? Well, it has two separate impacts that are ironically diametrically opposed.

The implications of DHCP Churn on scanning and sinkholing

Something as simple as a user rebooting their system or briefly losing connectivity can result in their IP address changing. Because ISPs tend to operate lots of smaller address blocks rather than a single large ASN, this can mean a system disappearing from one IP only to pop back up in an entirely different ASN.

The longer your scan takes, the more systems get a chance to hop between IP addresses mid-scan, causing you to both miss systems and record the same system multiple times, because you can't distinguish unique endpoints from unique IP addresses. The former problem isn't worth spending much time on: no matter how good your scanning is, you're going to miss systems. Be it NAT, firewalls, or random packet loss, you're going to miss stuff. But the other side is a huge issue.

Callback scans, sinkholes, and result inflation

DHCP churn is especially problematic when doing repeated scans and in situations which result in callbacks. Let's take two callback scans as examples: Log4j and the current CUPS vulnerability. In both cases, scanning for these vulnerabilities will result in a vulnerable system initiating an outbound HTTP connection to an IP address of your choice. This is actually great news for us! Scanning for open ports and services can tell us a lot about how many systems are potentially vulnerable, but triggering a system to initiate an outbound connection to a server we control is about as close to 100% confirmation that it's actually vulnerable as you can get.
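
The receiving end of those callbacks doesn't need to be fancy. A minimal sketch of a listener that just records who phoned home and when; the port, log path, and bare http.server approach are placeholders for whatever you'd actually deploy:

```python
# Minimal sketch of a callback listener: log the source IP, timestamp, and
# requested path of every inbound HTTP GET.

import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        src_ip = self.client_address[0]
        with open("callbacks.log", "a") as log:
            log.write(f"{int(time.time())} {src_ip} {self.path}\n")
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):   # silence the default per-request stderr noise
        pass

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CallbackHandler).serve_forever()
```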

There is a problem, though: IP address inflation. It's the same problem I've faced many times over a decade of sinkholing botnets. For those who don't know, botnet sinkholing is the act of taking over a threat actor's infrastructure and redirecting the infected systems to a server which we control. It can be used to protect infected systems from being used maliciously in cases where they cannot be disinfected, and it also lets us gather valuable metrics about the scale of the impact.

When you have a situation in which a system is making recurring connections to a server you control, be it due to botnet sinkholing or scanning for Log4j & CUPS, you get IP address inflation. If we spend a day scanning, and a system changes IP address 8 times in that day, that's a single system connecting to us from 8 different IP addresses. If the system doesn't expose any globally unique identifier we can use to track it, and there isn't a way we can assign one, which is often the case, our single system now gets recorded as 8 separate systems. Now imagine that over 2 days, 7 days, 3 months. Quickly, a small number of affected systems can start to look like Armageddon if you're only able to count unique IP addresses.

In my experience, on average, with each extra 24-hour period, 20% of the IPs you log will have come from already-recorded systems coming back on new IP addresses. So your results are being artificially inflated by 20% per day. This is a problem if you're looking for accurate data, and it's also often used intentionally to mislead people about the severity of a threat. A botnet of 100,000 computers can easily result in recording tens of millions of unique IP addresses over a several-month period. If you're looking for glory, why say "we disrupted a botnet consisting of 100,000 systems" when you can say "we disrupted a botnet consisting of up to 10 million potential infections"? It's not wrong, since we don't know for sure that every one of those 10 million IPs wasn't a static IP address for the entire 6-month period, but it's not right either.
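
To make that arithmetic concrete, here's a toy model under one simple reading of the 20% figure: every extra day, roughly 20% of the true population reappears on an IP you've never logged before. The population size and rate are just the numbers from above; everything else is ignored.

```python
# Toy model of IP address inflation: the true population never changes, but a
# naive "count every address we ever saw" report keeps growing.

TRUE_SYSTEMS = 100_000
DAILY_NEW_IP_RATE = 0.20   # assumed reading of the "20% per day" figure above

def naive_unique_ip_count(days: int) -> int:
    """Unique IPs a naive cumulative report would show after `days` days."""
    new_per_day = int(TRUE_SYSTEMS * DAILY_NEW_IP_RATE)
    return TRUE_SYSTEMS + new_per_day * (days - 1)

for days in (1, 7, 30, 180):
    print(f"{days:3d} days: {naive_unique_ip_count(days):>9,} unique IPs "
          f"(true population the whole time: {TRUE_SYSTEMS:,})")
```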

When it comes to scanning, we avoid this by scanning fast and scanning often. We can't just do one scan that takes a day, because that allows ample time for IP address inflation, and because of time zones not every system is online when we scan it. The easiest remedy is to scan with enough systems to get our scan time down to only a couple of minutes, then repeat the scan (say every 30 minutes or so) throughout a 24-hour period.

Remember, we also need to avoid systems that were scanned in earlier waves repeatedly connecting to our server under different IP addresses. So, ideally, we scan the entire internet in a minute, record incoming HTTP requests for just long enough for every system to connect (maybe 5 minutes or so), then tear everything down and start again. This isn't something that's practical to do from a single VPS: the scan will be too slow, the network will get saturated, and we'll likely miss a lot of data. It's another common case of "just because you can doesn't mean you should", and what's the point when you already have the systems in place from a decade-long career building threat-tracking systems?
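
Putting the last two paragraphs together, here's a rough orchestration sketch of that scan / collect / tear-down loop. The stub functions and the per-wave median are my own placeholders and aggregation choice, not a description of any particular production setup:

```python
# Rough cadence sketch: fire a fast scan wave, keep the callback listener up for
# a few minutes, tear it down, repeat every half hour for a day, and report
# per-wave counts rather than the ever-growing union of every IP ever seen.

import time

SCAN_INTERVAL = 30 * 60    # seconds between waves
COLLECT_WINDOW = 5 * 60    # how long the listener stays up per wave
WAVES = 48                 # 24 hours' worth of waves

def launch_scan_wave() -> None:
    ...                    # hypothetical stub: tell every scan server to fire its shard

def collect_callbacks(window_seconds: int) -> set[str]:
    ...                    # hypothetical stub: source IPs seen during this window
    return set()

snapshots: list[int] = []
for _ in range(WAVES):
    started = time.time()
    launch_scan_wave()
    snapshots.append(len(collect_callbacks(COLLECT_WINDOW)))   # per-wave count, not cumulative
    time.sleep(max(0.0, SCAN_INTERVAL - (time.time() - started)))

snapshots.sort()
print("systems per snapshot (median):", snapshots[len(snapshots) // 2])
```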

Conclusion

So, there you have it. If you made it this far, you now have a better understanding of the answer to the question "why use many computer when few computer will do". While on the surface internet scanning sounds pretty easy, like something any amateur researcher can do, there are a lot of hidden complexities that take time and experience to understand, and without them you will only end up with wildly inaccurate results. I might be able to swing a tennis racket, but I'm no Serena Williams.

Chris Petersen

Do-er of the Difficult, Wizard of Why Not, and Certified IT Curmudgeon

5 months ago

And all of those issues exist at smaller scale when trying to implement internal scanners to keep a CMDB up to date, exacerbated by load balancers and ephemeral components (a la cloud) that might only live a few seconds or minutes by design.

Larisa M.

I build consensus between cybersecurity engineers, lawyers, and executives.

5 months ago

I love that you mentioned the legality of such activity. One of the first tasks when we get a new pentester is to make sure they don't do something stupid (considered criminal activity) or put a contract in jeopardy because they start to scan everybody and their momma...

johnbel mahautiere

Versatile Engineer Seeking Opportunities in Software Development and DevOps

5 months ago

Help me understand the internet situation

Stéphan D.

Consultant Cybersécurité - ISO 27001 | IEC 62243-4 | EBIOS RM | Stormshield CSNA | Okta certified professional

5 months ago

I have heard of data scientists scraping online information; they can't do it fast, so they rely on other solutions. It seems it would be good to fingerprint the probed systems, or the positive replies, when possible. IPv6 single-stack hosts go unseen; I wonder if a substantial number of targets are off the radar.


Just a thought to try to get more accurate results. You could group your scans by ISP to get a snapshot of how many systems are impacted on that ISP, since a client will only churn their IP within the range of IPs that ISP owns. Do this several times throughout the day and take the mean number of results, to say that ISP X likely has Y clients impacted by Z vulnerability.

