Why it’s Hard to Protect Classified Data - Cloud Networking
Amazon Web Services has raised vehement objections to our Department of Defense awarding a $10 billion cloud computing contract to Microsoft instead of to AWS.
Deploying classified DoD applications and data in the Cloud isn’t nearly as simple as deploying unclassified applications. Even sophisticated financial organizations such as Capital One have suffered embarrassing leaks of valuable information. The Capital One leak “only” cost money; military leaks could cost soldiers’ lives and even a war.
This series of articles explores some of the difficulties any cloud vendor faces in handling a large, diverse, global load of classified DoD information. This article discusses networking to show why it’s hard for any cloud vendor to guarantee the data protection which DoD demands.
Networking
The Arpanet, which grew into the Internet, was designed to make sure that rapid communication would be possible even if a war destroyed a great deal of the American telephone system. Each computer in the network was given a 32-bit Internet Protocol (IP) address, written as four numbers from 0 to 255 separated by dots. It soon became obvious that it was easier to remember “google.com” than an IP address like 172.217.3.101, so Arpa added a Domain Name System (DNS) to look up google.com and return the IP address.
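To make the 32-bit idea concrete, here is a minimal Python sketch that packs the dotted form into one number and unpacks it again. The function names are made up for illustration; Python's standard ipaddress module does the same job and is used only as a cross-check.

```python
import ipaddress

def dotted_to_int(dotted: str) -> int:
    """Pack four 0-255 numbers into one 32-bit integer."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted IPv4 address: {dotted!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part  # each number fills 8 of the 32 bits
    return value

def int_to_dotted(value: int) -> str:
    """Unpack a 32-bit integer into the familiar dotted form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

if __name__ == "__main__":
    n = dotted_to_int("172.217.3.101")
    print(n, int_to_dotted(n))
    # The standard library agrees with the hand-rolled version.
    assert n == int(ipaddress.IPv4Address("172.217.3.101"))
```

Each of the four numbers is just one byte of the address, which is why none of them can exceed 255.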
When you browse a web page, your browser displays a tiny message like “Resolving host” at the bottom of the page as it looks up the IP address, followed by “Transferring data from XX.” These messages may go by too fast to see. You can open cmd.exe on a Windows computer and enter the command “ping google.com” to watch the lookup happen. Your PC sends a request to the local DNS server which sends back Google’s IP address.
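The same lookup can be done from a program. This is a small sketch using Python's standard socket module; the helper name resolve is my own, and the address you get back depends on which DNS server answers you.

```python
import socket

def resolve(host: str) -> str:
    """Ask the system resolver (and through it, DNS) for an IPv4 address."""
    # Raises an OSError subclass if the name cannot be resolved.
    return socket.gethostbyname(host)
```

Calling resolve("google.com") on a connected machine should return a dotted address like the one ping reports. If you pass it something that is already an IP address, it hands the address back unchanged without asking DNS.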
Google has data centers all over the world. Their network administrators tell each local DNS server the IP address of the nearest Google data center so that their search load is distributed nicely.
Message Routing
When your browser gets the Google IP address, it sends a request there so that a Google server can return the home page. How does the network know how to find the Google server at that IP address?
If you look a street address up in the yellow pages (remember those?), the address doesn’t tell you where it is. You can’t get there without a map to show you which way to turn at every intersection along the way.
The genius of the Arpanet is that there is no map! The goal was to get data to the destination even if a war disrupted the phone network. This would render a map useless, so there isn’t one.
Instead of knowing where the destination server is located and the addresses of all the relay computers along the way, the first several digits of the IP address give a general idea where the server is, rather like knowing a zip code tells you roughly where a building is. The request is sent to a “nearer” relay point in the hope that that relay point will have a better idea where the IP address is located and pass it along appropriately.
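The zip-code idea can be sketched as a lookup that prefers the most specific matching prefix. The table and relay names below are invented for illustration, and real relays hold far larger tables; this only shows the matching rule.

```python
import ipaddress

# Hypothetical forwarding table: address prefix -> name of the next relay.
FORWARDING_TABLE = {
    "0.0.0.0/0": "relay-default",     # matches every address
    "172.0.0.0/8": "relay-west",      # matches on the first number only
    "172.217.0.0/16": "relay-google", # matches on the first two numbers
}

def next_hop(destination: str, table: dict = FORWARDING_TABLE) -> str:
    """Pick the entry whose prefix matches the most leading bits."""
    addr = ipaddress.ip_address(destination)
    best_len, best_hop = -1, None
    for prefix, hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, hop
    if best_hop is None:
        raise LookupError(f"no route to {destination}")
    return best_hop

if __name__ == "__main__":
    print(next_hop("172.217.3.101"))  # the longest matching prefix wins
```

The relay never needs to know where the destination actually is; it only needs to know which neighbor is the better guess for that range of addresses.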
This makes the network fault tolerant. If a communication link goes down, a relay point notices and sends requests along a different path. If a relay goes down or becomes overloaded, traffic is routed around it.
A relay point needs to keep track of the availability of all links to neighboring relays, have a rough idea where each relay is in relation to the desired IP address, and know how busy each relay is. This helps avoid network hotspots when relays become overloaded.
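Routing around a failure can be illustrated with a toy network. One loud assumption: this sketch searches a complete picture of the network, which real relays do not have; they only know about their neighbors. The relay names and links are invented.

```python
from collections import deque

# Hypothetical relay network: each relay lists its neighbors.
LINKS = {
    "home": ["isp"],
    "isp": ["home", "relay-a", "relay-b"],
    "relay-a": ["isp", "google"],
    "relay-b": ["isp", "relay-c"],
    "relay-c": ["relay-b", "google"],
    "google": ["relay-a", "relay-c"],
}

def find_path(src, dst, down=frozenset()):
    """Breadth-first search for the fewest-hop path that avoids failed relays."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for neighbor in LINKS[path[-1]]:
            if neighbor not in seen and neighbor not in down:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # destination unreachable

if __name__ == "__main__":
    print(find_path("home", "google"))
    print(find_path("home", "google", down={"relay-a"}))
```

With relay-a marked as down, the path simply goes the long way through relay-b and relay-c. Only when the single link to the ISP fails is there no route at all.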
Thinking about all the decisions and processing needed to make this mapless system work makes your head hurt. When I pinged Google.com, Windows sent 4 pings and told me:
Approximate round trip times in milli-seconds: Minimum = 38ms, Maximum = 47ms, Average = 42ms.
It took my PC an average of 42 thousandths of a second to send a message to Google and get a reply. How many hops did my ping make? The traceroute command on my Linux computer (Windows cmd.exe calls it tracert) shows the route. The ping made 10 hops to get to Google!
Intermediate points know the IP addresses of all their neighbors and don’t need DNS, but even so, 10 hops out and 10 hops back in 42 thousandths of a second is amazing. Not only that, since network and computer loads change constantly, each ping may take a different route.
Firing packets off in all different directions and hoping they’ll get there isn’t as random as it sounds. Prof. Stanley Milgram's study "The Small World Problem" suggested that any two people in the United States are connected by approximately six friendship links, on average. Internet packets link a lot faster than people do, so even when a packet needs 10 links to get from me to Google, it moves fast enough most of the time.
Large Messages
Ping packets are less than 100 bytes of data. What happens with a bigger transfer such as a complicated web page or a video? A requesting computer tells the server how big a packet it can receive. The server breaks the message into blocks no bigger than that, numbers each data block, wraps a packet header around the data, and sends each packet on its merry way through the maze.
Each packet can take a different route, of course, so they may arrive out of order. The receiving computer sorts the packets, asks the server to re-send missing packets, assembles the message, and displays it in your browser.
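Numbering, scrambling, and reassembling can be sketched in a few lines. This is a simplified illustration with invented function names, not how any real protocol stack is written; in particular, the receiver here is simply told how many packets to expect.

```python
import random

def packetize(message: bytes, max_payload: int) -> list:
    """Split a message into (number, block) pairs no bigger than max_payload."""
    return [
        (seq, message[start:start + max_payload])
        for seq, start in enumerate(range(0, len(message), max_payload))
    ]

def reassemble(packets, total: int) -> bytes:
    """Put packets back in numbered order; complain if any are missing."""
    by_seq = dict(packets)  # a duplicate re-send just overwrites itself
    missing = [seq for seq in range(total) if seq not in by_seq]
    if missing:
        raise LookupError(f"ask the server to re-send packets {missing}")
    return b"".join(by_seq[seq] for seq in range(total))

if __name__ == "__main__":
    message = b"a complicated web page " * 300
    packets = packetize(message, 1500)
    random.shuffle(packets)  # packets may arrive in any order
    assert reassemble(packets, len(packets)) == message
```

Because every block carries its own number, the order of arrival doesn't matter, and a gap in the numbers tells the receiver exactly what to ask for again.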
Streaming is a bit different in that the receiver doesn’t wait for all the packets to arrive before starting the video, but the principle is the same. Your computer asks for a re-send of missing packets and pauses the video when the next packet hasn’t arrived. The maximum packet payload is currently 1,500 bytes, so it takes about a half-million packets to stream the 700 million bytes in a typical hour-long YouTube video.
traceroute tells me that YouTube is 10 hops from me. At 10 hops per packet, an hour-long video will need about 5 million hops. The slowest link along the path sets the speed at which YouTube can get video to me. My Internet Service Provider will give me more network speed from their office to my home if I pay more money, but that may not give me faster video depending on what’s between me and YouTube.
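The arithmetic behind those round numbers, using only the figures quoted above, works out like this:

```python
import math

VIDEO_BYTES = 700_000_000  # the article's figure for an hour-long video
PAYLOAD_BYTES = 1_500      # maximum payload per packet, per the article
HOPS = 10                  # hops reported by traceroute

packets = math.ceil(VIDEO_BYTES / PAYLOAD_BYTES)
total_hops = packets * HOPS
print(f"{packets:,} packets, {total_hops:,} hops")
# prints "466,667 packets, 4,666,670 hops"
```

That is a little under the half-million packets and 5 million hops quoted in the text, and it ignores re-sent packets and header overhead.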
Recycling IP Addresses
32-bit IP addresses allow for a theoretical maximum of 4,294,967,296 different addresses. The actual number is less than that, however, because many addresses are used for special purposes. Addresses beginning with 192.168 are non-routing addresses which are not used in the public Internet, for example.
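Both claims are easy to check with Python's standard ipaddress module. The helper name describe is my own; the two sample addresses are ones that appear in this article.

```python
import ipaddress

TOTAL_ADDRESSES = 2 ** 32  # every possible 32-bit value

def describe(address: str) -> str:
    """Say whether an address is set aside for private use or is public."""
    return "private" if ipaddress.ip_address(address).is_private else "public"

if __name__ == "__main__":
    print(f"{TOTAL_ADDRESSES:,} possible addresses")
    for text in ("192.168.1.2", "172.217.3.101"):
        print(text, "is", describe(text))
```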
4 billion computers seemed like enough at the time, but we’re running out. Given that many Americans have multiple Internet-capable computers, smart phones, and streaming TVs, we’d have run out long ago if each device had to have its own unique address.
My home has 3 smart phones, a network printer, 6 laptops, and 5 towers. They all share the same public IP address through the magic of routing.
If I issue the ipconfig command in my laptop Windows cmd.exe window, it tells me that my wireless devices are not configured because I use a cable to connect to my router. The Ethernet adapter has an IPv4 address of 192.168.1.2, saying that I’m using the IP protocol version 4. It also lists a “Default Gateway” of 192.168.2.1. My Linux box has IPv4 address 192.168.0.14. The adjacent laptop has the same gateway address but a different IPv4 address 192.168.1.5.
I can browse my Linux box by entering its IP address in the browser bar: https://192.168.0.14/index.html. The router knows that by convention, 192.168 isn’t a public IP address so it finds my Linux box without going out through the gateway. My Linux box is set up as a web server, so I see its “coming soon” web page.
When I browse google.com, the browser sends a DNS message asking for google.com’s IP address and sends a request to 172.217.3.101 because that’s where my nearest DNS server says Google is.
This is where router magic takes over. My simple home router knows that 172.217 isn’t local and sends the request out the gateway. The gateway has a public IP address which my ISP rents to me. The request goes to the ISP relay which sends it on.
Google sends the reply back to my ISP. My ISP recognizes the IP address they rented to me and sends it down one of its cables where my router sees it.
When my router gets a response packet from Google, how does it know which of the many devices on my network should get it? Knowing that it must forward reply packets to the requesting device, the router rewrites the source address and port number in every outgoing request packet header and keeps a table recording which local device each port number belongs to.
Google doesn’t care whether it’s talking to my PC or to my Linux box; it simply sends the reply back to the address and port number the request came from, along with the packet number and data. The port number tells the router which local IP address should receive the reply and the packet number tells my PC how to order the data to assemble the reply packets into one large block.
This routing decision is fundamentally the same as a relay computer deciding which of its neighboring relays should get a packet, but this time, the packet will be consumed by the destination computer instead of being forwarded. From the router’s point of view, the destination could be another relay.
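One common way home routers do this bookkeeping is network address translation with port numbers. The toy class below is a sketch under that assumption; the class name, the starting port, and the public address 203.0.113.7 (taken from a range reserved for documentation) are all invented for illustration.

```python
import itertools

class NatRouter:
    """Toy model of a home router sharing one public address."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self._ports = itertools.count(40000)  # arbitrary starting port
        self._table = {}  # public port -> (local ip, local port)

    def outbound(self, local_ip: str, local_port: int):
        """Rewrite the source of an outgoing packet and remember who sent it."""
        public_port = next(self._ports)
        self._table[public_port] = (local_ip, local_port)
        return self.public_ip, public_port

    def inbound(self, public_port: int):
        """Look up which local device a reply belongs to."""
        return self._table[public_port]

if __name__ == "__main__":
    router = NatRouter("203.0.113.7")
    _, port = router.outbound("192.168.1.2", 51000)
    print(router.inbound(port))  # the reply goes back to the laptop
```

The outside world only ever sees the one public address; the table is what lets many devices hide behind it.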
Assigning IP addresses
You can configure a Windows computer to have a fixed IP address. It would be a lot of work to do that for all the devices in my home. The router would be very unhappy if I gave two devices the same address, and I’d have to reconfigure all the IP addresses whenever I took my laptop or smart phone to a new place. The solution is for the router to support the Dynamic Host Configuration Protocol (DHCP).
When it connects, each device sends a broadcast request looking for any DHCP server. If there is one, the DHCP server offers the device an IP address and tells it the local gateway address. The DHCP server is programmed to make sure that each IP address is valid for the router and that each device has a different address.
Addresses can be lost when the power fails. When the lights go back on, the router and devices engage in a flurry of requests so that every device once again knows the gateway and has its own IP address.
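The core promise of a DHCP server, a valid and distinct address for every device plus the gateway address, can be modeled in a few lines. This is a toy with invented names, not the real protocol: there are no broadcasts, lease timers, or network messages here.

```python
import ipaddress

class ToyDhcpServer:
    """Hands each device a distinct address from one network (not real DHCP)."""

    def __init__(self, network: str, gateway: str):
        self.gateway = gateway
        # Every usable address in the network except the gateway's own.
        self._free = [str(h) for h in ipaddress.ip_network(network).hosts()
                      if str(h) != gateway]
        self._leases = {}  # device id (e.g. hardware address) -> assigned IP

    def request(self, device_id: str):
        """Return (ip, gateway); a returning device gets its old address back."""
        if device_id not in self._leases:
            if not self._free:
                raise RuntimeError("address pool exhausted")
            self._leases[device_id] = self._free.pop(0)
        return self._leases[device_id], self.gateway

if __name__ == "__main__":
    server = ToyDhcpServer("192.168.1.0/24", "192.168.1.1")
    print(server.request("laptop"))
    print(server.request("phone"))
```

If the lease table is lost, as after a power failure, every device simply asks again and the table is rebuilt.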
Big Networks
There are several “non-routing” IP address blocks. By convention, anything beginning with 10, or with 172.16 through 172.31, won’t route on the public Internet. My small home router uses 192.168. Except for a few reserved addresses such as the gateway and broadcast address, the unused 16 bits in 192.168.XXX.YYY can have 65,536 different values.
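The private blocks and their sizes can be listed with the standard ipaddress module. The block list follows RFC 1918, where the 172 block runs from 172.16 through 172.31; the function name is my own.

```python
import ipaddress

# The three private ("non-routing") IPv4 blocks set aside by RFC 1918.
PRIVATE_BLOCKS = [ipaddress.ip_network(n)
                  for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_non_routing(address: str) -> bool:
    """True if the address falls inside one of the private blocks."""
    addr = ipaddress.ip_address(address)
    return any(addr in block for block in PRIVATE_BLOCKS)

if __name__ == "__main__":
    for block in PRIVATE_BLOCKS:
        print(block, "holds", f"{block.num_addresses:,}", "addresses")
    print(is_non_routing("192.168.0.14"), is_non_routing("172.217.3.101"))
```

Note that Google's 172.217 address is outside the private 172 block, which is why a router treats it as public.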
My home router could in theory support more than 65,000 devices, but its processor can’t handle much more than 200 million bits per second. That sounds like a lot, but isn’t much when spread over 65,000 devices.
Businesses with large networks use full 32-bit addresses for devices in their internal private network. They need smarter, faster routers and their own DNS server.
The DHCP protocol lets each device specify its assigned name. Enterprise DHCP servers share device names and assigned IP addresses with the local DNS. Organizations name devices on their networks so maintenance people can find them at whatever IP address they happen to have at any given time.
Enterprise DNS servers cache IP addresses for servers such as google.com. A business pays employees by the second, so getting IP addresses fast when an employee browses has economic benefits.
My home router knows that any IP address not beginning with 192.168 is not on my home network and sends the request to the public Internet via the gateway. That won’t work in a large network because internal IP addresses can match external addresses. In that case, the local DNS has flags indicating whether the IP address is local or public. If it’s local, the router sends packets to it; if it’s public, the router sends packets to the gateway which forwards them out to the public Internet.
Complex Routing
Internet routing gets tangled. Consider pinging https://www.cncf.io/, a web site for “Building Sustainable Ecosystems for Cloud Native Software.”
C:\Users\wat>ping www.cncf.io
Pinging fe3.edge.pantheon.io [23.185.0.3] with 32 bytes of data:
Reply from 23.185.0.3: bytes=32 time=12ms TTL=58
The DNS server maps the domain name www.cncf.io to fe3.edge.pantheon.io. Browsing that domain gets https://pantheon.io/, a “High Performance Hosting & Agile WebOps Platform.”
As a hosting company, Pantheon rents routers and computers which serve many domain names from the same IP address. When I browse www.cncf.io, the request goes to the IP address associated with fe3.edge.pantheon.io but the request is flagged as being directed to www.cncf.io.
The router at fe3.edge.pantheon.io knows from the www.cncf.io domain in the request which computer in the fe3.edge.pantheon.io cloud should get the request. When the request gets to the computer which hosts www.cncf.io, the host uses the original www.cncf.io URL to decide which of many programs running on the same host should handle it.
When it got my browser request for fe3.edge.pantheon.io which didn’t include a customer’s URL, the router sent me to whichever computer serves the Pantheon home page.
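In a plain HTTP request, that flag is the Host header. Here is a minimal sketch of how a front-end might use it to pick a machine; the backend names and the mapping are invented, and nothing here describes how Pantheon actually does it.

```python
# Hypothetical mapping from requested host name to the machine that serves it.
BACKENDS = {
    "www.cncf.io": "host-17",
    "pantheon.io": "host-home",
}
DEFAULT_BACKEND = "host-home"  # used when no customer name matches

def pick_backend(raw_request: str) -> str:
    """Read the Host header of an HTTP request and choose a backend."""
    for line in raw_request.split("\r\n")[1:]:  # skip the request line
        if line == "":
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower().split(":")[0]  # drop any port
            return BACKENDS.get(host, DEFAULT_BACKEND)
    return DEFAULT_BACKEND

if __name__ == "__main__":
    request = "GET / HTTP/1.1\r\nHost: www.cncf.io\r\n\r\n"
    print(pick_backend(request))
```

A request that names no known customer falls through to the default, which matches what happened when I browsed fe3.edge.pantheon.io directly.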
Pantheon sees all network traffic for all its customers. Would a crooked Pantheon employee want to sell such information to a customer’s competitor? Perish the thought!
Network Bullets
- The Internet operates without a map. Each relay computer sends each packet to a neighbor which it believes is most likely to be closer to the destination IP address.
- Relay computers know when their neighbors are overloaded or unavailable and route packets differently in response. That’s how the Internet keeps working when communications links or computers fail.
- 4 billion IP addresses is not nearly enough. Routers recycle Internet IP addresses by creating private networks of computers which all share a single external IP address. Reply packet headers contain the information a router needs to decide which of its internal IP addresses should receive each packet.
- Since all the computers behind a router share the same external IP address, a cluster of microservices can share a common IP address which doesn’t change. A calling computer sends service requests to the same IP address without caring how many service computers are started and stopped as long as the router at the IP address knows.
- The most common practice is to have devices broadcast DHCP messages asking for IP addresses when they connect. Enterprise DHCP servers pass the device name and IP address to the enterprise DNS server so network admins can find devices by name.
- Large private networks use full 32-bit IP addresses. Their DNS servers cache popular addresses and return flags telling whether an IP address is internal or external. Requests to internal addresses are sent to the address. External requests are sent to the gateway which forwards them to the public Internet.
A cloud consists of many computers which are connected by a fast, flexible network. This article has covered the basics of networking and the next will explain the basics of having thousands of “computers” in a cloud.
After that, we’ll be able to explain just why cloud security is so hard to achieve.