Weissman's Way of Understanding the Facebook Outage
Professor Jonathan S. Weissman


Let’s say there’s this comedian. We’ll call him Bob. Bob is going to put on a show in some random location, oh I don’t know, I’ll just say… Shippensburg, Pennsylvania. Bob wants people from all over to come to… Shippensburg, Pennsylvania to see him perform.

Let’s also imagine that the maps of the world weren’t static, and locations like Shippensburg, Pennsylvania constantly needed to advertise their presence to highways. One highway hears the advertisement of the zip code of Shippensburg, Pennsylvania, and sends that information, along with directions to get there, to other highways, which in turn keep passing along the information and directions until it all reaches, oh I don’t know, I’ll just say… Rochester, NY. In this Twilight Zone world, each highway sends every car’s GPS directions to the next highway to take, based on the car’s destination.

Let’s also imagine that in order to get to a specific location in Shippensburg, Pennsylvania, you needed to ask someone local, who, on your behalf would go to some specific people in Shippensburg, Pennsylvania, who know the address of every place in Shippensburg, Pennsylvania.

For example, if you wanted to go to a bank in Shippensburg, Pennsylvania, you’d ask your messenger (who is local to where you are) to actually go to Shippensburg, Pennsylvania, and ask someone there for the address of the bank. If you wanted to go to the supermarket, same thing. You’d ask your local messenger to go to Shippensburg, Pennsylvania and ask someone there for the address of the supermarket. Certainly, if you lived in Rochester, NY, and wanted to see Bob perform at the Shippensburg Comedy Club, you’d ask your local messenger to go to Shippensburg, Pennsylvania and ask someone there for the address of that location.

Now, imagine that someone in Shippensburg, Pennsylvania hit the wrong button, which stopped advertisements about how to get to Shippensburg, Pennsylvania from propagating across the highways of the country. Now, if someone in Rochester, NY needed their local messenger to go to Shippensburg, Pennsylvania to ask someone there how to get to the Shippensburg Comedy Club, that question would never make it there, because no highways in the country know how to get to Shippensburg, Pennsylvania anymore! The highways won’t be able to give directions to Shippensburg, Pennsylvania to any cars headed there. As a result, your messenger wouldn’t be able to get to Shippensburg, Pennsylvania!

That’s what happened on Monday, October 4, 2021, to Facebook. It appears that a Facebook network engineer accidentally stopped BGP (Border Gateway Protocol), the routing protocol that acts as the navigation system of the Internet, from advertising on behalf of some of Facebook’s networks (I’ll explain why this brought everything down shortly). A misconfiguration on a Facebook router stopped the advertisements that announce the existence of those networks.

Think of a protocol as a set of rules that allow two systems to communicate, including the syntax and format of the messages.

Routers allow networks to connect to other networks just like highways connect cities. With this misconfiguration, the Internet backbone routers did not know how to get to certain networks of Facebook, just like in our story, the highways couldn’t get you to the borough of Shippensburg, Pennsylvania.
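Here’s a minimal sketch of that idea in Python. It is a toy model, not real BGP (which is a path-vector protocol with policies, timers, and much more): the router names, the topology, and the prefix 192.0.2.0/24 (an address range reserved for documentation) are all made up for illustration. Each router simply remembers which neighbor to hand traffic to for a given network, and a withdrawal makes every router forget.

```python
from collections import deque

# Hypothetical toy topology: each "router" (highway) lists its neighbors.
NEIGHBORS = {
    "facebook-edge": ["backbone-a"],
    "backbone-a": ["facebook-edge", "backbone-b"],
    "backbone-b": ["backbone-a", "rochester-isp"],
    "rochester-isp": ["backbone-b"],
}


def propagate(origin, prefix, advertise=True, tables=None):
    """Flood an advertisement (or a withdrawal) of `prefix` outward from `origin`.

    On an advertisement, each router records the neighbor it heard the
    news from as its next hop, the way each highway in the story records
    which highway to take next. On a withdrawal, each router forgets it.
    """
    if tables is None:
        tables = {name: {} for name in NEIGHBORS}
    seen = {origin}
    queue = deque([origin])
    while queue:
        router = queue.popleft()
        for neighbor in NEIGHBORS[router]:
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if advertise:
                tables[neighbor][prefix] = router  # next hop toward the origin
            else:
                tables[neighbor].pop(prefix, None)  # forget the route
            queue.append(neighbor)
    return tables


PREFIX = "192.0.2.0/24"  # documentation range, not a real Facebook network

tables = propagate("facebook-edge", PREFIX)
print(tables["rochester-isp"])  # {'192.0.2.0/24': 'backbone-b'}

# Someone hits the wrong button: the advertisement is withdrawn.
propagate("facebook-edge", PREFIX, advertise=False, tables=tables)
print(tables["rochester-isp"])  # {}
```

After the withdrawal, the Rochester router has no entry at all for the network, so anything headed there has nowhere to go.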

Now, let me tell you what was on these networks that couldn’t be reached anymore.

DNS (Domain Name System) is the network protocol that takes an FQDN (fully qualified domain name), like www.facebook.com, and turns it into its corresponding IP (Internet Protocol) address (a unique numerical address that identifies a specific system on a specific network), just like Shippensburg Comedy Club would be turned into its street address. Because of the BGP misconfiguration, Facebook’s DNS servers could no longer be reached, since they sat on the very networks that had stopped being advertised. The Facebook DNS servers are like the people in Shippensburg, Pennsylvania who were able to give out the locations of all places in Shippensburg, Pennsylvania. Your machine asks a DNS server local to you to act as its messenger (like in our story), and do some heavy lifting to eventually get to a Facebook DNS server. During the outage, local DNS servers couldn’t get to the Facebook DNS servers, just like in our story, where your messenger couldn’t navigate the highways to get to Shippensburg, Pennsylvania, to ask someone there for the location of the Shippensburg Comedy Club.

As a result, anytime someone wanted to access a resource of Facebook, Instagram, or WhatsApp, the DNS queries didn’t get DNS responses, and nothing loaded.
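To make the messenger idea concrete, here’s a small Python sketch under heavy assumptions. The server label, the network 192.0.2.0/24, and the address 203.0.113.10 are placeholders from documentation ranges, not Facebook’s real values, and a real resolver walks the DNS hierarchy, caches answers, and retries before giving up. The only point is that a working DNS server is useless if there is no route to the network it lives on.

```python
# Hypothetical data: the network each zone's DNS server lives on,
# and the answers that server would give if anyone could reach it.
AUTHORITATIVE_SERVERS = {
    "facebook.com": ("facebook-dns-server", "192.0.2.0/24"),
}
RECORDS = {"www.facebook.com": "203.0.113.10"}  # placeholder address


def resolve(fqdn, reachable_networks):
    """Act as the local messenger: go to the zone's DNS server and ask.

    Returns the IP address as a string, or None if the question never
    reaches the server because its network has no route.
    """
    zone = ".".join(fqdn.split(".")[-2:])  # "www.facebook.com" -> "facebook.com"
    _server, network = AUTHORITATIVE_SERVERS[zone]
    if network not in reachable_networks:
        # No highway leads there: the query is never delivered,
        # so no response ever comes back.
        return None
    return RECORDS.get(fqdn)


# Normal day: the network is advertised, so the messenger gets an answer.
print(resolve("www.facebook.com", {"192.0.2.0/24"}))  # 203.0.113.10

# During the outage: the network is gone from the routing tables.
print(resolve("www.facebook.com", set()))  # None
```

Notice that the records themselves never changed. The answer still exists on the server; the messenger just can’t get there to ask for it.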

Other Facebook networks and resources actually were accessible, but that didn’t matter, because without DNS, everything Facebook and Facebook-related was inaccessible.

Although there have been successful attacks in the past against BGP, this appears to be a misconfiguration on Facebook’s end. There have been numerous incidents of BGP misconfigurations in the past, too. BGP is, arguably, one of the most important protocols of the Internet, but it is also one of the most vulnerable.

https://www.cnet.com/news/how-pakistan-knocked-youtube-offline-and-how-to-make-sure-it-never-happens-again/

https://www.zdnet.com/article/bgp-spoofing-why-nothing-on-the-internet-is-actually-secure/

https://www.bgpmon.net/turkey-hijacking-ip-addresses-for-popular-global-dns-providers/

https://www.techrepublic.com/article/outages-on-facebook-linkedin-paypal-and-other-sites-might-point-to-bgp-failures/

https://securityintelligence.com/bgp-internet-routing-what-are-the-threats/

https://www.bleepingcomputer.com/news/security/us-payment-processing-services-targeted-by-bgp-hijacking-attacks/

https://nakedsecurity.sophos.com/2018/10/30/china-hijacking-internet-traffic-using-bgp-claim-researchers/

https://www.zdnet.com/article/oracle-confirms-china-telecom-internet-traffic-misdirections/

https://www.zdnet.com/article/persian-stalker-grayware-targets-telegram-instagram-users/

https://arstechnica.com/information-technology/2018/11/strange-snafu-misroutes-domestic-us-internet-traffic-through-china-telecom/

https://arstechnica.com/information-technology/2018/12/how-3ves-bgp-hijackers-eluded-the-internet-and-made-29m/

https://securityintelligence.com/why-you-need-a-bgp-hijack-response-plan/

https://www.bleepingcomputer.com/news/security/major-bgp-leak-disrupts-thousands-of-networks-globally/

Security solutions like RPKI (Resource Public Key Infrastructure) and S-BGP (Secure BGP) just haven’t gained any traction.

https://www.forwardingplane.net/2016/05/bgp-rpki-why-arent-we-using-it/

https://queue.acm.org/detail.cfm?id=2668966

In fact, it was reported that some Facebook employees couldn’t even enter buildings because badge access was also down due to this whole mess, which is one of the reasons the problem lasted for as long as it did (11:50 am EDT – 5:20 pm EDT). They couldn’t get in there to fix the router configurations! It was also reported that Facebook’s internal workflow platform, Workplace, couldn’t be accessed, which meant Facebook employees couldn’t even do their jobs.

There were collateral damage issues as well. For example, DNS servers of other organizations were seeing double the regular load of traffic, as everyone around the world kept repeatedly trying to load Facebook, Instagram, and WhatsApp.

People who use their Facebook credentials to log in to certain sites couldn’t do that during this time.

On a positive note, I’m sure work productivity hit an all-time high at some places of business, and more students got their homework done.

This is coming on the heels of Sunday night’s 60 Minutes. On the program, a whistleblower, former Facebook data scientist Frances Haugen, claimed that Facebook’s own research showed that it uses hate, misinformation, divisive content, and political unrest to keep users engaged, spending more time on the site and in turn clicking on more ads and sending more money Facebook’s way. She testified before Congress on Tuesday, October 5, the morning after the outage.

But wait, there’s more. During the outage, which lasted from 11:50 am to 5:20 pm EDT, information on 1.5 billion users was put up for sale on a hacker forum, shortly before 4:30 pm.

In less than 24 hours, a whistleblower exposed Facebook, it went down, and information on 1.5 billion users was put up for sale.

Some estimates of how much money Facebook lost due to the outage are at $60 million, although I saw some as high as $100 million. Facebook makes over $160,000 every minute. Facebook shares dropped 4.9% on the day of the outage, which cost Mark Zuckerberg $6 billion and dropped him from fifth-richest person (according to Forbes) to sixth, behind Oracle’s Larry Ellison.
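As a rough sanity check on those estimates, here’s a back-of-the-envelope calculation in Python. It uses only the numbers above (the 11:50 am to 5:20 pm window and the $160,000 per minute figure) and assumes, simplistically, that every minute of downtime loses a full minute of revenue.

```python
from datetime import datetime

# Outage window on October 4, 2021 (EDT), as given in this article.
start = datetime(2021, 10, 4, 11, 50)
end = datetime(2021, 10, 4, 17, 20)

minutes_down = (end - start).total_seconds() / 60
revenue_per_minute = 160_000  # "over $160,000 every minute", so a floor

estimated_loss = minutes_down * revenue_per_minute
print(f"{minutes_down:.0f} minutes, about ${estimated_loss / 1e6:.1f} million")
# 330 minutes, about $52.8 million
```

Since $160,000 per minute is a floor, roughly $53 million lines up reasonably well with the $60 million estimates.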

This is quite the welcome to National Cybersecurity Awareness Month (which should be every month, if you ask me)!


Follow me on LinkedIn: https://www.dhirubhai.net/in/jonathan-s-weissman/

Follow me on Twitter: https://twitter.com/CSCPROF

Follow me on Instagram: https://instagram.com/cscprof/

Subscribe to my YouTube channel: https://youtube.com/user/Weissman52

Check out my Amazon page: https://amazon.com/author/jonathansweissman

Aiden Rall

Student at Finger Lakes Community College

1 week ago

I find this to be interesting because the outage did not seem too bad from the consumer's point of view, because you might think it's going under for repairs. Facebook on the other hand was losing millions of dollars, and a hacker was selling user information. Facebook was also using an algorithm that spreads hate, misinformation, etc. to keep the consumer engaged. If an event like this were to happen again, there should be countermeasures to prevent it.

Anna Snyder

Student at Finger Lakes Community College

1 week ago

Interesting analogy comparing Shippensburg, Pennsylvania to Facebook’s network. It's alarming how even a small mistake can cause such widespread problems. Society really relies on these systems heavily in daily life. The fact that an error like this can affect physical access to buildings and internal workflows is alarming, especially considering the reliance on RFID. This incident highlights the critical need for strong cybersecurity measures and protocols.

Preston Van Dusen

Student at Finger Lakes Community College

2 weeks ago

I know from experience that no one notices the sound guy until he makes a silly mistake and positively deafens everyone in the room with mic feedback. By the sounds, it's a similar story with networking, one silly mistake by an individual (whose job typically wouldn't be noticed by anyone) can cause countless headaches for countless people. Anyway, it's always good to be reminded of the dangers of having such a connected world, helps keep us aware.

Sean Waweru

Student at Northside High School

5 months ago

Your analogy of how Facebook lost their connection to the BGP

Corey Collett

Student at Finger Lakes Community College

1 year ago

Great use of analogies in the article! It is amazing that Frances Haugen came out about Facebook's tactics in order to increase ad revenue. All because of a router mistake, advertising went down for just a few hours but cost Facebook millions. This sheds some light on how much social media platforms like Facebook rely on our addiction to them. Very interesting and informative read.
