S1E2 Disaster.Stream Newsletter

Audio Link

Video Link

Disaster.Stream Podcast Website with Audio, Video and Transcripts

SlideShare Slides

Full Text with timestamps

S1E2 9/11 Disaster Recovery Responder Story

[00:00:00]

Hi, I'm Bill Alderson coming to you from Austin, Texas, right here in the heart of the country. I just have one message for you. It's more [00:00:30] fun to be ready. When disaster strikes, it's really good to know that you are ready to respond. If disaster strikes and you're not quite ready, it's not the time to be judgmental.

Just move on and recover well and recover fast. Lessons learned: make sure that you record all the lessons learned while you're going through so you can make a profit on the disaster that's occurring. All right. [00:01:00] Today we're gonna talk about the 9/11 disaster. Yes, I was at home on a Sunday afternoon just after 9/11, and I got a call from a Pentagon General on my cell phone.

Amazing. Here we are, glued to our television sets, watching what's happening around the world after the 9/11 events and the disaster that occurred to our country. And I get a phone call. I'm ready. It's awesome. [00:01:30] The Pentagon General asked if our team could come in and help them recover communications at the Pentagon, and it's good to be ready.

We responded, jumped on planes, and moved back to the Pentagon. We got escorted in, armed guards surrounding the Pentagon. The building was still smoldering at the time, water damage everywhere, and they had moved hundreds of servers and a lot of stuff just was not working. Key links [00:02:00] and key capabilities were missing.

The Pentagon was having some trouble, and we did get a chance to go in; it was our honor to go in and respond at this time. But I will make one thing known: we were ready. We are forensic analysts, network forensic people, who know the best practices. And we have learned from all of the things that we've had to troubleshoot over the years.

Here we are in the Pentagon, [00:02:30] recovering communications. Here we go. Let's take a look at this. First of all, I've got a number of really cool exhibits that you might not have seen before about where the plane hit and how it hit. I'm gonna try and rush through some of these things because the key points are something I want to focus on.

Before that, I wanna make sure that you know that we are here not just to tell our story; our primary long-term goal is to [00:03:00] tell your story. We wanna find out what you, as a planner or a responder to emergencies or as part of a team, learned from the security incidents or other disaster recovery incidents you've been through.

So we pull out those case studies of what you've learned, and then apply them and try and put them into best practices so that you can implement 'em. Because guess what? It's more fun to be ready. We're gonna talk a little bit about what happened [00:03:30] at the Pentagon. But before I go into talking about what happened at the Pentagon, I want you to understand that in this broadcast, I introduce you to some of my resources, my friends, my peeps in the industry that help us know more, understand more, and are resources to us when we're stuck or disaster strikes.

We have a bunch of resources so that we can reach out and get some advice from very capable people and [00:04:00] organizations around the world. Now, the first person that I want to talk about just keynoted our conference at the Austin Cyber Show, and he talked about how hope is not a plan. Yeah. When the balloon goes up, it's too late.

You have to act quickly and you have to have been prepared. Now, Colonel David Wills was the chief network engineer at US CENTCOM. If you understand the military environment, in his roughly [00:04:30] 35 to 40 minute address during our keynote, he talks about how the military is divided up into different parts of the world and different types of combatant commands.

So you do understand US CENTCOM: they take care of the Central Asia area, and they took care of both the Iraq and the Afghanistan wars. So this guy was the one who was the chief engineer over all the networks that went in to support [00:05:00] both of those war efforts. Now, after he did that, he went to the JOINT CHIEFS of Staff at the Pentagon directly, and he took care of that network for about 4,000-plus people who work for the JOINT CHIEFS of Staff.

Now you have Army, Air Force, Navy, Marines; those guys have specific jobs. And then you have the JOINT commands. The JOINT commands offer and use resources from all the different parts of the military. That's why you have an [00:05:30] Army officer like Colonel David Wills taking care of CENTCOM, the JOINT CHIEFS, and US Strategic Command.

Those are all JOINT commands. Those JOINT commands are over the entire military, or really coordinate for the entire military. And when you are in a JOINT command, there's Army, Air Force, Navy, Marines, every part of the military, and I wouldn't be doing justice if I didn't mention the Coast Guard, who take care of our [00:06:00] homeland.

I'm gonna introduce you to David Wills in a little bit, and he's going to address you for a minute or so with a video that we produced during the keynote. And then I'm going to give you, in the show notes, a link where you can go and listen to Dave for 30 minutes.

He talks about a number of different things: the wars in Iraq, the JOINT CHIEFS of Staff, and US STRATCOM. Now, if you don't know what US STRATCOM is, it's very important: they take care of all US [00:06:30] government nuclear systems development and deployment. So those are the guys who are making sure that we are ready in some very important ways.

I hope you have a cup of coffee or a beverage to enjoy while you're listening to this. Maybe you're driving on your way to work or on your way home from work, or you've chosen to show this to your staff, to your team, as a team-building exercise. [00:07:00] Whatever the thing is, we're going to bring some very cogent information to you.

And we really hope that you follow us and work with us and participate by bringing us your stories as we go through. Now, the second person I'm gonna introduce to you may not need much of an introduction. His name is Gary Hayslip, and he and his co-authors wrote this book called The Executive Primer, the Executive's Guide to Security.

[00:07:30] Here in his executive guide, and they wrote multiple books, but in this particular book, he writes about how to work with your Chief Information Security Officer: how to interact as a board, as a leader, as a company officer, and also as a subordinate; how to work with, get along with, and get the most out of your relationship with your Chief Information Security Officer.

So for those two people I'm introducing you to today, I will provide a link [00:08:00] to a much longer version of their story and their information subsequently, but I'll just pop in a little tidbit to help you understand who they are. Now, when I went to the Pentagon with five of my team members, we stopped everything, obviously.

And we went to the Pentagon. Now, a year later, at the anniversary of the Pentagon disaster and the 9/11 disaster, all the news networks [00:08:30] did these pieces on who responded. So we were chosen in the ABC News Sacramento market to be interviewed. I'm a pilot, and so these guys wanted to see what I did in live action.

So we flew from Sacramento to some of my customer environments. Yes, I couldn't believe it, but the whole ABC News team jumped in my Bonanza and we flew down to some of our customers and did some recording. That was a [00:09:00] lot of fun. Anyway, that video I'm gonna play for you.

It's pretty short. It's just a couple of minutes and they did a really good job of telling the story. So I hope that you enjoy it.

For one high-tech company here in the valley, the events of 9/11 brought the greatest change ever in its history, a call to service that led them right into the ruins of the Pentagon. The job that they did there helped speed the recovery at the nerve center of the US military, and got the war on terrorism up and running.

Dave Marquez [00:09:30] reports.

Goosebumps, my hair standing on end, wondering what my country's about to ask me to do.

For Bill Alderson, challenges usually come without warning, but the Sunday afternoon call from a Pentagon General still came as a shock.

We need the best company in the world at doing critical problem resolution. And he says, everyone's told us that you're the company.

When Flight 77 hit the Pentagon, much of the damage came at the heart of the US [00:10:00] Army's computer network. And the toll on human lives was far worse.

One of the most tragic things that happened: the airplane apparently flew through the window of the gentleman who was in charge of the Army's part of this network. So they lost many critical personnel.

Clear!

The next morning, Alderson and five top engineers were on their way to Washington. They will never forget: two-fifths of the Pentagon was gone. Computers, servers, an [00:10:30] entire network had been shattered, its remains reassembled in another part of the building.

But after 11 days it was barely working. The Pentagon could hardly talk to itself.

Those are the sort of moments that you prepare for all of your life, to be ready.

Alderson and his engineers went to work, searching for bottlenecks and broken connections in a maze of systems whose online documentation was mostly missing.

Internet firewalls, routers, VPNs, VLANs, switches.

The company is to computer networks what a forensics expert is to a murder [00:11:00] case: trying to decipher clues that will solve a mystery others have given up on.

I basically try to get a three-dimensional view of the technology. I get into these systems and try and figure out how they're working.

Like others at the Pentagon, he and his engineers were working under extreme pressure.

They had to get up every day and decide to move themselves into harm's way to go back to that building, which could still be a target.

His team began quickly finding the bottlenecks.

We did an [00:11:30] optimization here, increased it, and then we found another problem and increased it.

One important data link soon improved by six times. And within days, the system was back up and running near capacity. To Alderson and others at the Pentagon, getting things running normally was the best way to answer back.

We should be moving on with life as usual or even more so in the face of danger. That's what Americans are about.

He and his company are ready for the next call. Until then, Alderson believes answering the [00:12:00] threat of terror means living as we have always lived.

Our retaliation is going out and doing what we always do, and that's the best retaliation. And that's how we're going to overcome.

In Folsom, Dave Marquez, News 10.

And they do it very well. And by the way, that next call did come. Bill Alderson and his team recently returned from another troubleshooting trip to the Pentagon. And they're ready to go back again when they're needed.

Great work from Bill Alderson and his team.[00:12:30]

Okay. Real quick, because it's a new podcast, I want to introduce myself a little bit. I've been doing publications. I write reports like the SolarWinds Report. You've probably heard about the SolarWinds breach. In this report I have color diagrams of how the breach occurred at each step, the 11 evading steps as I call them, how we as victims got caught in this, and [00:13:00] how we can gain lessons learned from that type of event so that we don't have that occurrence again.

I've also written for publications and I've done trade shows; actually, I did the Forensics Day events at NetWorld Interop for a number of years. So we got involved and we know a lot of folks, and we started training thousands of people in computer network diagnostics and computer network forensics.[00:13:30]

We ended up having the default leadership in that, and we created a certification program called Certified NetAnalyst, where we certified over 3,500 of the top security and forensic people in the world. Deep packet inspection: absolutely understanding the technology from the client to the server, the application, all points between and all the security components between, and how the protocols and systems [00:14:00] work to deliver that information.

I wrote a bunch of stuff called On the Wire because that's where my focus has been. So I'm very involved with the Security Institute and the ISSA organization. Okay, that's a little bit about me. Let's move on to some understanding of data crisis. That's been my focus. It doesn't matter what kind of disaster you have; typically it involves some data [00:14:30] or some sort of problem getting access to data, like the 9/11 problem.

My last episode, which I started with, was on the US stock market denial of service and how we addressed that particular problem very successfully; we brought up all the US stock markets after they were almost completely down because of a distributed denial of service. This is really where the rubber meets the road. You'll learn a lot of stuff at an [00:15:00] executive level, board level, and also as a technologist.

It's a lot of fun. In the future, we'll be doing additional stories that we have responded to, and one of the things I like to tease people with is, how important is this? Let's take a look at Facebook. October 4th, 2021, Facebook's network, Mark Zuckerberg's network, went down for about four to six hours.

During that time, they lost about 5% of [00:15:30] their stock value, which was about 25 to 50 billion dollars. Now, if you want to talk about ROI, I have the exact best practice from lessons learned long ago at AOL, America Online, another company that you might not remember, that basically brought us a large-scale commercial internet that the average person could get access to.

So I learned some things there in troubleshooting that environment, [00:16:00] such that if Facebook would've followed the same best practices, they wouldn't have had that downtime. And it's very powerful. So we're not talking about yesteryear, we're talking about right now, and the potential for saving billions, yes, that's billions with a B, of dollars, and lowering the time it takes to recover from a communications disaster.

So that's what I've been doing all my career. I feel a little bit like Forrest Gump, who just ends up [00:16:30] in places I never thought I would be. But here I am. And I am sharing with you the lessons learned, helping you prepare for the potential of disaster by gathering all of the lessons learned and helping you impute those into your organization.

And gather those lessons learned. I'm a good friend in time of need and I really love that relationship and I always tell everybody it's more fun [00:17:00] to be ready.

Now, I do talk a little bit about the disaster recovery timeline, and the fact that you can make a disaster an opportunity for growth. If it befalls you, whether it's a security incident or something else, the type of attitude to have is: how can I make this an opportunity for growth? It's not time to look back, it's not time to flog all of your people.

It's a time to [00:17:30] learn and to gain an opportunity for growth. You start out with this timeline. I'll talk about that in future sessions, but just generally you can see that you have known risks. Those known risks can have a prodromal build, and that can then end up with an acute and chronic crisis or disaster, which you then need to triage; you need to minimize and operate.

You need to diagnose the problems. You need to mitigate the problems, and then you [00:18:00] need to recover, and you need to recover rapidly and recover well. So my tip for you is to make sure you capture the lessons learned. Where did I learn that? Over on the right, you see the disaster recovery teams, the critical problem resolution team. I call that a tiger team or a CPR team, where you go in and you gather the best people from all the various organizations, all the various disciplines, physical [00:18:30] security, data security, all the different aspects of your business. And then you form a team from all the best people.

And that team addresses the problem. And if you're prepared and you have that team ready to go, it's much better. A lot of times you don't know what kind of a disaster you're gonna have, obviously, but you need to have a few people lined up so that if disaster strikes, you can take care of it.

Also, you can go back and look at [00:19:00] lessons learned and make certain that you have the systems and the communications training for your team. Training for this sort of thing is like having a preview of what's gonna happen by running some scenarios. And those scenarios are key to helping you learn what to do, learn where you're not prepared, and then prepare better.

Now, I've been known for what we call peeling the [00:19:30] onion. Every time you peel back a problem, it exposes yet another problem, and then you have to troubleshoot that problem. And then there's another problem. Finding root cause is about basically assuming that there are multiple problems in every situation and that you're not gonna have one magic bullet.

And I've learned that through following the wrong things and saying, oh, eureka, and then finding out it wasn't. I'm very careful about coming to a conclusion too quickly. But [00:20:00] we have to be in a mindset of iterate and analyze, then diagnose, fix the problems, and move on to the next problem. So you need to have a system to make certain that you have a philosophy and an understanding that disaster recovery is an iterative task.

There's a lot of different things happening. So you wanna make sure you record those things, identify the lessons learned, and build them into best practices. And guess what: the ultimate in your [00:20:30] credibility as a professional disaster recovery data professional is crisis avoidance. If you don't have a crisis because you've prepared and your best practices prevented it, that's the very ultimate in credibility: not to have a problem.

The fingerprint of mission critical: every company, every organization is different. I don't care what you say. It used to be we called in IBM and they [00:21:00] took care of everything computing-wise. One vendor, one phone call, one belly button to talk to. That changed with the advent of the PC, computer networks, distributed computing, the promise of distributed computing.

And here we are, but how did we get there? That's your fingerprint, the DNA of your enterprise. You cannot take what company A, B, or C did and just simply apply their formula. It just doesn't work. And if [00:21:30] you think that's what you need to do, that's probably why you go through it: you got a new CIO, and then something happens, and then you get another CIO and then another CISO, and you keep flipping.

The problem really is, your mission critical enterprise is unique to your organization. You need to study yourself, like Sun Tzu and The Art of War: know your enemy, absolutely, but know yourself better. You have to document your systems so that [00:22:00] you can train all of your people to be ready for a disaster when it happens.

That's my preamble to get to this slide, which says, look: it's much better to learn lessons that other people experienced and then apply them to your situation, so that you don't have to experience them yourself.

And that's what this slide is about. [00:22:30] This slide talks about best practice amplification. Your organization is gonna take those high fidelity, low noise inputs and then amplify them through your leadership and executive functions to impute and apply those best practices so that you get tangible results. Now, you may be spending money like a drunken sailor on products and that sort of thing, and continually overrunning your budgets; not a good thing.

[00:23:00] Yes, you do need significant budget to run these kinds of programs, but the best things to do are the essentials, the fundamentals, and using lessons learned. And those best practices are the best fundamentals, and they're free. Yeah, what a concept. Good system management and fundamentals are key to being prepared, so make sure you find those key lessons.

Amplify them [00:23:30] into your organization and receive the tangible results. Now, if you're a large organization, you might need some help: McKinsey, Boston Consulting Group, Bain, Deloitte, Booz, you name it, Accenture, GDIT. Somebody may need to help you impute those best practices, but taking those free best practices and making sure that they are well integrated and imputed into your people and your processes is going to help you [00:24:00] recover much faster, and you're gonna save a buck, because a lot of times the fundamentals are what weren't done.

Yeah, you got all this esoteric software, esoteric systems, artificial intelligence out the wazoo, but what happens? You still have this problem of making certain the fundamentals are taken care of, and that's one of the key things that I'm here to help you learn, understand, and then build out into best practices.

[00:24:30] Okay, disaster.stream, that's the site that I use to talk about these particular Disaster Recovery Responder Stories. You can go to disaster.stream to see additional information, those videos I told you about, and my friends and associates in the industry that you can learn from.

So here I've got a collection, and I've got blowups of these in subsequent slides that I'm gonna go over, but I just want to tell you [00:25:00] what's coming. First of all, I'm gonna go over the organization layout inside the Pentagon that got hit by the aircraft as it came in. Down here you'll see where the aircraft came in and hit the building right there.

Some of the other things, like the heliport and that sort of thing, are just to help you understand the big picture of what happened. These are some pictures of actual video that was captured by cameras at the Pentagon. And [00:25:30] so you can see here the aircraft coming in, it's zoomed here, and then boom, you see when it hit.

So for all those folks who might be non-believers that the event actually occurred: yeah, it occurred, and here's a little bit of proof for it. Here's a bigger picture of the approaching aircraft as it came in and hit. To understand a little bit about the background, you may have heard that the [00:26:00] Pentagon had some renovations, and it had just finished renovating this part of the Pentagon.

They spent a lot of money on construction, on new reinforced concrete and other such things. So the fact that the aircraft hit in an area that had just been renovated was actually serendipitous. In addition to that, yes, there were a lot of lives lost and people killed in this [00:26:30] particular part of the 9/11 disaster.

However, it could have been much worse because they had just finished the renovations and people were just starting to move back into these new office areas, so there weren't as many people there that day because they were just starting to move back in after the renovation. Okay, so I hope that helps you understand a little bit more about that.

Now, here is the track of the aircraft in through the [00:27:00] organization, so you can see that it hit square into the Army's part of the Pentagon in this particular area. It knocked out a number of key people and systems. So you have to keep in mind that sometimes when you're doing disaster recovery, you're not gonna have your entire team.

So your team needs to be trained to lose a few people here and there, and then figure out how you're gonna backfill those positions if a particular [00:27:30] disaster occurs. Also, like I was mentioning, if it had hit somewhere else, it may have taken out several key single-point-of-failure communication points, the ingress and egress locations of data and telecommunications and that sort of thing.

Those were basically affected, but not nearly as much as they could have been. The Pentagon has multiple points of [00:28:00] entry, multiple points of ingress and egress, but in communications there were some single points of failure: had the aircraft hit in different areas, we looked at this and said, wow, we would've been down for many months recovering communications if it would've hit here, or here.

So after the event, they took our report and other reports, and EDS and HP took on the [00:28:30] renovation to basically put in additional redundant systems. And one of the key things that they did at the Pentagon after the recovery was this: if you were in any of these areas here and you hit file save on a document, or you got a phone call or any of those sorts of things that were data oriented, that information was stored in the Pentagon.

And if it got hit, boom, you're a single point of failure, and I'm gonna talk to you about a single point of failure that we experienced that really impacted [00:29:00] our ability to recover in just a moment. But I want to call out the fact that we basically went back in and rebuilt, and spent, I think, 700 million dollars plus on creating a second 5ESS AT&T switch.

Even though we had voice over IP coming in, and now that's predominant, they put in a second switch. Verizon put in multiple places of ingress and egress for all of their [00:29:30] data, and that was a very costly exercise. And they put those single points of failure in different places around the Pentagon so that in the event something like this happened again, they had their data. As I was getting to: prior to 9/11, if you hit file save, it would save in the Pentagon. After 9/11 and the renovations that occurred in the years [00:30:00] beyond, if you hit file save on a document or sent an email or something of that nature, it saved in the Pentagon, but it also saved a hundred-plus miles away at an alternative site that had the recovery capabilities, so that people from that part of the Pentagon could go a hundred miles away and reassemble, and all of their data was there and their operations could continue even though the event had happened.

File save saves at the Pentagon, but [00:30:30] then it automatically replicates to over a hundred miles away, where a recovery site could be put up very rapidly to bring things together. So that one thing made the Pentagon much more survivable.
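To make that file-save replication idea concrete, here is a minimal sketch of the pattern in Python. The directory names and the simple file copy are hypothetical stand-ins; a real deployment would use storage- or database-level replication to a remote data center rather than a local copy.

```python
# Minimal sketch: save locally, then mirror to an offsite replica.
# Directory names are hypothetical placeholders for the primary and remote sites.
import shutil
from pathlib import Path

PRIMARY_SITE = Path("primary_site")      # hypothetical local (primary) store
REPLICA_SITE = Path("offsite_replica")   # hypothetical recovery site 100+ miles away

def save_document(relative_name: str, contents: bytes) -> None:
    """Write the document locally, then mirror it to the offsite replica."""
    local_path = PRIMARY_SITE / relative_name
    local_path.parent.mkdir(parents=True, exist_ok=True)
    local_path.write_bytes(contents)            # "file save" lands at the primary site

    remote_path = REPLICA_SITE / relative_name
    remote_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(local_path, remote_path)       # replicate so a recovery site can take over

if __name__ == "__main__":
    save_document("memo.txt", b"draft status report")
```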

Of course, it was a disaster of mammoth proportions. We'd never seen anything like this, never even thought of it. But that just talks about the evil in the mind of men, [00:31:00] of people who want to destroy. And it's a very sad situation. Anyway, it's been 20-plus years now, and we've recovered from this particular thing, prosecuted a couple of wars, spent trillions of dollars trying to basically stop it from happening again.

We'll see if we're successful. Hopefully that works. Here's a nice pic of all the brave responders going up to the roof of the Pentagon and fighting that [00:31:30] fire. And of course, during all of these times, nobody knew if perhaps there was gonna be another event, maybe a second shoe was gonna drop.

We didn't know. They had all those airplanes and those 19 different attackers; maybe they were gonna have a second salvo. And that's why we stopped all aircraft movement and that sort of thing for a period of several days, so that we could basically improve our security around [00:32:00] the nation to make sure that there wasn't something else that they could exploit.

All right? Now, as you might think, in our computer networks we have systems that send us alarms, all these automatic systems like UPSes. My battery's out, boom, send an alarm; a server room that's too hot, boom, send an alarm; all those sorts of things. When the [00:32:30] event happened, we started getting thousands of these notifications and alarms.

They had about 83,000 alarms a day, and sadly they didn't have enough people at the time. And remember they had just lost some folks and we weren't really sure what was going on. And here we have evidence of literally thousands of events [00:33:00] alarming to the few people who were left to recover the situation.

And that was one of the best practices: we basically helped them put the alarms into different buckets of sensitivity, of criticality, and then respond more rapidly to the critical alarms first. And of course, this is an ongoing battle with any kind of servers and systems, especially as we are now mainly in the cloud.[00:33:30]

We need alarms to come in and tell us what's happening so that we can then respond well, and a lot of that work is being done with a little bit of machine learning and artificial intelligence. But this is where we really had to go to work rapidly to prioritize what we go take care of first, second, and third.
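Here is a minimal sketch, in Python, of that triage idea: bucket the incoming alarms by criticality and work the critical ones first. The alarm fields and severity labels are hypothetical; a real monitoring system has its own schema.

```python
# Minimal sketch: group alarms into severity buckets and handle critical ones first.
from collections import defaultdict

SEVERITY_ORDER = ["critical", "major", "minor", "informational"]

def triage(alarms):
    """Return the alarms in handling order: critical first, informational last."""
    buckets = defaultdict(list)
    for alarm in alarms:
        buckets[alarm.get("severity", "informational")].append(alarm)
    return [a for sev in SEVERITY_ORDER for a in buckets[sev]]

if __name__ == "__main__":
    incoming = [
        {"source": "UPS-12", "severity": "major", "msg": "on battery"},
        {"source": "core-router", "severity": "critical", "msg": "link down"},
        {"source": "server-room-7", "severity": "minor", "msg": "temperature high"},
    ]
    for alarm in triage(incoming):
        print(alarm["severity"], alarm["source"], alarm["msg"])
```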

So those were good lessons learned. Now, the second thing that happened, which I teased you about: the [00:34:00] information that was destroyed by the aircraft was the network and system documentation. It was gone. Why? Because the plane had hit some servers in the Army's part of this network, and those servers, containing all of the network diagrams, were destroyed.

So I said, don't you have printouts of these? And sadly, no, there were no printouts. So one of my key points is that for disaster recovery, make certain that you have accurate [00:34:30] documents, accurate diagrams, and that those are stored at an offsite location. And key to this is being able to print those things out.

It doesn't matter whether they're super large, a large network or application diagram; you need to be able to visualize and see where all your dependencies are going, and then you can troubleshoot along those dependencies more effectively when you have good system documentation. All right.

Now I want to [00:35:00] stop the flow here for a minute and rethink: what does a good manager do? What are we supposed to be doing as technology managers? And this is the CISO Executive Primer that Gary Hayslip, Bill Bonney, and Matt Stamper wrote together. It's a fabulous group of books, and this particular one is about how to interface with, how to work [00:35:30] with, and how to best employ a chief information security officer.

So this is really great stuff, right from the horse's mouth. And I will come back in just a minute or so after you've heard from Gary. Also, remember, I will provide a link to Gary's entire session so that you can get to know him a little bit more. And that's part of the process of this broadcast: to bring you some great [00:36:00] resources and help you understand things a little bit better.

We have a whole bunch of these sorts of things to bring to you in the next year. Take a listen to Gary Hayslip.

I was asked to speak about the Executive Primer, a recent book that my co-authors and I wrote, and we're going to discuss that and some of its topics.

To begin, the book was written with my co-authors, Bill Bonney and Matt Stamper. It's very different from the other books that we've [00:36:30] written, the CISO Desk Reference Guide series and some of the domain-specific books that we've written for CISOs; this one actually is written for the CISO's colleagues.

It's written for people that actually work with CISOs, that actually work with security professionals. The book is really one of expectations. And what I mean by that is, we're looking at what expectations the CEO has when they're working with the CISO, or how a chief financial officer should support a CISO and the [00:37:00] security team.

So we were trying to write it more about how people should be able to work with a Chief Information Security Officer and that professional's security team and security program. Even though the book has multiple chapters, I picked three domains, three sections, that I thought might be interesting for our talk today.

And those are basically the expanding role of the CISO in the business, what components are part of the Cybersecurity program that I find to be really important, and [00:37:30] then executing the security program, actually being able to be effective and being able to make sure we get things done to protect the business.

Okay, we're back. This is an example of actual reverse engineering of key systems inside the Pentagon in order to solve problems that we had. Most of you probably don't understand some of these buzzwords, but they're on the screen and I'll use them a little bit. Switches, which we know, and routers.[00:38:00]

Switches have these things that are absolutely key to configuring them so that they can be redundant and have automatic failover, and they block certain paths. Another friend of mine is named Radia Perlman. Radia is one of these brilliant engineers. She worked for Digital Equipment Corporation, DEC, years ago, then worked for Novell, and now works, I think, for Oracle.

I'm not really exactly sure who [00:38:30] she's with today, but she's a brilliant technologist, and she invented and built the Spanning Tree Protocol. So I have been in her sessions and learned from her over the years how to manage spanning tree so that it does not create loops. Loops in a spanning tree network will bring an entire network down.

And that's what was happening a lot of times in these environments: the network would go [00:39:00] down because there were loops in the technology. One packet looping can bring down the entire internet, bring down the entire data center, if things are not managed well. You have to document who the root bridge is and where different things are, and you have to reverse engineer the environment and diagram out who the root is.
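As a simplified illustration of one piece you end up documenting, here is a sketch of the root bridge election: spanning tree compares a bridge ID made of a configurable priority plus the switch MAC address, and the lowest value wins. The switch names and values below are hypothetical.

```python
# Simplified sketch of the spanning tree root bridge election:
# the bridge with the lowest (priority, MAC) bridge ID becomes the root.

def elect_root(bridges):
    """Return the bridge with the lowest (priority, MAC) bridge ID."""
    return min(bridges, key=lambda b: (b["priority"], b["mac"]))

switches = [
    {"name": "core-1",   "priority": 4096,  "mac": "00:1a:2b:00:00:01"},
    {"name": "core-2",   "priority": 8192,  "mac": "00:1a:2b:00:00:02"},
    {"name": "closet-9", "priority": 32768, "mac": "00:1a:2b:00:00:09"},
]

root = elect_root(switches)
# Document this so troubleshooters know where the tree is rooted.
print(f"Root bridge: {root['name']}")
```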

And then there are all these algorithms that we use to basically have loop-free technology [00:39:30] automatically, and those systems don't always work automatically. So we had to reverse engineer all the switches and systems so that we could figure out what was going on. Here's another diagram of gateways and different systems.

We put test points in so that we could test between two points to determine that we did get good throughput between two different points after we fixed things. And one of the things that's interesting is that [00:40:00] nowhere else in the world do we not do this. I often say, hey, if you just bought a brand new Corvette, the first thing you do is you go out, put the pedal to the metal, and see how fast it'll go, or how fast it'll go from zero to 60, that sort of thing.

Now it's no longer the Corvette, it's probably a Tesla; those things are really fast. But the first thing that we do is check whether the circuit, or the car, is hitting the theoretical numbers that are [00:40:30] stated. So between two points, we put things in so that we can test between those two points to ensure that we are getting the throughput that we have purchased from the various data system providers.
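Here is a minimal sketch of that point-to-point check: time a known-size transfer between the two test points and compare the achieved rate to the purchased rate. The transfer itself is stubbed out; in practice you would use a tool such as iperf or your own sender and receiver, and the circuit size shown is a hypothetical example.

```python
# Minimal sketch: compare achieved throughput between two test points
# against the rate purchased from the provider. The transfer is a stub.
import time

PURCHASED_MBPS = 45.0            # hypothetical: a DS3-class circuit
TEST_BYTES = 10 * 1024 * 1024    # amount of data to push for the test

def run_transfer(num_bytes: int) -> float:
    """Stub for a real transfer between test point A and test point B; returns seconds."""
    start = time.time()
    # ... send num_bytes across the circuit here ...
    time.sleep(2.0)              # placeholder so the sketch runs end to end
    return time.time() - start

elapsed = run_transfer(TEST_BYTES)
achieved_mbps = (TEST_BYTES * 8) / elapsed / 1_000_000
print(f"Achieved {achieved_mbps:.1f} Mbps of {PURCHASED_MBPS} Mbps purchased")
if achieved_mbps < 0.8 * PURCHASED_MBPS:
    print("Below 80% of the purchased rate: keep peeling the onion.")
```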

Okay. So we did that, but we had to reverse engineer the network in order to diagram it. These are actual diagrams that we created during the event. Kind of cool, huh? We also had to find various errors and use various tools [00:41:00] to diagnose the problem. And so we would go out and we would find where certain errors were, like CRC errors, cyclic redundancy check errors.

That's a big fancy word for making sure that the data that you received was the data that the sender meant to send. Yeah, that's pretty cool, isn't it? Okay, so CRC errors mean that the data got corrupted in transit, and when it arrived it was wrong. And when we have those sorts of things, we [00:41:30] know something is erring between two points, and then we can quantify that and say, yeah, that shouldn't happen at all; it should be zero.
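To make the CRC idea concrete, here is a tiny illustration using Python's built-in CRC-32: the sender computes a checksum over the data, the receiver recomputes it, and a mismatch means the data was corrupted in transit. Ethernet does this per frame in hardware; this only shows the concept.

```python
# Tiny illustration of a cyclic redundancy check catching corruption in transit.
import zlib

payload = b"status report for the recovery team"
sent_crc = zlib.crc32(payload)

# Simulate one bit getting flipped somewhere along the path.
corrupted = bytearray(payload)
corrupted[5] ^= 0x01

received_crc = zlib.crc32(bytes(corrupted))
if received_crc != sent_crc:
    print("CRC mismatch: the data that arrived is not the data that was sent.")
```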

And when a link does have some, we have to go diagnose those problems. And then we have to look at the network diagram to see where those problems are created along the set of dependencies. It's pretty simple if you've been there. It's not rocket science. The problem is that people who don't have experience need to be trained [00:42:00] by people who do.

And then you need to get your entire team trained in how to look at your network documentation, how to see where your dependencies are, and what's broken and what's not working. One of the problems that they had after moving hundreds of servers is that their firewalls were misaligned. So they had about seven, eight, nine firewalls there on this particular picture.

But we had to go look at statistics and find out why [00:42:30] some firewalls were delaying packets and what was going on. So we had these throughput charts, and we'd go from firewall one through seven and figure out what kind of traffic was going on and how we could rebalance those firewalls so that things would work better.
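A rough sketch of that rebalancing exercise might look like the following: total up how much traffic each firewall is carrying and flag the outliers. The numbers are hypothetical; in practice they would come from interface counters or flow statistics.

```python
# Rough sketch: spot overloaded firewalls from per-device throughput figures.
throughput_mbps = {
    "fw1": 92.0, "fw2": 88.0, "fw3": 11.0, "fw4": 9.0,
    "fw5": 14.0, "fw6": 10.0, "fw7": 12.0,
}

average = sum(throughput_mbps.values()) / len(throughput_mbps)
for name, mbps in sorted(throughput_mbps.items(), key=lambda kv: -kv[1]):
    flag = "  <-- overloaded, candidate to rebalance" if mbps > 2 * average else ""
    print(f"{name}: {mbps:5.1f} Mbps{flag}")
```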

We had to reverse engineer these diagrams, and that's another part of the key takeaway: you must have, in a large organization, people who can basically look at [00:43:00] and respond to zero-day problems down at the very basic, fundamental levels. And if you just have a bunch of click-or-plug people, and you'll understand just from that term, if all they know how to do is click and install and all they know how to do is buy and plug in, you're in trouble.

You need technologists who have the theory behind the understanding so that they can reverse engineer, so they can basically troubleshoot, look at deep packet inspection, [00:43:30] look at security fundamentals, and see why someone's trying to break in and what's happening down at a detailed theoretical level. Very important.

And if you wanna know a little bit more about firewalls, listen to my first broadcast on the denial of service attack on the stock markets. I go through in great detail what we did to solve that particular problem by using a myriad of different firewall techniques. So take a look [00:44:00] at that.

It's not super techy, but it does give an executive, like yourself or even a board member, an understanding of what kind of problems are you solving? How are you working through these? What resources do we need? What focus do we need? What training do we need? What kind of people do we need? And it helps you understand some of these things.

I try to avoid the big buzzwords. It's inevitable in a data world, but a lot of executives understand some of these things. And so hopefully these [00:44:30] exhibits will help you relive some of these things and understand what's going on. So here is an example of a circuit that was highly degraded, very low throughput.

We found a problem and then improved it. Now, this is where the iteration comes in. We got it improved by about 50%, but it wasn't the full improvement that we could get. That's peeling the onion. That's the fact that there are multiple problems causing these things. And so [00:45:00] you have to take an iterative approach: analyze, find a problem, solve it like we did here; find another problem, solve it like we did here.

Find another problem, solve it, until the system is working optimally and your users and your business can return to operation. Talking about best practice in documentation: I prepared this slide some number of years ago about the need for visualization of [00:45:30] details. And at the very top you'll see disaster recovery.

Yes. In the event of a disaster, you have to have visualization of details, because you may have to rebuild a circuit or get a secondary circuit put in. You may have to do all types of different redesigns in a disaster. And so, consequently, you need the most visibility and the most iteration of documentation for disaster [00:46:00] recovery.

And so I'm gonna show you some examples of some of these different types of documentation, and you can take those away and benefit from them. This is what your management, your leadership, your users need to know: the basics of where your systems are connected. And second, this is an application that was very slow out in California, and it got worse and worse under load.

Data was being brought to and [00:46:30] from Boulder, Colorado, from California, across very low speed links. And so I showed, in the thickness of the lines, the data moving back and forth and the path and the dependencies that it was traversing. If you were local, like between a server and a workstation on a local area network, bandwidth is free and you can get very rapid capabilities; but the offered load was akin to what it should be for a local area network, [00:47:00] and it was trying to go back across a very low speed link.
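A back-of-the-envelope example shows why that hurts: the same offered load that is trivial on a local area network takes orders of magnitude longer over a low-speed wide area link. The sizes and rates below are hypothetical.

```python
# Back-of-the-envelope: LAN vs low-speed WAN transfer time for the same offered load.
transfer_mb = 50          # hypothetical: one chatty transaction's worth of data
lan_mbps = 1000           # gigabit local area network
wan_mbps = 1.5            # T1-class link back to Boulder

lan_seconds = transfer_mb * 8 / lan_mbps
wan_seconds = transfer_mb * 8 / wan_mbps
print(f"LAN: {lan_seconds:.1f} s   WAN: {wan_seconds:.0f} s "
      f"({wan_seconds / lan_seconds:.0f}x slower)")
```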

So you can't put 10 pounds in the five pound bag. That's what this does: it helps you understand and visualize the network and the offered load to the network for the different types of transactions. This is an example of how you might see your technology in your equipment racks here and how they might be connected.

But how they are connected physically is [00:47:30] different than how they are configured, what connects to what, and we use technologies like VLANs and routing and other such things. We call those layer two and layer three technologies on the OSI model. And if you're familiar with at least the terms, the OSI model: here we take and break out the same exact devices, but we show you layer one and layer two, and what VLANs they go through.

And just [00:48:00] because a big switch has a plug in it doesn't mean it's connected to everything. Those are logical connections based upon configuration, what is allowed to access different things through firewalls, et cetera. So you have to be able to see your system from a holistic standpoint. This is a large diagram reflecting layer two and layer three technologies here.

And I've superimposed some of the details that you put [00:48:30] on a server, the different interfaces that you may have, various types of network and system configuration and dependencies. And this is even more important in a cloud environment: how it's connected, in basic terms, so that you can see what your dependencies are when a disaster hits.

You need to see how things communicate from point A to point B and point C to point D. And the only way I know of [00:49:00] is good old-fashioned work. W-O-R-K. I can barely even say it because it is a four-letter word. Work is required to diagram these systems, and there's no automatic way to do it. Remember how I told you that your fingerprint of technology is unique to you?

These systems are unique to you. They're unique to every organization. They lay out differently, whether you're a centralized bank or a decentralized aerospace [00:49:30] company or a retail vendor and that sort of thing. You need to take a look at your enterprise and then help your employees be able to put their finger on a diagram and move it through to see the dependencies, so when there's a problem, they can diagnose it.

A lot of organizations don't train their technologists. They pay them a lot of money, hundreds of thousands of dollars a year, and they hire some new person who was at another company [00:50:00] and was really smart and did a really good job. So you hired 'em, but they're not gonna come and tell you, hey, guess what?

I'm impotent. I can't understand your environment because you don't have any network documentation. I know you're paying me a lot of money, but they're not gonna come and tell you this. It's just human nature. But without good documentation and systemization, your people will take years to assimilate and understand a complex architecture instead of weeks if you [00:50:30] have a diagram.

So your system diagram should show your people how everything works and the various dependencies, so that when they come to work for you, it takes two or three weeks, some good training on your documentation and on your architecture, and then they can understand it. And let's say you have a hundred-plus people working on security and network and that sort of thing.

If two or three of 'em understand this because they've been there [00:51:00] forever, they're inundated and they can't support and help everyone understand every problem. They become the bottleneck. So by documenting your infrastructure, you basically do away with that bottleneck, and everybody is enabled by this diagram.

Yes, it's costly, and yes, it takes a lot of work, very focused work, to keep it up to date, but you will be glad you did. When you bring in a new CIO, bring in a new CISO, [00:51:30] or when there's a disaster, you are going to thank your lucky stars that you had the thought ahead of time of documenting your systems.

It's key. And you don't wanna necessarily outsource this because it's outsourcing your architecture. And I created a term called architecture ownership. Every company has a different architecture. They need to own it, and they need to understand it, and they need to make sure that it is [00:52:00] documented for the future.

This is just a very simple flow diagram that I created while we were documenting large Fortune 500 networks. We started out at Burlington Northern Railroad, building these beautiful diagrams of their train network and of their office automation networks. We did a really awesome job on that. I learned a lot from those engineers.

And then we took those technologies into a service we call DocuNet, and we can basically go in, in a matter of [00:52:30] weeks, reverse engineer an environment and build these beautiful documentation systems. I don't do that anymore, I don't do it for you necessarily, but I do provide the leadership, the wherewithal, the how-to, and help you build a system and build in these best practices.

So contact me if you're interested in some help on that. Here's a different view of troubleshooting a big routed network. When problems happen, it can bring [00:53:00] down an entire energy company: a multi-billion dollar energy company went down because of some problems with their routing that they couldn't diagnose until we started diagramming it out and seeing where the problem was.

We had two different environments of EIGRP 10 in two different areas, and they were competing with one another, but you couldn't see it because it wasn't diagrammed. Okay? This is also very cool. I know it's [00:53:30] a lot of stuff, but there are two switches up at the top where it says trunk, and then the VLANs, and they're color coded.

And then down at the bottom, FW1, FW2; those are firewalls. People buy two of everything today for redundancy. Here's the fallacy. When you diagram and show the dependencies for each critical transaction, you see that, for instance, the yellow transaction goes through three devices.

You can pull that [00:54:00] firewall, or you can pull that switch, or it can have a disaster and break, and your entire system goes down, even though you bought two of everything. You have to diagram it out and figure out if a single point of failure will take those systems down, even when you buy two of everything.

You want it to be redundant and resilient. The problem is, if you don't take a look at where your transactions go through and what they're dependent upon, you don't see that [00:54:30] you've configured a capability that requires all four of those devices, even though they're redundant, to be up and running for your transaction to complete properly.

And then when something like that happens, you're wondering, why isn't my redundancy working? Exactly. We've been called in to troubleshoot a number of huge organizations that pulled the plug on some things, and then, after they tried to reconnect [00:55:00] it, every time they reconnected their redundant devices the network would break again and it would cause a big meltdown.

And they didn't want that, so they ended up running with a single point of failure and not using the redundant technology. So they bought two of each very expensive network component and system, but they couldn't connect them together to build a reliable, resilient network in the case of a problem with one device. It was just a single point of failure.
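Here is a minimal sketch of that single-point-of-failure check: take the documented dependency paths for a transaction and test whether a working path survives when each device is removed one at a time. The device names and paths are hypothetical.

```python
# Minimal sketch: find devices whose loss leaves a transaction with no surviving path.
def single_points_of_failure(paths):
    """Return devices whose removal breaks every documented path for the transaction."""
    all_devices = {device for path in paths for device in path}
    spofs = []
    for device in all_devices:
        surviving = [path for path in paths if device not in path]
        if not surviving:
            spofs.append(device)
    return sorted(spofs)

# The "yellow" transaction was only ever configured over one path, so every
# device on it is a single point of failure despite the redundant hardware.
yellow_paths = [["switch-a", "fw1", "switch-b"]]
redundant_paths = [["switch-a", "fw1", "switch-b"], ["switch-c", "fw2", "switch-d"]]

print("Yellow transaction SPOFs:", single_points_of_failure(yellow_paths))
print("Properly redundant SPOFs:", single_points_of_failure(redundant_paths))
```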

It just was a single point of failure. [00:55:30] All right, so that kind of helps you get down. This is a very complex, , configuration set of configuration variables about transparent bridging and that sort of thing. And I'm not trying to teach you those details because they're irrelevant to 90% of you.

However, you do need to realize that certain key things will break. And when they break, if you don't know how they're configured and you can't visualize the environment and [00:56:00] how they're configured, there's no way you're gonna be able to recover gracefully. It's just gonna continue to be a kerfuffle, and you're just gonna keep having great problems until you really get it nailed down.

So that's what that is about. Now, when there is a disastrous problem, caused by a natural disaster or by some other thing, you take a look and you see: I have this disastrous problem, and in the status quo your view is [00:56:30] what I call a square. And so you can see that I have a square up there.

It's got your team, your environment, your problem, your symptoms, and all these different things that we know about the problem. We know things aren't working well, but you can see that it's just basically two dimensional; it's a square. In order to overcome a disastrous problem, it typically requires what I call a paradigm shift.

You cannot solve [00:57:00] today's problem with today's information. You need some new input. You need something that tells you, here's where the problem is. And I call that moving from a two-dimensional square to a three-dimensional cube. And I'm gonna show you a picture of this. Boom. One, two, three. You can see that with a new input, a six-sided cube allows you two more viewpoints, and that allows [00:57:30] you to have new information, whether that's another technologist coming in to help you with key information. A lot of times I'm a deep packet inspection guy, and when I come in I add another input and another perspective to the view so that you can get a different payoff.

In some cases it may be a different cloud person or application person who comes in and sees a new finding, a new visibility, a new diagram, a new metric, a new root [00:58:00] cause, and that allows you to shift the paradigm and solve today's problems with new information. And then it's always simple, right?

Why didn't I think of that? Yeah, exactly. Problems require new thinking, new information, and the pressure of a disaster is exactly when you can harness those capabilities of your team. It's a really awesome [00:58:30] paradigm shift. Now, the whole purpose is to build business continuity so that we can have resilient systems and ongoing operations.

If we do have a problem, we can recover more rapidly, and we have good systems of communication with management, planning, et cetera. So we're harnessing all the best practices to maintain business continuity, and it's all part of the system. Now, at this [00:59:00] point, I wanna just introduce you to a fabulous leader, a technologist and leader.

And this is Colonel David Wills. He's gonna talk to you just for a moment about a little leadership principle, and then you can go through and listen to his longer talk about technology and managing the entire war in Afghanistan and Iraq and building out large networks and diagramming them and that sort of thing.

In both Central Command, at the JOINT [00:59:30] CHIEFS, and at Strategic Command, you will not find a more experienced, knowledgeable fellow than this leader, Colonel David Wills.

I'm currently employed by General Dynamics Information Technology. I retired not even a full year ago. And you all say, that's mildly interesting; why are you our keynote speaker, Bill Alderson? I'm still trying to figure that one out, but I think it has to do with the fact that Vint Cerf created the TCP/IP Internet, which spawned the DDN, which is now the DISN, and I spent the last 20 years making change [01:00:00] on that network and infrastructure. In my current position, I get to continue making change and leading change from a technology perspective. I talked about a couple of words that weren't on the slide.

Trust is what sticks out in my mind. At the end of the day, trust is what leadership boils down to.

We're back. So here we have Cybersecurity. Cybersecurity is truly under disaster recovery, because when a Cybersecurity event [01:00:30] hits, the disaster recovery team has to go to work in order to figure out what's going on. And so the disaster recovery field and professionals in disaster recovery are key to helping us make certain that we can manage through and navigate Cybersecurity incidents well.

And it's a similar sort of process to what disaster recovery people have been doing [01:01:00] for decades, and bringing them in to help lead the Cybersecurity events and incidents is a really good thing; integrating those two teams together builds redundancy and reliability. Again, you've seen this before.

I'm gonna reiterate: take the best practices, the high fidelity, low noise inputs, and then amplify them into your team; impute and apply those best practices in advance. [01:01:30] And it's always more fun to be ready. So figure out your lessons learned, or learn from other people's lessons learned.

Get those things imputed. Get your network documentation, get your alerting systems, put all those things together, and then build out some tangible results. And like I said, even Mark Zuckerberg, who can hire the smartest people in the room all the time, his network went down a year ago, October 4th, 2021, [01:02:00] and it lost him 25 to 50 billion dollars in value in a matter of hours.

Why is that? There were some best practices that he and his team did not put into their system, and it allowed a very problematic situation, costing his organization billions of dollars. So I'm gonna talk to you about that in a future analysis of the problem with facebook.com going down.

And then we'll [01:02:30] discover those sorts of things. But I just need you to know that this is more relevant now that we are more dependent upon data, and it doesn't matter what kind of disaster we have, natural disaster or otherwise; it always involves data today, because our environment is so dependent upon data.

Here are some additional things that we're gonna talk about in the future: biometric systems and the federal government. Matter of fact, it sounds like I've [01:03:00] done more with the military than anything else, and that's not true. Actually, most of my work is with Fortune 100 and Fortune 500 energy companies, financial and healthcare organizations, various types of data disasters and Cybersecurity events that required some experience to make certain that you're ready.

Because like I always say, it's a lot more fun to be ready. I'd love to tell your story. [01:03:30] So if you have a story as a planner, as an implementer, or as a responder to some type of disaster, my job today is to bring your stories in, pick out all the lessons learned and the best practices, so that other people can benefit from these things.

We will be out there serving you and helping you solve those problems. And we're always happy to be a friend to you when you are in [01:04:00] need. Whether you need to review your architecture or make sure you're ready, we can take a look at that. And if you have a disaster, you can click on our website and boom, go in and say, I have a disaster and I need some help.

Whatever that is, we're happy to help you, and we really enjoy teaching, training, and helping you impute the best practices that will save you time and money, and possibly, obviously, lives, when those things [01:04:30] are at stake. Thank you so much for being with me today. I look forward to seeing you in our next broadcast.

Monikaben Lala

Chief Marketing Officer | Product MVP Expert | Cyber Security Enthusiast | @ GITEX DUBAI in October

Bill, thanks for sharing!

Brandy Gordon MS, PhD(c), MCFE, CSO

CSO | Certified Digital Forensic Examiner | Doctoral Researcher | Founder | Keynote Speaker | DFIR Investigator | Malware/Reverse Engineer | Expert Witness | Assisting in IT Audits, Litigations and Breach Recovery | Let's Talk Forensics

You guys were ready for the "Call." Why? Because you documented past experiences and had a place to start... you didn't go in with your hands empty... you guys didn't need a software application to automate things for you... you didn't need to search the internet... you didn't need to look for resources because you were the #source! #cyberrecovery #cyberdefenders #cybersecurity #cyberintelligence Can you tell that I love this stuff!
