So you think the Cloud is reliable ... think again!

Introduction

So many companies now depend on Cloud-based applications, such as email and Cloud storage, and we all assume that they are fairly robust and give us 100% availability. Well, think again. Around 2pm on Sunday (UK time), Dropbox decided that it would give up synchronising files, and left a fairly general message of:

Dropbox leaves the stage for a bit

The first panic sets in that you've lost your files, or that you've done something wrong, but then you find it is a common problem around the world:

You next start to wonder how long it's going to be down, as it's right in the middle of downloading that important file, or you have shared a file across a company and none of the users are getting updates. While, at the time of the outage, a ping was working okay, there was no Dropbox service, and it struggled to synchronise files and deliver them through the Web browser.

It is well known in the industry that weekends are often when outages strike, and the message facing users was:

We are aware of an issue currently affecting the Dropbox site. We have identified the cause, which was the result of an issue that arose during routine internal maintenance, and are working to fix this as soon as possible.

Not so robust

Recently one of Google's data centres in Belgium (europe-west1-b) was affected by a power loss caused by four lightning strikes on the local power grid. While most of the servers had battery backup and redundant storage, Google still ended up losing an estimated 0.000001% of its disk space (which is around ten kilobytes for every terabyte stored, giving a possible loss of a few gigabytes of data).
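
As a quick sanity check of those numbers (the 300 PB of affected storage used below is a hypothetical figure; Google did not publish one):

```python
# Back-of-the-envelope check of the reported loss figure.
TB = 10**12                       # bytes in a terabyte
loss_fraction = 0.000001 / 100    # 0.000001% expressed as a fraction

print(loss_fraction * TB)         # 10000.0 bytes, i.e. ~10 KB per TB

# Scaling up to a hypothetical 300 PB of affected storage:
affected_bytes = 300 * 10**15
print(loss_fraction * affected_bytes / 10**9)  # 3.0 GB lost in total
```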

The key areas of loss are likely to be cached data, or data that was being written when the power glitched. So the lesson is ... don't trust your Cloud provider to be the sole keeper of your data; you need to back up critical data locally.

This case actually shows how dependent we are on power supplies, and these must be protected in order to cope with a disaster situation. Also, people think that lightning never strikes twice, but that's not the case with electrical supplies, as they are long runs of a metal conductor, and when the insulators are wet they can provide an easy route to earth. Companies must have electrical supply backups of batteries and local generators:

So what are the critical things? ... your power supplies and your air conditioning systems.

First be safe

Recently, in the US, former Secretary of Defense William Cohen outlined that the US power grid was at great risk of a large-scale outage, especially in the face of a terrorist attack:

The possibility of a terrorist attack on the nation's power grid — an assault that would cause coast-to-coast chaos — is a very real one.

As I used to be an electrical engineer, I understand the need for a safe and robust supply of electrical power, and that control systems can fail. Often, too, we would run alternative supplies to important pieces of equipment, in case one of the supplies failed. So if there is a single point of failure on any infrastructure that will cause large-scale problems, it is the humble electrical power supply.

With control systems, there are often three main objectives (or regions of operation):

  1. Make it safe (protect life and equipment)!
  2. Make it legal (comply with regulations)!
  3. Make it work and make it optimized (save money and time)!

So basically the first rule trumps the other two: a system will shut down if there is a danger to life. The next objective is to make it legal, so that it fits with regulatory requirements (such as for emissions, noise or energy usage). The least important is to make it optimized, but if the system drifted out of the first two regions, then the control system would refocus on making it safe and legal.
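
A minimal sketch of that priority ordering, with made-up sensor names and thresholds, might look like this:

```python
# Safe > legal > optimized: the three regions of operation, checked
# in strict priority order. All names and limits are hypothetical.
def select_mode(sensors):
    if sensors["pressure_bar"] > 9.5:     # danger to life or equipment
        return "SHUTDOWN"                 # rule 1: make it safe
    if sensors["emissions_ppm"] > 40.0:   # regulatory limit exceeded
        return "THROTTLE_BACK"            # rule 2: make it legal
    return "OPTIMIZE"                     # rule 3: save money and time

print(select_mode({"pressure_bar": 9.9, "emissions_ppm": 10.0}))  # SHUTDOWN
print(select_mode({"pressure_bar": 5.0, "emissions_ppm": 55.0}))  # THROTTLE_BACK
print(select_mode({"pressure_bar": 5.0, "emissions_ppm": 10.0}))  # OPTIMIZE
```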

Failover over failover over failover

The electrical supply is one of the key elements that can cause massive disruption to IT infrastructure, so the supply grid tries to provide alternative routes for the power when there is an outage on any part of it. In Figure 1, we can see that any one of the power supplies can fail, or any one of the transmission lines, or any one of the substations, and there will still be a route for the power. The infrastructure must also be safe, so there are detectors which sense when the system is overloaded, and which automatically switch off the transmission of the power when it reaches an overload situation. For example, a circuit breaker in your home detects when too much current is being drawn and disconnects the power before any damage is done. The "mechanical" device, the fuse, is a secondary fail-safe, but if both fail you'll have a melted wire to replace, as cables heat up as they pass more current (power is current squared times resistance). If the cable gets too hot, it will melt.
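
The current-squared relationship is worth seeing with numbers; here is a tiny illustration with a made-up cable resistance:

```python
# Why overloaded cables melt: dissipated heat grows with the square
# of the current (P = I^2 * R). The resistance value is illustrative.
def cable_heat_watts(current_amps, resistance_ohms):
    return current_amps ** 2 * resistance_ohms

R = 0.05  # ohms, for a hypothetical cable run
for amps in (10, 20, 40):
    print(f"{amps} A -> {cable_heat_watts(amps, R):.0f} W of heat")
# Doubling the current quadruples the heat: 5 W, 20 W, 80 W.
```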

In Figure 1, the overload detector sends information back to the central controller, and operators can normally make a judgement on whether a transmission route is going to fail, and make plans for other routes. If it happens too quickly, or if an alarm goes unnoticed, transmission routes can fail, which increases the load on the other routes, which can then cause them to fail in turn, so the whole thing collapses like a row of dominoes.
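
The domino effect is easy to demonstrate with a toy model (all loads and capacities below are invented): tripping one line pushes the survivors past their own limits.

```python
# Toy cascade: when a line trips, its load is shared among the
# remaining lines, which may then exceed their own capacities.
lines = {"A": {"load": 80, "cap": 100},
         "B": {"load": 90, "cap": 100},
         "C": {"load": 70, "cap": 100}}

def trip(name):
    shed = lines.pop(name)["load"]
    for other in lines.values():          # redistribute the lost load
        other["load"] += shed / len(lines)

trip("A")                                 # the first failure
failed = [n for n, l in lines.items() if l["load"] > l["cap"]]
print("Overloaded after one trip:", failed)   # ['B', 'C']
```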

Figure 1: Electrical supplies

Large-scale power outage

So, in the US, former Secretary of Defense William Cohen has sent a cold sweat down many leaders' backs, including industry leaders, as a major outage on the power grid would cause large-scale economic and social damage. At the core is the limited ability to run for short periods on UPS (uninterruptible power supply), and then on generators, in order to keep networked equipment and servers running; a major outage would affect the core infrastructure, which often does not have the robustness of corporate systems. His feeling is that an outage on the grid would cause chaos and civil unrest throughout the country.
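
To get a feel for how short those periods can be, here is a rough UPS runtime estimate (all figures are illustrative, not taken from any particular site):

```python
# Rough UPS runtime: how long the batteries bridge the gap before
# the generators must take over. All numbers are made up.
battery_wh = 20_000      # usable battery capacity, watt-hours
load_w = 8_000           # IT load, watts
inverter_eff = 0.9       # typical inverter efficiency

runtime_min = battery_wh * inverter_eff / load_w * 60
print(f"Estimated runtime: {runtime_min:.0f} minutes")   # ~135 minutes
```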

Alarm bells have been ringing for a while. Janet Napolitano, the former Department of Homeland Security Secretary, outlined that a cyber attack on the power grid was a matter of "when," not "if," and Dr. Peter Vincent Pry, a former senior CIA analyst, defined that the US was unprepared for an attack on its electrical supply network, and that such an attack could:

take the lives of every nine out of ten Americans in the process.

The damage that a devastating EMP (Electromagnetic Pulse), such as from a nuclear explosion, could cause has been well known for some time, but many now think that the complex nature of the interconnected components of the network, and their control system infrastructure (typically known as SCADA: supervisory control and data acquisition), could be the major risk.

Perhaps a pointer to the problems that an outage can cause is the Northeast blackout of 14 August 2003, which affected 10 million people in Ontario and 45 million people in eight US states. It was caused by a software bug in an alarm system in a control room in Ohio. Some foliage touched one of the supply lines, which caused it to overload. The bug stopped the alarm from being displayed in the control room, where operators would otherwise have redistributed the power from other supplies. In the end the power systems overloaded and started to trip, causing a domino effect across the rest of the connected network. Overall it took two days to restore all of the power to consumers.

As the world becomes increasingly dependent on the Internet, we have created robustness in the ways that devices connect to each other, and in the multiple routes that packets can take. But a loss of electrical power will often disable the core routing functionality anyway.

Control systems - the weakest link

As we move into an Information Age we become increasingly dependent on data for the control of our infrastructures, which leaves them open to attackers. Often critical infrastructure is obvious, such as the energy supplies for data centres, but it is often the least obvious systems that are the most open to attack. This could be the air conditioning system in a data centre, where a failure can cause the equipment to virtually melt (especially tape drives), or the control of traffic around a city. As we move towards using data to control and optimize our lives, we become more dependent on it.

Normally in safety-critical systems there is a failsafe control mechanism: an out-of-band control system which makes sure that the system does not operate outside its safe working envelope. In a control plant, this might be a vibration sensor on a pump: if the pump runs too fast, the sensor detects it, and the control system places the overall system into a safe mode. For traffic lights, there is normally a vision capture of the state of the lights, fed back to a failsafe system that can detect when the lights are incorrect. If someone gets access to the failsafe system, they can thus overrule safety and compromise the system. This article outlines a case where this occurred, and some of the lessons that can be learnt from it.
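
A minimal sketch of such an out-of-band failsafe, assuming a hypothetical vibration threshold for the pump example:

```python
# An independent monitor that forces the plant into a safe mode when
# a sensed value leaves the safe envelope. The threshold is made up.
SAFE_VIBRATION_LIMIT = 7.0   # mm/s, hypothetical limit for the pump

def failsafe_check(vibration_mm_s, plant):
    if vibration_mm_s > SAFE_VIBRATION_LIMIT:
        plant["mode"] = "SAFE"           # overrule the main controller
        plant["pump_rpm"] = 0
    return plant

plant = {"mode": "RUN", "pump_rpm": 1450}
print(failsafe_check(9.2, plant))        # forced into SAFE mode
```

Note that this monitor is exactly what an attacker wants to reach: whoever controls the failsafe controls the last line of defence.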

Traffic Light Hacking

Security researchers, led by Alex Halderman at the University of Michigan, managed to use a laptop and an off-the-shelf radio transmitter to control traffic light signals (https://jhalderm.com/pub/papers/traffic-woot14.pdf). Overall they found many security vulnerabilities, and managed to control over 100 traffic signals in a Michigan city using a single laptop. In order to be ethical in their approach, they gained full permission from the road agency and made sure that there was no danger to drivers. Their sole motivation was to show that traffic control infrastructure could easily be taken over.

Overall they found a weak implementation of security, with the usage of open and unencrypted radio signals, which allowed intruders to tap into the communications. They then discovered the usage of factory-default usernames and passwords. Along with this, there was a debugging port which could be easily compromised.

In the US, the radio frequency used to control traffic lights is typically in the ISM band at 900 MHz or 5.8 GHz, which makes it fairly easy to obtain equipment that can communicate with the radio system. The researchers used readily available wireless equipment and a single laptop to read the unencrypted data on the wireless network.

Figure 2 provides an overview of the control system, where the radio transmitter provides a live feed (and other sensed information) to the road agency. The induction loop is normally buried at each junction and detects cars as they pass over it, while the camera watches the traffic lights and feeds the colours of the lights back to the controller. In this way there is a visual failsafe.


Figure 2: Overview of traffic control system

Overriding the failsafe

The MMU (Malfunction Management Unit) is the failsafe operator on the system and ensures that the lights are not put into an unsafe state (such as red and green at the same time); the lights are then adjusted using the information gained from the induction loops in the road (which sense cars as they pass over them). If access can be gained to the MMU, allowing access to the controller, the lights can be compromised to go into incorrect states, or to stay at steady red (causing gridlock within a city). Within the MMU controller board, the researchers found that by connecting a jumper wire, the output from the controller was ignored and the intersection was put into a known-safe state.
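
A toy version of the conflict check that an MMU performs might look like this (the approaches and conflict table are purely illustrative):

```python
# No two conflicting approaches may show green at the same time.
CONFLICTS = {("north", "east"), ("north", "west"),
             ("south", "east"), ("south", "west")}

def is_safe(state):
    greens = {d for d, colour in state.items() if colour == "green"}
    return not any((a, b) in CONFLICTS for a in greens for b in greens)

print(is_safe({"north": "green", "south": "green",
               "east": "red", "west": "red"}))    # True: no conflict
print(is_safe({"north": "green", "east": "green",
               "south": "red", "west": "red"}))   # False: MMU trips
```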

Same old debug port

A typical security problem in many control systems is that there is often a debug port which gives highly privileged access to the system. Within this compromise, the researchers found that the control boxes ran VxWorks 5.5, which leaves a debug port open for testing. They then sniffed the packets between the controller and the MMU, and found that no authentication was used and that the messages were unencrypted, so they could be easily viewed and replayed. This allowed them to reverse engineer the messaging protocol for the lights. They then created a program to activate any of the buttons within the controller and display the results, and even to access the controller remotely. In the end they managed to turn all the lights in the neighbourhood to red (or all green on a given route, in order to operate safely within the experiment).
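
For contrast, here is a minimal sketch of what was missing: an HMAC tag plus a message counter makes forged or replayed commands detectable. The key and packet format here are hypothetical, not the actual protocol:

```python
# Authenticated commands: a shared-key HMAC tag defeats forgery, and
# a monotonically increasing counter defeats replay. Illustrative only.
import hashlib
import hmac

KEY = b"shared-secret-key"   # would be provisioned per intersection

def sign(counter, command):
    msg = counter.to_bytes(4, "big") + command
    return msg + hmac.new(KEY, msg, hashlib.sha256).digest()

def verify(packet, last_counter):
    msg, tag = packet[:-32], packet[-32:]
    ok = hmac.compare_digest(tag, hmac.new(KEY, msg, hashlib.sha256).digest())
    return ok and int.from_bytes(msg[:4], "big") > last_counter

pkt = sign(42, b"SET_PHASE:GREEN")
print(verify(pkt, last_counter=41))   # True: fresh and authentic
print(verify(pkt, last_counter=42))   # False: replayed packet rejected
```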

DDoS

Finally, they found that the units were susceptible to a denial-of-service (DoS) attack, where continual accesses with incorrect control signals over the network caused the malfunction management unit to put the lights into a failure state (all red). In this way the system failed to cope with excessive traffic, and all the units would end up failing under this type of probe.

This vulnerability showed all the standard signs of the poor integration of security, which is common in many systems where security is not thought of as a major concern. This is not a small-scale issue, as the researchers identified that this type of system is used in more than 60% of the traffic intersections in the US. If a malicious agent wanted to bring a city, or even a country, to its knees, they could just flip a switch ... and there would be no road transport system, which would then cause chaos for the rest of the infrastructure. We really need to rethink the way that systems are designed, and probe them for their vulnerabilities.

The researchers in this study already have other easy targets in their sights, such as tapping into the public messaging systems on freeways, and into the infrastructure created by the U.S. Department of Transportation (USDOT) for vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) systems, along with the new work related to the Connected Vehicle Safety Pilot program.

Conclusions

With all our complex infrastructures, it is the simplest of things that can trip them all and cause large-scale chaos ... the electrical supply. Unfortunately it's not an easy call to make, as the systems need to be safe, but this safety can lead to automated trips, and those are in danger from operator error.

As we move, too, into a world of intercommunication between cars and the roadway, and between the cars themselves, it is important that we understand whether there are security problems, as with the flick of a switch an attacker could cause mass chaos.

So perhaps our security risks come not from the servers, desktops and mobile devices, but from the new Internet of Things (IoT) and from power supplies. So make sure your own power supplies are secure for your organisation, and just hope that someone somewhere is doing the same for theirs.

Wayne Anderson

Microsoft Cybersecurity and Innovation Strategy Leader at BDO Digital, CISM, GSLC, C|CISO


Ironically we find ourselves, as an industry, back to the future. The concept of brittle computing must be considered when working with cloud services, which are extremely dynamic in nature. High availability often requires that the economic model for the cloud platform accommodate redundancy of resources. The critical needs as outlined by Todd Glassey in the discussion above would potentially drive a secondary consideration of whether the organization needs to have basic capability established against a failure at the cloud provider level. As an industry we have seen entire providers go down in past years, so the criticality of the application should drive a Business Impact Analysis: what is the business impact of impaired or eliminated function? Is there a cost/value to reputation, brand, and actual sales if the ".com" site hosted in the cloud were to suddenly be unavailable? One of the interesting things about cloud is that a "warm" standby site can be spun up quickly, with rapid "inflation" of standby resources, in the case of a serious failure. As long as the organization has a cloud-ready management model that enables things to be properly deflated when not being relied upon!


A similar issue happened to me with Box. Box encrypts the files you send to their site. The problem I had was quite weird and random. Some pictures (and it was always pictures) would be wrongly encrypted and become unreadable. Fortunately I was only doing tests. As part of my tests I developed a small app to encrypt my files locally and send them to their cloud. My PC client app would then decipher the data again when used locally. In that case I never had any issue. At first I thought there might be something wrong with my browser when I was using Box, but one day I talked to my cousin in France, and he told me he had the same issue. The fact is that if you really want to be in control of the privacy of your data, you have to encrypt it on your local machine and transfer it encrypted to your cloud. Only that way can you be sure that your data is totally yours and private. If you use a reliable block algorithm (AES256, Blowfish, Twofish, etc.), as the keys are yours and you control the origin of the algorithm, the risk that your data gets stolen, either during transfer or in storage, is very, very low.
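
The encrypt-locally-before-upload approach the commenter describes can be sketched in a few lines, using the third-party cryptography package (pip install cryptography); the file names and key handling are placeholders:

```python
# Encrypt locally so the cloud provider only ever sees ciphertext.
from cryptography.fernet import Fernet  # AES-128-CBC + HMAC under the hood

key = Fernet.generate_key()             # keep this key offline and safe
f = Fernet(key)

with open("photo.jpg", "rb") as src:
    ciphertext = f.encrypt(src.read())
with open("photo.jpg.enc", "wb") as dst:
    dst.write(ciphertext)               # upload only the .enc file

# Later, back on the local machine:
with open("photo.jpg.enc", "rb") as src:
    plaintext = f.decrypt(src.read())
```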

Karim Wittmann

Private account - #gerneperDuOderSie


I'd say on the most basic level the solutions are available to make it as safe as humanly possible, based on my previous experience running a network under less than perfect circumstances with security a real concern. It is rather a question of effort and investment, and of scaling up in time to the needs: how many UPSs/batteries, dual or triple power feeds, full power and module redundancy, how many backup generators you deploy; how many alternative routes you have (fibre, microwave, satellite); how much geographical redundancy; how often you take an off-network backup; how strict your site access and on-site security are; avoiding single points of failure in your back-haul; how much overlap you have in your cell coverage; etc. I'd say the bigger threat should be online "attackers" and malware ... and maybe the accidental software bug, and human error while operating and upgrading the systems ...
