Tripping the System..Controllers and Power Supplies..the Major Risks!

First be safe

This week, in the US, former Secretary of Defense William Cohen warned that the US power grid is at great risk of a large-scale outage, especially in the face of a terrorist attack:

The possibility of a terrorist attack on the nation's power grid — an assault that would cause coast-to-coast chaos — is a very real one.

As I used to be an electrical engineer, I understand the need for a safe and robust supply of electrical power, and that control systems can fail. Often, too, we would run alternative supplies to important pieces of equipment, in case one of the supplies failed. So if there is a single point of failure in any infrastructure that can cause large-scale problems, it is the humble electrical power supply.

With control systems, there are often three main objectives (or regions of operation):

  1. Make it safe (protect life and equipment)!
  2. Make it legal (comply with regulations)!
  3. Make it work and make it optimized (save money and time)!

So, basically, the first rule trumped the others: a system would shut down if there was a danger to life. Next, the objective was to make it legal, so that it fitted with regulatory requirements (such as for emissions, noise or energy usage). The least important was to make it optimized, and if the system moved out of the first two regions, the control system would refocus on making it safe and legal.
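
To make this concrete, here is a minimal sketch of that arbitration in Python (the function and state names are my own illustration, not taken from any real controller):

```python
# Minimal sketch of the three-region arbitration described above.
# All names are illustrative, not taken from a real controller.

def select_mode(danger_to_life: bool, within_regulations: bool) -> str:
    """Safety first, legality second, optimisation only when both hold."""
    if danger_to_life:
        return "SAFE_SHUTDOWN"      # rule 1 trumps everything
    if not within_regulations:
        return "LEGAL_COMPLIANCE"   # e.g. back off to meet emission limits
    return "OPTIMISED"              # free to save money and time

print(select_mode(danger_to_life=False, within_regulations=True))  # OPTIMISED
```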

Failover over failover over failover

The electrical supply is one of the key elements whose failure will cause massive disruption to IT infrastructures, so the supply grid tries to provide alternative routes for the power when there is an outage on any part of it. In Figure 1, we can see that any one of the power supplies can fail, or any one of the transmission lines, or any one of the substations, and there will still be a route for the power. The infrastructure must also be safe, so there are detectors which sense when the system is overloaded and will automatically switch off the transmission of power in an overload situation. For example, a circuit breaker in your home detects when too much current is being drawn and disconnects the power before any damage is done. The mechanical device, the fuse, is a secondary fail-safe, but if both fail you'll have a melted wire to replace, as cables heat up as they pass more current (power is current squared times resistance). If the cable gets too hot, it will melt.
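
As a quick illustration of the heating effect (with a made-up cable resistance), the square law means that doubling the current quadruples the heat:

```python
# Illustrative only: power dissipated in a cable rises with the square
# of the current (P = I^2 * R), which is why an overload melts wires.

def cable_power_watts(current_amps: float, resistance_ohms: float) -> float:
    return current_amps ** 2 * resistance_ohms

R = 0.05  # assumed cable resistance in ohms (made up for illustration)
for amps in (10, 20, 40):
    print(f"{amps} A -> {cable_power_watts(amps, R):.0f} W")
# 10 A -> 5 W, 20 A -> 20 W, 40 A -> 80 W: double the current, four times the heat.
```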

In Figure 1, the overload detector sends information back to the central controller, and operators can normally make a judgement on whether a transmission route is going to fail, and make plans for other routes. If it happens too quickly, or if an alarm goes unnoticed, transmission routes can fail, which increases the load on the other routes, which can then cause them to fail too, so the whole thing collapses like a row of dominoes.

Figure 1: Electrical supplies
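
The domino effect can be sketched in a few lines of Python: when one route trips, its load is shed onto the surviving routes, which can then exceed their own limits (all loads and limits below are made up for illustration):

```python
# A toy cascade model: when a line trips, its load is redistributed
# evenly across the remaining lines. All figures are illustrative.

lines = {"A": 80.0, "B": 90.0, "C": 85.0}   # current load per line (MW)
LIMIT = 100.0                                # trip threshold per line (MW)

tripped = ["B"]  # suppose line B fails first
while tripped:
    failed = tripped.pop()
    load = lines.pop(failed)
    print(f"{failed} trips, shedding {load:.0f} MW onto {len(lines)} line(s)")
    if not lines:
        print("Blackout: no routes left")
        break
    share = load / len(lines)
    for name in lines:
        lines[name] += share
    tripped = [n for n, l in lines.items() if l > LIMIT]
```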

Large-scale power outage

So, in the US, former Secretary of Defense William Cohen has sent a cold sweat down the backs of many leaders, including industry leaders, as a major outage on the power grid would cause large-scale economic and social damage. At the core is the limited ability to run for short periods on a UPS (uninterruptible power supply), and then on generators, in order to keep networked equipment and servers running; a major outage would affect the core infrastructure, which often does not have the robustness of corporate systems. His feeling is that an outage on the grid would cause chaos and civil unrest throughout the country.

Alarm bells have been ringing for a while. Janet Napolitano, the former Department of Homeland Security Secretary, outlined that a cyber attack on the power grid was a matter of "when," not "if," and Dr. Peter Vincent Pry, a former senior CIA analyst, defined that the US was unprepared for an attack on its electrical supply network, and that such an attack could:

take the lives of nine out of ten Americans in the process.

The damage that a devastating EMP (Electromagnetic Pulse), such as from a nuclear explosion, could cause has been well known for some time, but many now think that the complex, interconnected nature of the network's components and their control system infrastructure (typically known as SCADA: supervisory control and data acquisition) could be the major risk.

Perhaps a pointer to the problems that an outage can cause is the Northeast blackout of 14 August 2003, which affected 10 million people in Ontario and 45 million people in eight US states. It was caused by a software bug in an alarm system in a control room in Ohio. Some foliage touched one of the supply lines, which caused an overload; the bug stopped the alarm from being displayed in the control room, where the operators would otherwise have redistributed the power from other supplies. In the end the power systems overloaded and started to trip, causing a domino effect across the rest of the connected network. Overall, it took two days to restore all of the power to consumers.

As the world becomes increasingly dependent on the Internet, we have created robustness in the ways that devices connect to each other and in the multiple routes that packets can take. But a loss of electrical power will often disable the core routing functionality.

Control systems - the weakest link

As we move into an Information Age, we are becoming increasingly dependent on data for the control of our infrastructures, which leaves them open to attackers. Often critical infrastructure is obvious, such as the energy supplies for data centres, but it is often the least obvious parts that are the most open to attack. This could be the air-conditioning system in a data centre, where a failure can cause equipment to virtually melt (especially tape drives), or the control of traffic around a city. As we move towards using data to control and optimize our lives, we become more dependent on it.

Normally, in safety-critical systems, there is a failsafe control mechanism: an out-of-band control system which makes sure that the system does not operate outside its safe working range. In a plant, this might be a vibration sensor on a pump where, if the pump is run too fast, the vibration will be detected, and the control system will place the overall system into a safe mode. For traffic lights, there is normally a vision capture of the state of the lights, and this is fed back to a failsafe system that is able to detect when the lights are in an incorrect state. If someone gets access to the failsafe system, they can thus overrule safety and compromise the system. This article outlines a case where this occurred, and some of the lessons that can be learnt from it.
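
As a minimal sketch of such an out-of-band check (the sensor name and limit are invented for illustration):

```python
# Minimal sketch of an out-of-band failsafe: an independent monitor
# forces safe mode when a reading leaves its safe working range.
# The sensor name and limit are invented for illustration.

SAFE_VIBRATION_LIMIT = 7.1  # assumed pump vibration limit (mm/s)

class Failsafe:
    def __init__(self) -> None:
        self.safe_mode = False

    def check(self, vibration_mm_s: float) -> None:
        # Runs independently of the main controller, so a fault (or
        # compromise) in the controller cannot suppress the trip.
        if vibration_mm_s > SAFE_VIBRATION_LIMIT:
            self.safe_mode = True
            print("Vibration out of range: forcing safe mode")

monitor = Failsafe()
for reading in (3.2, 5.0, 9.8):
    monitor.check(reading)
print("Safe mode:", monitor.safe_mode)  # True after the 9.8 mm/s reading
```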

Traffic Light Hacking

Security researchers, led by Alex Halderman at the University of Michigan, managed to use a laptop and an off-the-shelf radio transmitter to control traffic light signals (https://jhalderm.com/pub/papers/traffic-woot14.pdf). Overall, they found many security vulnerabilities and managed to control over 100 traffic signals in a Michigan city using a single laptop. In order to be ethical in their approach, they gained full permission from the road agency, and made sure that there was no danger to drivers. Their sole motivation was to show that traffic control infrastructure could be easily taken over.

Overall, they found a weak implementation of security, with open and unencrypted radio signals that allowed intruders to tap into the communications, and then discovered the usage of factory-default usernames and passwords. Along with this, there was a debugging port which could be easily compromised.

In the US, the radio frequency used to control traffic lights is typically in the ISM band at 900 MHz or 5.8 GHz, which makes it fairly easy to obtain equipment that can communicate with the radio system. The researchers used readily available wireless equipment and a single laptop to read the unencrypted data on the wireless network.

Figure 2 provides an overview of the control system, where the radio transmitter provides a live feed (and other sensed information) to the road agency. An induction loop is normally buried at each of the junctions and detects cars as they pass over it, while the camera is used to watch the traffic lights and feed the colours of the lights back to the controller. In this way there is a visual failsafe.

Figure 2: Overview of traffic control system

Overriding the failsafe

The MMU (Malfunction Management Unit) is the failsafe operator on the system and ensures that the lights are not put into an unsafe state (such as red and green at the same time); the lights are then adjusted using the information gained from the induction loops in the road (which sense cars as they pass over them). If control of the MMU can be gained, allowing access to the controller, the lights can be compromised to go into incorrect states, or to stay at steady red (causing gridlock within a city). Within the MMU controller board, the researchers found that, by connecting a jumper wire, the output from the controller was ignored and the intersection put into a known-safe state.
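
In essence, the MMU watches for forbidden combinations of lights. A toy version of that conflict check (with the states simplified to two approaches) might look like:

```python
# Toy version of an MMU-style conflict check: some combinations of
# lights must never appear together. States are heavily simplified.

CONFLICTS = {("NS_GREEN", "EW_GREEN")}  # illustrative forbidden pair

def mmu_check(ns_state: str, ew_state: str) -> str:
    """Pass the commanded state through, or force a known-safe
    all-red flash if the controller requests a conflict."""
    if (ns_state, ew_state) in CONFLICTS:
        return "ALL_RED_FLASH"  # known-safe fallback state
    return f"{ns_state}/{ew_state}"

print(mmu_check("NS_GREEN", "EW_RED"))    # normal operation
print(mmu_check("NS_GREEN", "EW_GREEN"))  # conflict -> ALL_RED_FLASH
```

The jumper-wire finding works at this level: with the right pins bridged, the MMU ignores the controller's output entirely and drops into its known-safe state.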

Same old debug port

A typical security problem in many control systems is that there is often a debug port which gives highly privileged access to the system. Within this compromise, the researchers found that the control boxes ran VxWorks 5.5, which leaves a debug port open for testing. They then sniffed the packets between the controller and the MMU, and found that there was no authentication and that the messages were not encrypted, so they could be easily viewed and replayed. This allowed them to reverse engineer the messaging protocol for the lights. They then created a program to activate any of the buttons on the controller and display the results, and then even to access the controller remotely. In the end, they managed to turn all the lights in the neighbourhood to red (or all green on a given route, in order to operate safely within the experiment).
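
The core weakness here is that an unauthenticated, unencrypted datagram can simply be recorded and sent again. A generic sketch of a replay (the address, port and payload are all hypothetical, not from the study):

```python
# Generic illustration of why unauthenticated, unencrypted datagrams
# are replayable: captured bytes can be resent verbatim. The address,
# port and payload below are all hypothetical, not from the study.

import socket

captured_payload = b"\x01\x02..."  # bytes sniffed earlier (hypothetical)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(captured_payload, ("192.0.2.10", 40000))  # replay as-is
sock.close()

# Even a simple HMAC over (payload + counter) would let the receiver
# reject both forged and replayed messages.
```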

DoS

Finally, they found that the units were susceptible to a denial-of-service (DoS) attack: continual accesses with incorrect control signals over the network caused the malfunction management unit to put the lights into a failure state (all red). In this way the system failed to cope with excessive traffic, and all the units would end up failing under this type of probe.

This vulnerability showed all the standard signs of poor integration of security, which is common in many systems where security is not thought of as a major concern. This is not a small-scale issue, as the researchers identified that this type of system is used in more than 60% of the traffic intersections in the US. If a malicious agent wanted to bring a city, or even a country, to its knees, they could just flip a switch ... and there is no road transport system, which can then cause chaos to the rest of the infrastructure. We really need to rethink the way that systems are designed, and probe them for their vulnerabilities.

The researchers in this study already have other easy targets in their sights, such as tapping into the public messaging systems on freeways, and into the infrastructure created by the U.S. Department of Transportation (USDOT) for vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) systems, along with the new work related to the Connected Vehicle Safety Pilot program.

Conclusions

With all our complex infrastructures, it is the simplest of things that can trip them all and cause large-scale chaos ... the electrical supply. Unfortunately, it's not an easy call to make, as the systems need to be safe, but this safety can lead to automated trips, and those trips are themselves in danger from operator error.

As we move, too, into a world of intercommunication between cars and the roadway, and between cars themselves, it is important that we understand whether there are security problems, as with the flick of a switch an attacker could cause mass chaos.

So perhaps our security risks are not from the servers, desktops and mobile devices, but from the new Internet of Things (IoT) and from power supplies. So make sure your own power supplies are secure for your organisation, and just hope that someone, somewhere, is doing the same for your supplies.

Postscript

Personally, I've purchased and installed lots of computer equipment, and helped build our Cloud, and the best investment I ever made was to purchase a UPS, as I had spent so long rebuilding the disk infrastructure whenever the power failed. Once I installed it, the system could cope with short outages, and could even fall back gracefully. Many racks, too, have dual power supplies for their equipment, fed by different phases (as we normally use three-phase supplies) or from differently sourced power supplies.

If you're interested, here's our Cloud from last year (it's a lot larger now and will soon have lots of new equipment), but pride of place goes to the UPS:

If you are a Director of a company, you should ask yourself three things:

  • Are my internal systems protected against a power failure (including the phone network)?
  • What measures do we have for a large-scale failure of external supplies?
  • Can I rebuild my organisation in a different place, and how long will it take?

The last question is interesting, especially with the increasing reliance on the Cloud. If there was a devastating event in a particular area, could I rebuild my company elsewhere, and how long would it take? If the answer is longer than your customers will tolerate, then you have a problem. If you're a government official, hopefully you have answers to all these things. It must be remembered that the phone network now also runs over the Internet, so much of our communications could go too.