Hybrid Approaches to Safe ICS

Let's say you buy a Volvo, or one of many other reasonably modern cars. If you do, there's a good chance it will come with some sort of collision avoidance/mitigation feature. This feature is simple: a low-resolution RADAR that's good enough to work out if something solid is rather close to your bonnet, connected to the brakes. It's not a sophisticated device, it doesn't prioritise passenger comfort, and it probably won't avoid all collisions, but it generally mitigates them. It is simple, brutal, and effective.

Go to a self-driving car startup like Waymo (are they still a startup?) and you'll upgrade the sensor package to LIDAR; this gives you much better spatial resolution, giving the controller more confidence - both in avoiding collisions and in being confident enough to actually move forward. In return for this, the sensors cost a lot and look silly.

The likes of Tesla and Uber have a different approach: you use Machine Learning™ (aka The Big Box of Statistics) to identify hazards on the road from a camera feed reasonably similar to your own eyes. This means you skip all that expensive LIDAR stuff, although if you were to make one bet on the progress of the physical sciences, it's that sensors get smaller and cheaper. It also, importantly, has a much better upper bound on performance than a RADAR sensor. If you're a good driver, you don't drive up to a hazard and slam on the brakes - you anticipate hazards and ease up gently, so gently that passengers potentially don't even notice. A Tesla could do that. Or it could fail to identify a concrete wall.

Other examples of this data-driven approach include - if I may plug a team I want to work with - UKAEA using Support Vector Machines to identify plasma instabilities, or my own research on Adversarial Reinforcement Learning for the operation of a chemical plant.

For Uber, this approach went spectacularly badly, as they claimed the inevitable but unenviable distinction of being the first self-driving car company to kill a bystander, Elaine Herzberg. Prior to this, Uber had - for some reason - disabled the Volvo's RADAR system. Most disturbingly, and I should probably have lawyers on hand for this, the footage released by Uber does not appear to reflect the real lighting levels at the crash site.

---

The argument for traditional safety systems is, well, an argument, made on paper. Probably literal paper at some point, although fault-tree analysis software is a good idea. But certainly made by white-collar workers, in a room where mediocre but free coffee is available. Generally, that argument goes like this (with a toy version of the arithmetic sketched in code after the list):

* If $SYSTEM fails, $BAD_THING happens

* $SYSTEM fails if components A, B, C fail

* A, B, and C will all fail only with probability P < 10^-8

* So $SYSTEM fails with probability P < 10^-8

* So $BAD_THING will almost certainly not happen
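
A toy version of that arithmetic, assuming the three components fail independently (the 10^-3 per-component figures below are invented purely for illustration):

```python
# Toy fault-tree arithmetic: $SYSTEM fails only if A, B and C all fail.
# Per-demand failure probabilities are invented for illustration.
p_fail = {"A": 1e-3, "B": 1e-3, "C": 1e-3}

# AND gate: assuming independence, multiply the individual probabilities.
p_system_fails = 1.0
for component, probability in p_fail.items():
    p_system_fails *= probability

print(f"P($SYSTEM fails) = {p_system_fails:.0e}")  # 1e-09, comfortably under 1e-8
```

The whole argument leans on that independence assumption; common-cause failures (a shared power supply, the same maintenance error made three times) are what usually wrecks it.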

It's this sort of approach that leads to systems like Volvo's, since it's pretty simple to follow the argument. Major industrial accidents - Banqiao*, Bhopal, Chernobyl - are really rare. With the exception of the first, all the major accidents combined don't reach a single year of North American traffic fatalities**.

(* bet you've not heard of that one, have you?)

(** although environmental damage is not being counted.)

The argument for a data-driven approach is of course data-driven: we drive x miles with y fatalities, giving y/x fatalities per mile. Since human drivers have a high fatality rate per mile, so long as ours is lower, we should adopt more self-driving. This suffers badly from selection bias; people tend to let Teslas self-drive when they think it's safe, like on sunny highways. They're probably right, so you don't learn much about challenging scenarios from that data. It's also hard to gather empirical data on accidents if you don't have many to begin with. Organisations like the HSE (UK) and OSHA (US) emphasise counting near misses and minor accidents to help with this, working on the basis of a pyramid in which major accidents almost always have such near misses beforehand. Even these might be rare at a well-run plant.
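
To make the selection bias concrete, here's a deliberately crude worked example with invented per-mile risks - the point is only that a flattering mix of conditions produces a flattering headline rate:

```python
# Invented per-mile fatality risks: easy miles (sunny motorway) vs hard miles.
RISK_EASY = 1e-8
RISK_HARD = 1e-6

def expected_rate(share_easy: float) -> float:
    """Expected fatalities per mile for a given mix of easy and hard miles."""
    return share_easy * RISK_EASY + (1 - share_easy) * RISK_HARD

human = expected_rate(share_easy=0.5)       # humans drive in everything
autopilot = expected_rate(share_easy=0.99)  # only engaged when it looks easy

print(f"human:     {human:.2e} fatalities/mile")
print(f"autopilot: {autopilot:.2e} fatalities/mile")
# Autopilot looks roughly 25x safer per mile, despite identical risk
# in like-for-like conditions.
```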

To be honest, this is one of those situations where the approach you've probably thought of whilst reading this is actually the correct one: use all the Big Box of Statistics stuff to adjust the behaviour of those big, brutal, effective safety systems. This is demonstrated by Shi et al. with Neural Lander (from whom I've learnt the concept of Lipschitz continuity). Or by you, if you're a good driver with a RADAR-equipped car.
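
A minimal sketch of that hybrid, with everything here (the 1.5 s time-to-collision rule, the 0.3 authority limit, the `learned_adjustment` input) invented for illustration: the blunt rule keeps the final say, and the learned part is only allowed to add gentle, early braking within a hard bound.

```python
def emergency_brake(distance_m: float, speed_mps: float) -> bool:
    """The blunt, auditable rule: brake hard if time-to-collision is short."""
    return distance_m / max(speed_mps, 0.1) < 1.5  # seconds

def hybrid_brake_command(distance_m: float, speed_mps: float,
                         learned_adjustment: float) -> float:
    """Return a brake demand in [0, 1].

    `learned_adjustment` is whatever the Big Box of Statistics suggests
    ("ease off early, that looks like a hazard"), clamped so it can add
    gentle braking but can never veto or weaken the hard rule.
    """
    if emergency_brake(distance_m, speed_mps):
        return 1.0  # the simple safety system always wins
    return min(max(learned_adjustment, 0.0), 0.3)  # hard bound on ML authority

print(hybrid_brake_command(40.0, 20.0, learned_adjustment=0.15))  # 0.15: gentle, early
print(hybrid_brake_command(20.0, 20.0, learned_adjustment=0.0))   # 1.0: hard rule fires
```

The clamp is a crude stand-in for the kind of bound Shi et al. put on their learned term: limit how much authority the statistical part has, regardless of what data it was trained on.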

Perhaps another way to get empirical data from your Plant That's Not Failed Yet But If It Did That Would Be Bad is to make a few virtual copies; then you can throw as many random aberrations as you want at it, as well as simulating (I've measured this on my, for want of a better word, "twin") 3 years of operation in approximately 300 s. "Monte Carlo" approaches also work here, but they are inevitably reliant on the programmer to model the right things, so if you're not **very** careful you're just getting out the assumptions you put in. It's almost certain that if your plant does fail, it's not because of a known P=10^-6 failure path. It's because nobody saw the P=10^-2 failure path (Boeing, GPU Incorporated). If you're considering security, this sort of Rumsfeld-class error is even more likely to be responsible - if you knew of a security flaw, you'd probably have fixed it. That's why the Software Bill of Materials is such a good, overdue idea.
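
As a sketch of what "throwing random aberrations at a virtual copy" might look like (the three-line `run_twin` stand-in, the fault list, and the trip limit are all hypothetical placeholders for a real dynamic simulation):

```python
import random

def run_twin(disturbances: dict) -> float:
    """Hypothetical stand-in for a digital twin: return peak temperature (degC)."""
    return (350.0
            + 40 * disturbances.get("cooling_loss", 0)
            + 25 * disturbances.get("feed_spike", 0)
            + random.gauss(0, 5))

TRIP_LIMIT = 420.0  # hypothetical "that would be bad" threshold
random.seed(1)

near_misses = 0
for _ in range(10_000):
    # Sample random aberrations - this loop is where the programmer's
    # assumptions live, which is exactly the caveat above.
    disturbances = {
        "cooling_loss": random.random() < 0.05,
        "feed_spike": random.random() < 0.10,
    }
    if run_twin(disturbances) > TRIP_LIMIT:
        near_misses += 1

print(f"{near_misses} near misses in 10,000 simulated campaigns")
```

The aberrations you sample are, inevitably, the aberrations you thought of; the twin makes the known paths cheap to count, not the unknown ones visible.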

That's partly why I'm interested in (full disclosure: this is almost certainly insane) the idea of throwing Chaos Monkey-style systems at a chemical plant. If you're certain that $SYSTEM only fails when A, B and C fail, why not test it by disabling A? Doing so would most likely raise your baseline risk, to maybe 10^-6. But you are much more likely to catch the P=10^-2 failure mode that you didn't know about as a near miss.
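
Running with the same invented numbers as the toy fault tree above, the trade looks something like this:

```python
# Toy numbers only, reusing the invented 1e-3 per-component figures from above.
P_COMPONENT_FAIL = 1e-3   # A, B, C each, per demand
P_UNKNOWN_MODE = 1e-2     # the failure path nobody put in the fault tree

baseline_risk = P_COMPONENT_FAIL ** 3      # all three layers in place: 1e-9
risk_during_test = P_COMPONENT_FAIL ** 2   # layer A deliberately disabled: 1e-6

print(f"baseline risk      : {baseline_risk:.0e}")
print(f"risk while testing : {risk_during_test:.0e}")
# The price: risk is ~1000x higher for the duration of the test window.
# The payoff: layers B and C, and the P_UNKNOWN_MODE path that A normally
# masks, get exercised under real load, so the unknown mode can surface as
# a logged near miss rather than staying invisible.
```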

Argue, test, track, and test some more.

