Grabbing OT benefits in the cloud - how to align security targets
AI generated image - robotic arm connected to the cloud

Modern OT projects are no longer just OT projects. Vendors are pushing their own SaaS models with cloud components, and asset owners want their OT data where their analytics capabilities live - and today that is mostly in the cloud. The benefits are better analytics, better data availability, and the ability to integrate external data sources into decision making that eventually leads to some action on the factory floor. Here are some examples of where we have seen this type of cloud integration lately:

  • Maritime cranes streaming data to a data analytics platform hosted in a public cloud, used for maintenance optimization, system support and regulatory compliance.
  • Renewable power plants connecting production systems to cloud analytics to improve production planning based on market and environmental data from other sources.
  • Remote operations of drilling rigs, using public cloud to make HMIs available to operators.

In other words, this is happening now, not in some distant future. What is the security impact of this "OT cloud migration"? Two different engineering cultures are meeting, and that causes collaboration challenges: the world of IEC 62443, the Purdue model and "island mode" as the ultimate security response is very different from cloud security thinking, with zero trust, identity at the center of security, and a "cattle not pets" approach to asset management. Some key questions that typically surface in discussions about OT/cloud integrations are:

  1. How do I extend my zones and conduit model to the cloud?
  2. How do I apply IEC 62443 and security levels to cloud systems?
  3. How do I make sure that only changes that have been through our MOC process are applied in the system?

These discussions are often challenging, because "automation culture" and "cloud culture" can feel like two different worlds. Let's take AWS as an example of a public cloud environment where we would like to build our OT integrated systems. The key architectural guidance in AWS for building secure services is the AWS Well-Architected Framework. This framework has six "pillars":

  • Operational excellence: automating changes, responding to events, defining standards for daily operations.
  • Security: protecting the confidentiality, integrity and availability of services and data. This has similar goals to security in OT, but quite different implementation mechanisms due to much more granular controls and richer data.
  • Reliability: how to quickly recover from failure, distributed workload management and automated recovery solutions.
  • Performance efficiency: selecting resource types, monitoring performance, adjusting to maintain efficiency over time. Much more dynamic workload management than the OT world is used to.
  • Cost optimization: avoid unnecessary service costs by using scaling and elasticity. This is typically static in OT.
  • Sustainability: minimizing environmental impact of cloud computing resource consumption.

The Well-Architected Framework is much less process heavy than IEC 62443. In the OT world, systems are typically treated as much more static, with manual change processes, and the network is the primary security perimeter. This leads to two challenges when cloud engineers collaborate with OT engineers:

  • If OT engineers extend their security and operations models to the cloud, you will not get the benefits the cloud can give. You will underutilize the security capabilities of the cloud, and use more resources than necessary because you don't take advantage of the dynamic nature and elasticity of cloud computing.
  • If cloud engineers extend their security model to the OT world, the mechanisms they rely on to secure workloads simply don't exist. The concept of identity is typically weak, and the data available for monitoring is nowhere near as rich as in the cloud.

So what could be a reasonable approach ahead?

The OT cloudification compromise

The primary goal of OT security is to protect industrial processes from cyber attacks, to keep them running to deliver necessary services to customers and society, and above all, to avoid safety incidents. When integrating cloud services into the way we operate these systems, we should not allow this to degrade the trust we have in the safety and robustness of the OT system.

A key principle in OT security is that you need to separate IT and OT environments, and that it should be possible to isolate the OT system from the office IT network completely, and still operate it without any serious degradation. This principle needs to extend to the cloud. There are different ways of achieving this:

  • Completely separate AWS accounts, Azure tenants or similar concepts with other providers.
  • The same account is used to manage both enterprise and OT services, but other cloud mechanisms are used to provide segregation (such as VPCs, network security groups, IAM policies, etc.).

Microsoft has a discussion on this focusing on identity management (Azure AD at the time, now Entra ID): OT Cloud Enablement – Azure Active Directory Tenant | Microsoft Community Hub.

This brings us to the next common principle: use different AD environments for IT and OT. The purpose of this is to reduce the risk of lateral movement from the IT side to the OT side. This may also be challenged in a more modern setting, where there are robust identity and access control models that allow for just-in-time decisions and strong authentication mechanisms in the cloud. If the on-prem OT environment is older, it may not support modern identity-based security. In this case it is important not to build architectures that make on-prem operation depend on an identity structure in the cloud, as this would prevent "island mode" operation.

To reduce the chaos in such decisions, we can establish 3 principles for OT extension to the cloud:

  1. No hard cloud dependencies: Don't build dependencies that can threaten the uptime or safety of a service during abnormal events. When relying on modern cloud concepts that are only weakly supported on-prem, ensure a safe fallback in case of island mode or sudden connectivity failure.
  2. Security bottom-up from the factory floor: Start with zones and conduits, extend these to the cloud based on data flows, and apply security levels based on IEC 62443-3-2 and foundational requirements based on IEC 62443-3-3. These should extend to services in the cloud that directly influence operations.
  3. Improve security performance with cloud capabilities: Use cloud security practices to improve the security performance of any service running in the cloud, but make sure that automated response and resource scaling do not cause unmanaged impact on OT operations.

A simple example: the cloud connected coffee machine

Imagine you work in an office with a typical office coffee machine. This coffee machine is critical for production at the office. To reduce worker downtime due to coffee fetching, you have bought the Cloud AI Premium Coffee Subscription. The coffee machine then not only makes coffee locally, it also uses cloud based AI functions to optimize performance, anticipate who is fetching coffee, and preload coffee recipes to remove the most time consuming part of the "fetch coffee" operation: deciding which drink to get.

Hackers from competing firms have been known to attack such intelligent coffee schemes, to reduce the performance of the competitor's workers. Due to this very dangerous situation, the coffee service provider has decided to improve the security of the setup using our 3 OT cloudification principles: (1) no hard dependencies, (2) bottom-up security, (3) cloud based security improvement.

The internals of the coffee machine are quite simple. There are regular PLCs controlling the coffee production process, an embedded Linux computer performing recipe management and providing a graphical user interface on a touch screen, an internal switch for the local network, and a connection to the office network for Internet connectivity. The system is also equipped with a web camera and software to perform facial recognition, in addition to PIN code based authentication, to allow users to save their preferences and favorite brews.

The embedded computer is connected to the cloud using a publicly available API with an API key for identification. In the cloud the coffee machine provider is running several services, such as facial recognition using Amazon Rekognition as well as Amazon Cognito for user management, and various other machine learning services to recommend new drinks to users, provide health advice and so forth. The cloud environment uses a common database for all customers, and the provider's own internal services also run in the same environment.

A sketch of the full system architecture prior to the security redesign is shown below.


A quick sketch of the Cloud AI Premium Coffee setup

The coffee company was worried about several scenarios, such as lateral movement from the coffee machine to the customer's internal network, and sabotage of the coffee machine safety mechanisms.

First, the coffee machine would not make coffee if it couldn't authenticate the user using facial recognition: in other words, local operation was tightly coupled with the cloud, as the actual authentication happened in the AWS environment. Internet is down? Sorry, no coffee for you. The company decided to allow making coffee without a user account, to synchronize user accounts to the local Linux computer, and to allow users to authenticate with a simple PIN code on the touch display. Now the coffee machine can make coffee without Internet!
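
The fallback logic described above could look roughly like the sketch below. This is only an illustration under stated assumptions, not the vendor's implementation: `CloudUnavailable`, `identify_user`, the `cloud_face_lookup` callable and the layout of the local PIN cache are all hypothetical names invented for this example.

```python
import hashlib
import hmac
from typing import Callable, Dict, Optional, Tuple


class CloudUnavailable(Exception):
    """Raised by the (hypothetical) cloud client when the API cannot be reached."""


def hash_pin(pin: str, salt: bytes) -> bytes:
    # Only salted hashes of PIN codes are kept on the local Linux computer.
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 100_000)


def identify_user(
    cloud_face_lookup: Callable[[], Optional[str]],
    local_pins: Dict[str, Tuple[bytes, bytes]],  # user -> (salt, PIN hash)
    claimed_user: Optional[str] = None,
    entered_pin: Optional[str] = None,
) -> Optional[str]:
    """Return a user name, or None for an anonymous brew.

    The result only selects preferences; it never decides whether coffee is made.
    """
    try:
        user = cloud_face_lookup()  # preferred path: facial recognition in the cloud
        if user is not None:
            return user
    except CloudUnavailable:
        pass  # island mode: fall through to the locally synchronized accounts
    if claimed_user in local_pins and entered_pin is not None:
        salt, stored = local_pins[claimed_user]
        if hmac.compare_digest(hash_pin(entered_pin, salt), stored):
            return claimed_user
    return None  # unknown user: coffee is still served, just without saved preferences
```

The design choice worth noting is that a cloud failure and a failed login both end in the same place: an anonymous brew. Authentication only adds convenience, so losing it cannot stop production.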

Next on the agenda is to build security from the bottom up. The company decides to upgrade the internal switch from a dumb switch to a managed layer 3 switch supporting VLANs and with built-in stateless firewall functionality, and wants to create the following security zones/VLANs:

  1. Control: PLCs, sensors and actuators controlling the actual coffee production. Purdue Level 1.
  2. HMI: The embedded Linux computer and its touch display.
  3. Safety: a new VLAN with a new PLC to allow safety operations to operate independently of the other PLCs.
  4. Management VLAN: a port reserved for connecting a service laptop to talk to all other VLANs.

The company wants to provide remote automated maintenance via the API service. Ideally this should have a push/pull setup with a DMZ, but the company decides not to implement this for cost reasons. It is probably acceptable that each coffee machine does not come with its own DMZ!

Next the company performs a risk assessment, leading to security level allocation for the 4 security zones. The layer 3 switch is a common component, and is assigned the highest security level target of its connected zones. The verdict is:

  • Control zone: SL-T 1
  • HMI zone: SL-T 2
  • Safety zone: SL-T 1
  • Management zone: SL-T 3
  • Switch: SL-T 3
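
The rule used for the shared switch, taking the highest target among its connected zones, can be written out as a small sketch. The dictionary simply restates the verdict above; the function name `shared_component_sl_t` is made up for this illustration.

```python
from typing import Dict, Iterable

# Security level targets per zone, as allocated in the risk assessment above.
ZONE_SL_T: Dict[str, int] = {
    "control": 1,
    "hmi": 2,
    "safety": 1,
    "management": 3,
}


def shared_component_sl_t(connected_zones: Iterable[str], targets: Dict[str, int]) -> int:
    """A component shared by several zones inherits the highest SL-T among them."""
    return max(targets[zone] for zone in connected_zones)


# The layer 3 switch connects all four zones, so it inherits the management zone's target.
switch_sl_t = shared_component_sl_t(ZONE_SL_T.keys(), ZONE_SL_T)
print(switch_sl_t)  # 3
```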

Based on this, the company has clear foundational requirements as described in IEC 62443-3-3. These apply to the on-board equipment in the coffee machine. An important step that has been taken here is to enable routing rules that only allow intended traffic. For example, it is no longer possible to send programming commands to the PLCs from the HMI computer.
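
To make the "only intended traffic" idea concrete, here is a minimal default-deny allow-list between the four zones. It is a sketch, not real switch configuration: the service labels (`recipe-setpoints`, `plc-programming` and so on) and the exact set of flows are assumptions chosen to match the description above.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src_zone: str
    dst_zone: str
    service: str  # hypothetical service label, standing in for protocol/port rules


# Explicit allow-list of conduits. Anything not listed is dropped.
ALLOWED_FLOWS = {
    Flow("hmi", "control", "recipe-setpoints"),
    Flow("control", "hmi", "process-status"),
    Flow("safety", "hmi", "safety-status"),
    # Only the management VLAN (service laptop) may program the PLCs.
    Flow("management", "control", "plc-programming"),
    Flow("management", "safety", "plc-programming"),
    Flow("management", "hmi", "ssh"),
}


def is_allowed(flow: Flow) -> bool:
    """Default deny: a flow passes only if it is explicitly listed."""
    return flow in ALLOWED_FLOWS


# A compromised HMI computer trying to reprogram a PLC is rejected.
print(is_allowed(Flow("hmi", "control", "plc-programming")))         # False
print(is_allowed(Flow("management", "control", "plc-programming")))  # True
```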

Without a DMZ, the connection to the cloud will pass directly from the embedded Linux computer in the HMI zone to the cloud. The company decides to use mutual TLS to secure this connection, and in addition to implement IAM user authentication with limited privileges for daily operations. For administrative actions, the IAM user can assume a role with higher privileges in the cloud environment, to read new configurations and download software updates.
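
On the coffee machine side, the mutual TLS part could be set up along these lines using Python's standard `ssl` module. This is a sketch under assumptions: the function name `build_mtls_context` and the idea of passing certificate file paths as parameters are illustrative, and the server side (the API gateway) must also be configured to request and verify client certificates for the connection to be mutually authenticated.

```python
import ssl
from typing import Optional


def build_mtls_context(
    ca_file: Optional[str] = None,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server and presents a client certificate."""
    # Verify the server certificate and hostname. If ca_file is None, system CAs are used.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if client_cert is not None:
        # The machine's own certificate and key, so the server can authenticate the machine.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx
```

The returned context can then be handed to an HTTPS client, for example `http.client.HTTPSConnection(host, context=ctx)`. The IAM authentication described above would sit on top of this channel and is not shown here.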

Next, the company decides to improve its cloud environment's security as well. From now on it will separate its internal services into another AWS account entirely. It decides to keep the multitenant setup, but with stronger enforcement of identity based controls. In addition, it adds better monitoring capabilities: logging data for each customer into separate S3 buckets, pulling data into a monitoring environment, and setting up alert rules to detect and respond to security incidents.

The engineers in the coffee company perform a risk assessment of the cloud environment and determine that the cloud based services directly impacting coffee maker operations, as well as the conduit from the HMI Linux computer to the API gateway, should be assigned SL-T 3.

They also decide to stream log data from the coffee maker to the cloud, allowing a unified approach to security monitoring.
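
As an illustration of what an alert rule on the streamed logs might look like, the sketch below flags a burst of failed PIN attempts on a single machine. The event format, the `pin_failed` event type, and the threshold and window values are all assumptions made for this example. A real setup would more likely express such a rule in the monitoring platform's own rule language.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

Event = Tuple[float, str, str]  # (timestamp in seconds, machine id, event type)


def failed_login_alerts(
    events: Iterable[Event], threshold: int = 5, window_s: float = 60.0
) -> List[Tuple[str, float]]:
    """Return (machine id, timestamp) for each burst of failed PIN attempts.

    Assumes events arrive sorted by timestamp.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    alerts: List[Tuple[str, float]] = []
    for ts, machine, kind in events:
        if kind != "pin_failed":
            continue
        window = recent[machine]
        window.append(ts)
        # Drop attempts that have fallen out of the sliding window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) >= threshold:
            alerts.append((machine, ts))
            window.clear()  # avoid re-alerting on the same burst
    return alerts
```

In line with the third principle, the response to such an alert should be something like notifying an analyst or rate limiting the API, not an automated action that stops the machine from making coffee.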

What did we achieve? Coffee availability is now well protected from evil hackers!

  • We can operate the coffee machine in island mode - no direct cloud or Internet dependency.
  • A compromise of the cloud environment has a lower risk of direct impact to coffee machines, depending on the nature of the compromise.
  • An attacker achieving control over the HMI computer is no longer able to tamper with PLC settings, and in particular the safety functions are better protected.

This example illustrates how the principles of no hard dependencies, bottom-up security, and cloud capability enhancement can provide a structured approach to combined security architecture for on-prem OT and cloud based enhancements.

Tommy Evensen

Chief Information Security Officer | OT Security Evangelist @ Omny

1 month ago

This is fairly aligned with what is being done in my domain (oil and gas), with cloud for analytics and fancy stuff. What I did find strange was that the safety zone only got an SL-T of 1, but that might be intentional and based on the risk. I also like what you called routable changes to prevent programming. This is where we are seeing a huge shift: instead of being able to do this from the HMI, we only allow it from engineering workstation zones, and lately a minimum of one such zone per system. All in all, an informative article. Kudos. Let's see where we end up. It might be some time until we get to move real-time control to the cloud. :)

Shan Keerthisinghe

Information Security Consultant | ICS/OT | MS Certified Cloud Security Architect and Solutions Architect | DFIR | APT Researcher | Red Team | Active Defense Enthusiast | Bridging Offensive and Defensive Security...

1 month ago

Very informative!

More articles by Håkon Olsen

社区洞察

其他会员也浏览了