A blueprint for OT SOC - Part 1

Why do we need an OT SOC?


The term 5 Ws and H refers to the six basic questions to ask when gathering information or solving a problem. The questions are:

1. Who?

2. What?

3. Where?

4. When?

5. Why?

6. How?

The goal of this technique is to gain a factual answer to each question. Answers to all six questions should give clarity to whatever the questioner is trying to discover: the solution to a problem, the answer to a mystery, or even the best way to build a product.


In this series of articles, I will present the OT SOC blueprint by answering these questions:

  1. Why do we need an OT SOC?
  2. What are the components of the OT SOC, and how do they integrate with each other?
  3. Where should the OT SOC be built?
  4. Who are the stakeholders involved?
  5. When should an OT SOC be built?

Statistical View on Cybersecurity Incidents

There are many statistics on cybersecurity incident detection and response times, and most of them put incident timing in the ranges listed below. Of course, timing, impact, and cost will differ from one facility to another.

Many will argue that their defenses are in place and can protect their high-value assets. In my view, this is a false sense of security, so let's break it down together in the sections below.

Prevention Atrophy

Prevention atrophy, or decay, in cybersecurity refers to the gradual reduction in the effectiveness of security measures over time. Contributing factors include:

  • Emerging threats and attack techniques: New and evolving cyber threats outpace existing defenses.
  • Aging technology and infrastructure: Older systems become vulnerable due to lack of vendor support.
  • Lack of regular updates and patching: Unpatched systems remain exposed to known vulnerabilities.
  • Complacency and human factors: Security awareness and practices degrade without regular reinforcement.
  • Changing business and operational environments: New assets or processes create security gaps.
  • Resource constraints: Limited budgets, staffing, or priorities weaken security efforts.
  • Undocumented changes: Unrecorded modifications to systems or configurations can introduce vulnerabilities and hinder incident response.
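
As an illustration of the last point, undocumented changes can be surfaced by comparing current configuration exports against an approved baseline. The following is a minimal sketch, not a reference to any specific product: the asset names, the configuration text, and the helpers `fingerprint` and `find_drift` are all hypothetical, and it assumes configurations can be exported as plain text.

```python
import hashlib


def fingerprint(config_text: str) -> str:
    """Return a SHA-256 hex digest of a configuration export."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def find_drift(baseline: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Compare current configurations against an approved baseline.

    Both arguments map a (hypothetical) asset name to its exported config text.
    Returns a mapping of asset name -> "changed", "new", or "missing".
    """
    drift = {}
    for asset, text in current.items():
        if asset not in baseline:
            drift[asset] = "new"        # asset was never documented
        elif fingerprint(text) != fingerprint(baseline[asset]):
            drift[asset] = "changed"    # config differs from the approved one
    for asset in baseline:
        if asset not in current:
            drift[asset] = "missing"    # documented asset no longer seen
    return drift


# Hypothetical example data, for illustration only.
baseline = {"plc-01": "setpoint=75\nmode=auto", "hmi-01": "user=operator"}
current = {
    "plc-01": "setpoint=90\nmode=auto",
    "hmi-01": "user=operator",
    "eng-ws-02": "rdp=enabled",
}

print(find_drift(baseline, current))
```

In this example the changed PLC setpoint and the undocumented engineering workstation are both flagged, while the unchanged HMI is not reported.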

An advanced persistent threat (APT) with advanced skills and vast resources will most probably bypass your defenses, either by crafting a targeted attack or through an insider threat. Prevention will not help in that case, so you must be ready for detection and response.

Time to Detect (Dwell Time)

Time to Detect (Dwell Time) is the duration between when a security breach or compromise begins and when it is first detected by the organization. It reflects how long an attacker can remain undetected within a system, potentially causing damage, exfiltrating data, or escalating privileges. Shortening dwell time is critical to minimizing the impact of security incidents.
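
The definition above translates directly into a simple calculation. This is a minimal sketch; the function name `dwell_time_days` and the two timestamps are hypothetical, and in practice the start of a compromise is usually only estimated after forensic analysis.

```python
from datetime import datetime


def dwell_time_days(compromise_start: datetime, first_detected: datetime) -> float:
    """Return the dwell time in days between initial compromise and detection."""
    if first_detected < compromise_start:
        raise ValueError("detection cannot precede the start of the compromise")
    return (first_detected - compromise_start).total_seconds() / 86400


# Hypothetical incident timestamps, for illustration only.
start = datetime(2024, 1, 10, 8, 0)
detected = datetime(2024, 1, 24, 20, 0)

print(dwell_time_days(start, detected))  # 14.5
```

Tracking this number per incident gives the SOC a concrete metric to drive down over time.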

Dwell time has reached 6 months in some incidents. Such an extended dwell time allows attackers to thoroughly explore the network, understand the environment, and strategically position themselves for a final, coordinated attack that leads to widespread disruption.

The goal of the SOC is to shorten the dwell time as much as possible.

Technology Limitations

  • Late Detection in the Attack Chain: L1 (Process Control Level) and L2 (Control System Level) monitoring often detects threats late in the attack lifecycle, after the attackers have already gained significant access or control, reducing the opportunity to prevent or mitigate the attack.
  • High False Positives: Monitoring at these levels tends to generate a large volume of false positives, overwhelming security teams and making it harder to identify genuine threats.
  • Swivel Chair: Operators must manually switch between different systems or interfaces, which reduces efficiency and increases the likelihood of human error.
  • Long Learning Curve: The monitoring tools are complex and require a significant amount of time and training to understand and operate effectively.
  • Complex Integrations: Integrating multiple systems, devices, or platforms is difficult, which complicates monitoring and reduces the effectiveness of security measures.

The goal of the SOC is to interrupt the attack in its early stages.

Operational Context

  • Lack of Operational Context: OT SOCs often lack specific operational context, such as relevant OT use cases, which limits their ability to understand and detect threats specific to the operational technology environment.
  • Domain Knowledge: There is often a lack of specialized OT domain knowledge among the SOC analysts, which is critical for correctly interpreting events, understanding the unique characteristics of OT networks, and responding effectively to incidents.

The Need for Response Facilitation

This refers to the inability of an OT Security Operations Center (SOC) to effectively coordinate and execute response actions during an incident. This could be due to:

  • Reliance on Manual Processes: Due to safety concerns, complexity, and legacy systems, OT environments often cannot use automated response actions, leading to slower, manual interventions that can delay incident resolution.
  • Limited Communication and Coordination: Difficulty in ensuring seamless communication and collaboration between SOC analysts, OT operators, engineers, and other stakeholders, which is essential for a coordinated response.
  • Inadequate Playbooks and Procedures: A lack of well-defined, manual response playbooks specifically tailored to OT environments, which are crucial for guiding human-led actions and decision-making during incidents.
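
One way to make a manual, OT-specific playbook concrete is to capture it as structured data that names an owner for every step and marks the steps that need sign-off from operations. The sketch below is purely illustrative: the `PlaybookStep` and `Playbook` classes, the role names, and the example steps are assumptions, not a prescribed procedure.

```python
from dataclasses import dataclass, field


@dataclass
class PlaybookStep:
    action: str                    # what a human has to do
    owner: str                     # role responsible for the step
    needs_approval: bool = False   # True if operations must approve first


@dataclass
class Playbook:
    name: str
    steps: list[PlaybookStep] = field(default_factory=list)

    def checklist(self) -> list[str]:
        """Render the steps as a numbered, human-readable checklist."""
        lines = []
        for number, step in enumerate(self.steps, start=1):
            gate = " [requires operations approval]" if step.needs_approval else ""
            lines.append(f"{number}. {step.owner}: {step.action}{gate}")
        return lines


# Hypothetical playbook, for illustration only.
playbook = Playbook(
    name="Suspicious engineering workstation activity",
    steps=[
        PlaybookStep("Validate the alert with the shift operator", "SOC analyst"),
        PlaybookStep("Confirm whether maintenance work was scheduled", "OT engineer"),
        PlaybookStep("Isolate the workstation from the control network",
                     "OT engineer", needs_approval=True),
    ],
)

for line in playbook.checklist():
    print(line)
```

Keeping the approval gate explicit reflects the point above: response actions in OT stay human-led, and the playbook's job is to guide who does what and who must agree before anything touches the process.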

Compliance Requirements

Needless to say, almost every cybersecurity guideline or best practice, national or international, defines continuous OT monitoring as one of its controls.

This concludes part one of the series. Let me know what you think, and see you in the next part.

Ali Khan

Cyber Security Analyst/Manager

2 months ago

Very impressive!

Looking forward to the upcoming articles in this series.

David Hernandez, GICSP

OT Cybersecurity Leader | AI Engineering Novice | Controls Engineer | Pharma | Manufacturing | Military Veteran | Practitioner

2 months ago

Simplified, I think it’s to improve the signal to noise ratio. Having OT tools in place but not OT folks to tune and interpret the data is of lesser value. You have to know what you’re looking at to make sense of it.

Amr Eliwa

SOC && IR Manager || MSSP|CISSP|CISM|GCFA|GMON|GCIH|CCNA(RS/SEC)|CC|Qradar|Splunk|Arcsight

2 months ago

Keep posting

Sophie Lv

P.Eng., OT cybersecurity, ICS, Critical Infrastructure

2 months ago

These days, the regular SOC vs OT SOC focus on different events. Most traditional SOC these days don't have much understanding of OT events. Tough to provide proper monitoring in my opinion.