"Tina", a Digital Twin for site reliability engineering and secOps
Tina, the Digital Twin, by DALL-E

Digital Twins are predicted to become the "new black" of industrial performance. Let's see how they can help secOps and Site Reliability Engineering.

What are Digital Twins?

Digital Twins are a recent addition to the PaaS offerings of Cloud providers: Azure Digital Twins went GA in December 2020, AWS IoT TwinMaker in April 2022, and Google Digital Supply Chain has been in review since July 2021.

Many notable effects of the IoT explosion are visible in our daily lives, for better or for worse. But this bang has also had a tremendous impact on the industrial world: from manufacturing machines to real estate, physical entities are now fitted with a vast array of accurate, real-time sensors generating masses of telemetry.

A manufacturing dashboard from AWS IoT TwinMaker

In advanced high-tech industries (automated factories & warehouses, the military, robotic exploration, or satellites), Digital Twins replicate in the Cloud the state of real production chains (or robotic guidance, or weapon system conditions...).

A 3D scene from Azure Digital Twins

They feed on telemetry sent by the above-mentioned sensors to generate a logical representation of the physical entity’s health & performance. This digital avatar can then be analyzed to make predictions from past trends, to generate dashboards, and to run ad-hoc processes, from alerting to remediation.
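To make this concrete, here is a minimal sketch in Python of a twin that ingests telemetry, extrapolates the recent trend and decides whether to alert. Everything in it is an assumption made for illustration: the `MachineTwin` class, the temperature sensor, the threshold and the naive forecast are hypothetical and do not come from any Cloud provider's API.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class MachineTwin:
    """Hypothetical digital twin of one machine, fed by a temperature sensor."""
    name: str
    alert_threshold: float                       # illustrative limit, in degrees Celsius
    readings: list[float] = field(default_factory=list)

    def ingest(self, temperature: float) -> None:
        """Update the twin's state with one telemetry reading."""
        self.readings.append(temperature)

    def predict_next(self) -> float:
        """Naive forecast: last reading plus the average step between readings."""
        if not self.readings:
            raise ValueError("no telemetry received yet")
        if len(self.readings) < 2:
            return self.readings[-1]
        steps = [b - a for a, b in zip(self.readings, self.readings[1:])]
        return self.readings[-1] + mean(steps)

    def should_alert(self) -> bool:
        """Alert when the forecast reaches the threshold."""
        return self.predict_next() >= self.alert_threshold


# Usage: four readings trending upwards by 3 degrees each.
twin = MachineTwin(name="press-01", alert_threshold=80.0)
for value in (70.0, 73.0, 76.0, 79.0):
    twin.ingest(value)
forecast = twin.predict_next()
alert = twin.should_alert()
```

The twin never touches the machine itself: it only holds a state derived from telemetry, which is what makes dashboards, predictions and alerting possible on the Cloud side.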

IT systems are immaterial designs without physical representation

In the computing world, we do have our own custom factories to manage the lifecycle of applications: operating systems factories, CI/CD pipelines, chaos engineering laboratories, containers factories. We group them under the software factory umbrella. If we step back a little bit, we may even group them into larger constructs that we call software supply chains.

Yet software does not have a physical form: as the tautology goes, software is not hardware...

The IT industry has a long-standing history of generating logs and events, which are streamed for maintenance and monitoring purposes. In the wake of devOps and site reliability engineering, telemetry has been deeply ingrained into code, alongside the business logic itself.

Telemetry is a native feature of modern IT. So, why should we care about Digital Twins?

The problem with Site Reliability Engineering

devOps introduced a bias in the way we handle the software lifecycle: as an increasing number of responsibilities have been shifted left to developers, monitoring and troubleshooting production environments have become a "dev thing".

The shift left model

This contrasts sharply with many physical factories, where the practice of running production chains has remained an "ops thing", even though the digitalization of analog devices has invited software development everywhere and code now plays a critical role in all industrial processes.

The value Digital Twins can bring to IT comes from a change of mindset. The design of a Digital Twin goes through an important process called modeling, which forces a systems representation of the physical entity. By looking at the software factory from such a fresh angle, a digital twin complements the usual component-level view of a software factory.

This approach can be thought of as an extension of Site Reliability Engineering (SRE), which was invented by Google to beef up ops capabilities in dev feature teams. SRE teams are often understaffed and highly dependent on a set of local tools and practices, which makes their transformation mission relatively painful. Digital Twins push SRE capabilities one step further.

For me,

the promise of Digital Twins in modern IT is that site reliability operations can be decoupled from local feature team practices without breaking the devOps model

A digital counterpart of the software factory grants additional powers to SREs. But it is essential to avoid going back to the dark ages when ops had their own standalone IT infrastructure.

Striking a balance

So how do we strike a balance between too much dev (as we have today) and too much ops (as we used to have)?

There have been countless debates on the matter. But I think Digital Twins bring a fresh opportunity: Digital Twins make SRE activities Cloud native and decoupled from business logic.

The real enablers for all this are Sensors and PaaS integration. They can make understaffed SRE a first class citizen of the Cloud.

Sensors enable decoupling, PaaS integration enables "Cloud nativeness".

To give you an idea of the latter, here is a list of integrated PaaS services which can be called to the rescue when designing a Digital Twin:

  1. The Digital Twin service itself is used for system-wide modeling, sensors definition, telemetry definition, telemetry capture (in real time)
  2. Ledgers (append-only key/value stores) are used for telemetry recording, audit, replay, roll-back and consistency
  3. Analytics are used for making predictions on telemetry (time series)
  4. Messaging is used for turning telemetry into events
  5. Functions are used for events computation and events cascading
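The way these services chain together can be sketched with in-memory stand-ins, written here in Python. This is only an illustration under stated assumptions: the `ledger` list, the `publish` and `capture` helpers, the topic names and the 500 ms latency limit are all hypothetical, and the analytics stage is left out to keep the sketch short. In a real design, each stand-in would be replaced by the corresponding managed service.

```python
from collections import defaultdict
from typing import Callable

# In-memory stand-ins for the managed services listed above (all hypothetical).
ledger: list[dict] = []                       # append-only record of telemetry
subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)  # messaging topics
alerts: list[str] = []                        # what the last function produces


def publish(topic: str, event: dict) -> None:
    """Messaging: deliver an event to every function subscribed to the topic."""
    for handler in subscribers[topic]:
        handler(event)


def capture(telemetry: dict) -> None:
    """Twin service: record telemetry in the ledger, then turn it into an event."""
    ledger.append(dict(telemetry))            # store a copy; entries are never rewritten
    publish("telemetry", {"type": "TelemetryReceived", "data": telemetry})


def on_telemetry(event: dict) -> None:
    """Function: compute on an event and cascade a second event if needed."""
    if event["data"]["latency_ms"] > 500:     # illustrative limit
        publish("alerts", {"type": "LatencyHigh", "data": event["data"]})


def on_alert(event: dict) -> None:
    """Function: final consumer of the cascaded event."""
    alerts.append(f"{event['type']} on {event['data']['sensor']}")


subscribers["telemetry"].append(on_telemetry)
subscribers["alerts"].append(on_alert)

# Usage: only the second reading crosses the limit and cascades into an alert.
capture({"sensor": "sensor-1", "latency_ms": 120})
capture({"sensor": "sensor-1", "latency_ms": 900})
```

Note that none of these functions knows anything about business logic: they only see telemetry and events, which is the decoupling we are after.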

From theory to practice

Let's see how SREs can benefit from PaaS integration through a simple but practical example. Since we discuss IT security, Cloud and architecture, the example is taken from IT security.

Specifically,

  • The physical twin is going to be an IT security control that you should be familiar with if you are a subscriber of this newsletter: network topology enforcement.
  • The corresponding digital twin is a construct named TINA (Temporally Informed, Numerical Appraiser).

The physical twin (network topology enforcement)

In a nutshell, this control checks whether the hub & spoke pattern is enforced across a whole Azure tenant (or an AWS root account).

The control makes continuous evaluations. It executes as an infinite loop that can be broken down into 4 stages:

  1. Collect ground truth from our Cloud backend. This collection is a JSON stream of network peerings.
  2. Check a cache to know whether the ground truth has already been satisfied by a theorem prover.
  3. If not, run the theorem prover to check satisfiability and update the cache.
  4. Log the theorem proof to a ledger.
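One iteration of the four stages above can be sketched as follows in Python. This is a simplified illustration, not the actual control: the theorem prover is swapped for a plain graph check (every peering must touch a hub), the `from`/`to` field names and the `hub-vnet` name are hypothetical rather than the real Cloud API schema, and I assume that every evaluation is logged, whether or not it was served from the cache.

```python
import hashlib
import json

HUBS = {"hub-vnet"}                      # hypothetical hub network name
cache: dict[str, bool] = {}              # topology fingerprint -> verdict
ledger: list[dict] = []                  # append-only log of verdicts


def fingerprint(peerings: list[dict]) -> str:
    """Stable hash of the topology, used as the cache key."""
    canonical = json.dumps(sorted((p["from"], p["to"]) for p in peerings))
    return hashlib.sha256(canonical.encode()).hexdigest()


def is_hub_and_spoke(peerings: list[dict]) -> bool:
    """Stand-in for the theorem prover: every peering must touch a hub."""
    return all(p["from"] in HUBS or p["to"] in HUBS for p in peerings)


def evaluate(peerings_json: str) -> bool:
    """One iteration of the control loop (stages 1 to 4)."""
    peerings = json.loads(peerings_json)             # 1. collect ground truth
    key = fingerprint(peerings)
    cached = key in cache                            # 2. check the cache
    if not cached:
        cache[key] = is_hub_and_spoke(peerings)      # 3. run the check, update the cache
    ledger.append({"topology": key, "compliant": cache[key], "cached": cached})  # 4. log
    return cache[key]


# Usage: the same compliant topology twice, then a spoke-to-spoke peering.
good = json.dumps([{"from": "spoke-a", "to": "hub-vnet"},
                   {"from": "hub-vnet", "to": "spoke-b"}])
bad = json.dumps([{"from": "spoke-a", "to": "spoke-b"}])
results = [evaluate(good), evaluate(good), evaluate(bad)]
```

The second evaluation of the unchanged topology is answered from the cache, which is the point of stage 2: the expensive check only runs when the ground truth has actually changed.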


The physical twin


Tina, the digital twin

The main danger when attempting to model a software factory as a digital twin is to... duplicate the initial code, in a different language!

We don't want to develop two software factories: one is good enough!

What we want to do is model the hub & spoke control as a production chain in an actual factory, in order to gather system-level (not component-level) telemetry. We then leverage the power of PaaS integration to process the telemetry flow in real time, so that SREs can supervise the health of the factory as a whole system.

Here is the general layout of Tina. It is fitted with two "rooms":

  1. The Topology room, in charge of ordering a stream of network topology changes,
  2. The Calculation room, in charge of making sure that hub & spoke calculations are based on the latest network topology.

Tina

You see that, unlike in the "physical" software factory, the theorem proving process, the cache and the ledger are not modeled in Tina.

Tina doesn't care about software components. It cares about reliability and performance.
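To illustrate what the two rooms could track, here is a minimal sketch in Python. It is an assumption on my part, not the reference implementation: the class names, the integer timestamps and the `is_fresh` check are hypothetical. Note that the prover, the cache and the ledger do not appear anywhere: the rooms only answer system-level questions, namely "are topology changes in order?" and "was the last calculation based on the latest topology?".

```python
from dataclasses import dataclass, field
import heapq


@dataclass
class TopologyRoom:
    """Orders topology changes by timestamp and tracks the latest one."""
    pending: list[tuple[int, str]] = field(default_factory=list)
    latest: int = 0                      # timestamp of the newest released change

    def receive(self, timestamp: int, change_id: str) -> None:
        """Accept a change, possibly out of order."""
        heapq.heappush(self.pending, (timestamp, change_id))

    def release(self) -> list[str]:
        """Emit all pending changes in timestamp order."""
        ordered = []
        while self.pending:
            timestamp, change_id = heapq.heappop(self.pending)
            self.latest = max(self.latest, timestamp)
            ordered.append(change_id)
        return ordered


@dataclass
class CalculationRoom:
    """Checks that a calculation used the latest known topology."""
    topology: TopologyRoom

    def is_fresh(self, calculated_on: int) -> bool:
        """True if the calculation is at least as recent as the latest change."""
        return calculated_on >= self.topology.latest


# Usage: three changes arrive out of order; a calculation made at t=20 is stale.
topology = TopologyRoom()
topology.receive(30, "peering-removed")
topology.receive(10, "peering-added")
topology.receive(20, "peering-updated")
ordered = topology.release()
calculation = CalculationRoom(topology)
stale = not calculation.is_fresh(calculated_on=20)
```

A stale calculation is exactly the kind of signal an SRE wants to see on a dashboard, without having to read a single line of the control's code.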

What's up next?

Stay tuned for the next installment, where I will walk you through a reference implementation of Tina in Azure. It will involve modeling Tina in DTDL and integrating services like Azure Digital Twins of course, but also Azure Tables, Event Grid (for messaging) and Azure Functions.

