"Tina", a Digital Twin for site reliability engineering and secOps
Digital Twins are predicted to become the "new black" of industrial performance. Let's see how they can help secOps and Site Reliability Engineering.
What are Digital Twins?
Digital Twins are a recent addition to the PaaS offerings of Cloud providers: Azure Digital Twins went GA in December 2020, AWS IoT TwinMaker in April 2022, and Google Digital Supply Chain has been in review since July 2021.
Many notable effects of the IoT explosion are visible in our daily lives, for better or for worse. But this bang has also had a tremendous impact on the industrial world: from manufacturing machines to real estate, physical entities are now fitted with a vast array of accurate, real-time sensors generating masses of telemetry.
In advanced high-tech industries (automated factories & warehouses, the military, robotic exploration, or satellites), Digital Twins replicate the state of real construction chains (or robotic guidance, or weapon systems conditions...) digitally in the Cloud.
They feed on the telemetry sent by the above-mentioned sensors to generate a logical representation of the physical entity's health & performance. This digital avatar can then be analyzed to make predictions from past trends, to generate dashboards and to run ad-hoc processes, from alerting to remediation.
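To make this concrete, here is a minimal Python sketch of a twin that ingests sensor readings, reports current health, extrapolates a trend and decides whether to alert. Everything in it is an assumption made for illustration: the `MachineTwin` class, the temperature sensor, the threshold and the naive linear extrapolation are hypothetical and do not come from any Cloud provider's Digital Twins API.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class MachineTwin:
    """Hypothetical twin of one machine, fed by a single temperature sensor."""

    max_temp: float                 # assumed alert threshold, in degrees Celsius
    window: int = 5                 # number of recent readings kept in the twin
    readings: deque = field(default_factory=deque)

    def ingest(self, temp: float) -> None:
        """Store a new reading and drop the oldest ones beyond the window."""
        self.readings.append(temp)
        while len(self.readings) > self.window:
            self.readings.popleft()

    def health(self) -> str:
        """Current state, based on the latest reading only."""
        if not self.readings:
            return "unknown"
        return "degraded" if self.readings[-1] > self.max_temp else "healthy"

    def predict_next(self) -> float:
        """Naive prediction: last reading plus the average step between readings."""
        values = list(self.readings)
        if not values:
            return float("nan")
        if len(values) < 2:
            return values[-1]
        steps = [b - a for a, b in zip(values, values[1:])]
        return values[-1] + mean(steps)

    def should_alert(self) -> bool:
        """Alert when the predicted next reading crosses the threshold."""
        return self.predict_next() > self.max_temp


if __name__ == "__main__":
    twin = MachineTwin(max_temp=80.0)
    for reading in [70.0, 73.0, 76.0, 79.0]:
        twin.ingest(reading)
    print(twin.health(), twin.predict_next(), twin.should_alert())
```

The point of the sketch is the separation of concerns: the sensor only emits readings, while state, prediction and alerting live in the twin.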
IT systems are immaterial designs without physical representation
In the computing world, we do have our own custom factories to manage the lifecycle of applications: operating systems factories, CI/CD pipelines, chaos engineering laboratories, containers factories. We group them under the software factory umbrella. If we step back a little bit, we may even group them into larger constructs that we call software supply chains.
Yet software does not have a physical form: as the tautology goes, software is not hardware...
The IT industry has a long-standing history of generating logs and events. They are streamed for maintenance and monitoring purposes. In the wake of devOps and site reliability engineering, telemetry has been deeply ingrained into code, alongside business logic itself.
Telemetry is a native feature of modern IT. So, why should we care about Digital Twins?
The problem with Site Reliability Engineering
devOps introduced a bias in the way we handle the software lifecycle: as an increasing number of responsibilities has shifted left to developers, monitoring and troubleshooting production environments has become a "dev thing".
This contrasts sharply with many physical factories, where running production chains has remained an "ops thing", even though the digitalization of analog devices has brought software development everywhere and code now plays a critical role in all industrial processes.
The value Digital Twins can bring to IT comes from a change of mindset: the design of a Digital Twin goes through an important process called modeling, which forces a systems representation of the physical entity. By looking at the software factory from this fresh angle, a digital twin complements the usual component-level view of a software factory.
This approach can be thought of as an extension of Site Reliability Engineering (SRE), which was invented by Google to beef up ops capabilities in dev feature teams. Site reliability engineering teams are often understaffed and highly dependent on a set of local tools and practices, which makes their transformation mission relatively painful. Digital Twins push SRE capabilities one step further.
For me,
the promise of Digital Twins in modern IT is that site reliability operations can be decoupled from local feature team practices without breaking the devOps model
A digital counterpart of the software factory grants additional powers to SREs. But it is essential to avoid going back to the dark ages when ops had their own standalone IT infrastructure.
Striking a balance
So how do we strike a balance between too much dev (as we have today) and too much ops (as we used to have)?
There have been countless debates on the matter. But I think Digital Twins bring a fresh opportunity: Digital Twins make SRE activities Cloud native and decoupled from business logic.
The real enablers for all this are sensors and PaaS integration. They can make understaffed SRE teams first-class citizens of the Cloud.
Sensors enable decoupling; PaaS integration enables "Cloud nativeness".
To give you an idea of the latter, here is a list of integrated PaaS services that can be called to the rescue when designing a Digital Twin:
From theory to practice
Let's see how SREs can benefit from PaaS integration through a simple but practical example. Since we discuss IT security, Cloud and architecture, the example is taken from IT security.
Specifically,
The physical twin (network topology enforcement)
In a nutshell, this control checks whether the hub & spoke pattern is enforced at scale across a whole #Azure tenant (or an #AWS root account).
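As a rough illustration of what "enforcing hub & spoke" means, the sketch below flags every network that is not peered with the hub, and every direct spoke-to-spoke peering. It is a plain set-based check written for this article, assuming the topology is available as a list of network names and undirected peering pairs; the function name and input shape are hypothetical, and this is not the theorem-proving approach used by the actual control.

```python
def hub_and_spoke_violations(hub, networks, peerings):
    """Return a list of human-readable violations of the hub & spoke pattern.

    hub      -- name of the hub network (assumed to be known in advance)
    networks -- iterable of all network names in the tenant
    peerings -- iterable of (a, b) pairs, treated as undirected peerings
    """
    # Build an adjacency map: network name -> set of peered networks.
    peers = {name: set() for name in networks}
    for a, b in peerings:
        peers.setdefault(a, set()).add(b)
        peers.setdefault(b, set()).add(a)

    violations = []
    for net in sorted(peers):
        if net == hub:
            continue
        # Rule 1: every spoke must be peered with the hub.
        if hub not in peers[net]:
            violations.append(f"{net}: not peered with hub {hub}")
        # Rule 2: a spoke must not be peered with anything but the hub.
        for other in sorted(peers[net] - {hub}):
            violations.append(f"{net}: direct peering with {other}")
    return violations


if __name__ == "__main__":
    networks = ["hub", "spoke-a", "spoke-b", "spoke-c"]
    peerings = [("hub", "spoke-a"), ("hub", "spoke-b"), ("spoke-b", "spoke-c")]
    for violation in hub_and_spoke_violations("hub", networks, peerings):
        print(violation)
```

An empty result means the topology complies; anything else is a candidate for alerting or remediation.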
The control makes continuous evaluations. It executes as an infinite loop that can be broken down into 4 stages:
Tina, the digital twin
The main danger when attempting to model a software factory as a digital twin is to... duplicate the initial code, in a different language!
We don't want to develop two software factories: one is good enough!
What we want to do is model the hub & spoke control as a production chain in an actual factory. This lets us gather system-level telemetry (not component-level) and leverage the power of PaaS integration to process the telemetry flow in real time, so that SREs can supervise the health of the factory as a whole system.
Here is the general layout of Tina. It is fitted with two "rooms":
You see that, unlike in the "physical" software factory, the theorem proving process, the cache and the ledger are not modeled in Tina.
Tina doesn't care about software components. It cares about reliability and performance.
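To illustrate the difference between component-level and system-level telemetry, here is a small Python sketch that rolls up whole evaluation cycles into two indicators: a success rate (reliability) and an average duration (performance). The `CycleReport` type, the `factory_health` function and both thresholds are assumptions made for this example, not part of Tina's actual model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleReport:
    """One full evaluation cycle of the control, observed from the outside."""

    duration_s: float   # wall-clock duration of the cycle, in seconds
    succeeded: bool     # did the cycle complete and produce a verdict?


def factory_health(reports, max_avg_duration_s=60.0, min_success_rate=0.95):
    """Summarize reliability and performance over a batch of cycles.

    The thresholds are illustrative defaults, not recommended values.
    """
    if not reports:
        return {"status": "unknown", "success_rate": None, "avg_duration_s": None}

    success_rate = sum(r.succeeded for r in reports) / len(reports)
    avg_duration_s = sum(r.duration_s for r in reports) / len(reports)
    healthy = (
        success_rate >= min_success_rate
        and avg_duration_s <= max_avg_duration_s
    )
    return {
        "status": "healthy" if healthy else "degraded",
        "success_rate": success_rate,
        "avg_duration_s": avg_duration_s,
    }


if __name__ == "__main__":
    reports = [
        CycleReport(30.0, True),
        CycleReport(50.0, True),
        CycleReport(40.0, False),
        CycleReport(40.0, True),
    ]
    print(factory_health(reports))
```

Note that nothing here knows which component failed or was slow: the summary only says whether the chain as a whole is doing its job.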
What's up next?
Stay tuned for the next installment, where I will walk you through a reference implementation of Tina in Azure. It will involve modeling Tina as a DTDL model and integrating services such as Azure Digital Twins, of course, but also Azure Tables, Event Grid (for messaging) and Azure Functions.