Update on the Death of the Historian
Last spring before I left LNS, Matt Littlefield asked me to write an update blog to his original 2015 blog entitled, “Will the Data Historian Die in a Wave of IIoT Disruption?” I wrote the blog in July but it was never published by LNS. So here it is in full. This is still very much a dynamic subject in the wake of Data Ops, so your feedback is most welcome.
**************************************************************************
Almost 7 years ago, Matt Littlefield posted a blog entitled, “Will the Data Historian Die in a Wave of IIoT Disruption?” In it, Matt raised a number of issues facing traditional historians, beginning with their disruption by IIoT, whereby data can flow around control systems and historians directly to the Edge and Cloud. In addition, Matt pointed out the challenge in pricing models, e.g., tag-based vs. big data volumes, and the fact that Data Historians are not designed to handle the mix of time-series, structured, and unstructured data that Data Hubs are now addressing. As Matt said, “… Data Historians provide volume and velocity but not variety.”
Specifically, Matt raised five questions that this author will address in the context of present-day trends and capabilities:
1. Will the Data Historian be central to the IIoT and Big Data story?
2. Which vendor is best positioned to capture future growth: a traditional pure-play Data Historian provider, a traditional automation provider with Data Historian offerings, or a disruptive IIoT provider?
Further, suppose the Data Historian does take a leadership role in the IIoT platform and meets the needs of end users. In that case, providers in the space will have to develop next-generation solutions that address the following:
3. How to provide a big data solution that goes beyond semi-structured time series data and includes structured transactional system data and unstructured web and machine data.
4. How to transition to a business/pricing model that is viable in a cheap sensor, ubiquitous connectivity, and inexpensive storage world.
5. How to enable next-generation enterprise applications that expand the user base beyond process engineers.
At the crux of the challenge lie two issues. First, will the Data Historian continue to play its vital role in serving its traditional user base of process engineers and other plant personnel? Matt answered that it is unlikely that the Data Historian will die any time soon. This author agrees. Data Historians will continue to play a vital role in the real-time stack at the plant level for years to come. The fact is that other database types, including the 30+ time-series databases that may run at the Edge or in the Cloud, are simply not performant enough (data volume handling, compression, read/write speed, calculations, redundancy, asset hierarchy, etc.) to be viable substitutes, and this remains true even if one were to re-architect them today. Yes, there are a few new alternatives, like the Libre stack on top of InfluxDB, but to date their impact on the market has been minimal. More on the Hyperscalers later.
The second issue is question No. 1 above. There are two parts to the answer, and now we get into an architecture discussion. Part 1 is that if IIoT time-series data is relevant to the same users that access the Data Historian directly, then such data should flow back through the Data Historian either at the plant level or the corporate level. There is no point in sending all the IIoT time-series data to the Cloud, only to send it back down for analysis. This just adds latency to the data analysis workflow. Of course, one could architect a local Cloud-like data center equivalent at the plant site, but then this defeats many of the advantages of the Cloud.
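To make this concrete, here is a minimal, hypothetical sketch of the routing idea: an edge gateway writes full-resolution IIoT data to the local Data Historian and forwards only down-sampled aggregates to the Cloud. The historian and cloud interfaces are placeholders for illustration, not any vendor’s API.

```python
# Hypothetical edge-gateway routing: keep full-resolution data local,
# send only reduced aggregates to the Cloud. All names are illustrative.
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Sample:
    tag: str          # e.g. "FIC-101.PV" (invented tag name)
    timestamp: float
    value: float


def route(batch: List[Sample], historian, cloud, window_s: int = 60) -> None:
    """Write every sample to the plant historian; push one aggregate per tag
    and window to the Cloud so raw data never makes a round trip."""
    # 1. Full-resolution write to the local Data Historian.
    historian.write(batch)

    # 2. Down-sample per tag before anything leaves the plant.
    by_tag = {}
    for s in batch:
        by_tag.setdefault(s.tag, []).append(s.value)
    aggregates = [
        {"tag": tag, "window_s": window_s, "avg": mean(values),
         "min": min(values), "max": max(values), "count": len(values)}
        for tag, values in by_tag.items()
    ]

    # 3. Only the aggregates (and, in practice, exceptions and events) go up.
    cloud.publish(aggregates)
```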
Part 2 forces us to deal with question 3: how and where to best handle time-series, structured, and unstructured data together. Again, this is more than a big data issue. It’s a holistic contextualization issue where one must manage and relate all types of streaming, batch, and asynchronous data. Data Historians do a great job synchronizing time-series data, but less so with other data types, for example, high-frequency vibration data needing waveform analysis. As LNS has explained in previous blogs, the one-to-many structures we see in AVEVA PI AF, Honeywell Sentience (Forge’s predecessor), and AspenTech’s IP.21 MDM are based on 10+-year-old technology and lead to less elegant, less functional, and likely more costly solutions than what can be built today with newer database, object, and graph technologies.
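To illustrate what graph-based contextualization adds over a flat tag list or a strict one-to-many tree, here is a minimal sketch using the open-source networkx library; the asset, tag, work-order, and waveform-file names are invented for illustration.

```python
# Minimal graph-contextualization sketch (node and edge names are illustrative).
import networkx as nx

g = nx.DiGraph()

# Nodes can be any data type: assets, time-series tags, documents, work orders.
g.add_node("Pump-P101", kind="asset")
g.add_node("P101.Flow.PV", kind="timeseries")                 # historian tag
g.add_node("P101_vibration_2022-06-01.wav", kind="waveform")  # unstructured file
g.add_node("WO-48213", kind="work_order")                     # from the EAM/CMMS

# Edges carry the relationships a flat tag list cannot express.
g.add_edge("Pump-P101", "P101.Flow.PV", rel="measured_by")
g.add_edge("Pump-P101", "P101_vibration_2022-06-01.wav", rel="has_waveform")
g.add_edge("WO-48213", "Pump-P101", rel="performed_on")

# A traversal answers a cross-source question:
# "what data do we hold for the asset that work order WO-48213 touched?"
asset = next(iter(g.successors("WO-48213")))
related = list(g.successors(asset))
print(asset, "->", related)
```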
LNS believes that the right way to tackle this is using a Data Hub, as shown in Figure 1. The Data Hub can be constructed in several ways, with components, out-of-the-box solutions, or even DevOp’d from scratch. With the Data Hub approach, the Data Historian is a data source, not a core component of the Hub itself.
Figure 1 - The Data Hub Within the OT Data Fabric (Process Industries)
Now, as to which vendor is best positioned to capture future growth … well, this is still an open question, and vendors have approached the Data Hub challenge differently. So, let’s talk about some of them.
· AspenTech chose to stay with IP.21 and MDM as core components of its AIoT Hub, pairing them with the legacy Sabisu software, now Aspen Enterprise Insights, with connectivity to Aspen Data Science Studio. As a result, AIoT features a containerized, non-Cloud-native IP.21 running in AWS or Azure. Aspen Enterprise Insights handles the contextualization, workflow, and visualization. However, LNS thinks AIoT is more of a project than a product.
AspenTech has doubled down on this path, continuing to invest in improving what it has and adding new features. No doubt it works, but how elegant a solution it is remains a question. Look for a v14 release in Q4-2022. Nevertheless, LNS believes this was the suboptimal path; AspenTech could have taken a more holistic and modern approach to a Data Hub by assembling a set of modern components and leveraging IP.21 as a data source. Furthermore, AspenTech’s new majority shareholder, Emerson, has a competing product based on the German company Inmation, which uses object technology running on top of MongoDB’s NoSQL database. It makes no sense for them to compete with each other, so we shall see if this situation gets rationalized.
· AVEVA acquired OSIsoft PI with PI AF and OCS. OCS was OSIsoft's initial answer to a new version of PI running in the Cloud that still meets the analysis requirements of its traditional process engineer user base as well as data scientists. In addition, OCS, now Data Hub, features a new sequential data store that can align time-series data with other data types through its proprietary semantic taxonomy, similar to object technology. The external interface is through APIs and Data Views, which support PowerBI’s SQL queries.
A demo by EDP Renewables, a long-time PI user, at PI World in May 2022 showed that Data Hub works with time-series data, but it took some tuning to get the size and speed of data queries from Data Views right. Thus, LNS sees that Data Hub is still a work in progress and not quite ready for prime time. In our opinion, OSIsoft undershot the right target, which should have been a holistic approach to managing and contextualizing the three data types, not simply feeding PI users via data sharing in the Cloud. AVEVA had good reason for taking its approach, which was not simply to replicate the PI Server and AF, but to date it has not done a good job of communicating Data Hub’s capabilities in handling complex use cases and its associated benefits. AVEVA is aware of these challenges and is working to address them in future product releases and communications. PI AF is now also supported, so those relationships are not lost. However, LNS does not see many IT departments in AVEVA’s user base giving Data Hub the thumbs up, since it is yet another proprietary database requiring APIs to connect and incorporate into the company's Cloud technology stack. It has taken OSIsoft/AVEVA 6 years to get to the present Data Hub, and they are still not quite there yet.
· Honeywell Forge for Industrial is working on a fresh approach to the Data Hub using graph technology. In doing so, Honeywell realized that taking the AspenTech path was not the right way to go. Honeywell has fairly ambitious goals for Honeywell Forge Enterprise Data Management (EDM) functionality. However, LNS believes that Forge EDM isn’t quite ready for prime time, so Honeywell continues to deliver on the older Sentience platform based on the Honeywell Process Solutions Uniformance PHD, Insights, and Executive products. Uniformance Executive provides a model that organizes information, supports navigation, and provides an abstraction layer between data and displays. In addition, Uniformance provides data modeling functionality based on measures, hierarchies, and Honeywell’s Common Asset Model (CAM). This approach works but has the same limitations as AspenTech’s use of IP.21 and MDM.
· Uptake’s Fusion product, acquired from ShookIoT, took a different direction in leveraging Microsoft’s Azure Time Series database. Fusion handles all three data types, contextualizing the data using object and graph technology, making it a more modern architecture. Chevron, another long-time PI user, was an early adopter and is rolling it out throughout its business units. Since its initial release, Uptake has migrated from the Azure Time Series database to the newer and more performant Azure Data Explorer (ADX). Thanks to ADX, Fusion can also extract data for events and alarms, enabling it to handle work orders, batch changes, operator actions, routes, inspection results, analytical insights, and more. Fusion can upload PI AF relationships too. Thus, Fusion is a plus for users committed to Microsoft’s Azure. On the other hand, many users prefer a hyperscaler-neutral approach since, with Fusion, you are locked into both Microsoft and Uptake for product support.
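For readers on Azure, here is a rough, hypothetical sketch of what querying an ADX-backed time-series store can look like, using Microsoft’s azure-kusto-data Python package. The cluster URL, database, table, and column names are invented; this is not Uptake’s actual schema or API.

```python
# Rough sketch of querying time-series data held in Azure Data Explorer (ADX).
# Cluster, database, table, and column names are hypothetical.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(
    "https://mycluster.westus.kusto.windows.net"   # invented cluster URL
)
client = KustoClient(kcsb)

# KQL: 1-minute averages of a vibration tag over the last day.
query = """
Telemetry
| where TagName == 'P101.Vibration' and Timestamp > ago(1d)
| summarize AvgValue = avg(Value) by bin(Timestamp, 1m)
| order by Timestamp asc
"""

response = client.execute("plant_data", query)   # invented database name
for row in response.primary_results[0]:
    print(row["Timestamp"], row["AvgValue"])
```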
· Cognite calls its solution Cognite Data Fusion (CDF), an industrial DataOps platform. Cognite Data Fusion streams data into its CDF data model, where the data is normalized and enriched by adding connections between data resources of different types, and stored in a graph index in the cloud. The CDF data model is an abstract model that organizes data elements and standardizes how they relate to one another and to the properties of real-world entities. It collects industrial data by resource types that let one define the data elements, specify their attributes, and model the relationships between them. The different resource types are used both to store and to organize data. CDF has its own time-series database as well as the ability to store structured and unstructured data. In other words, CDF has a semantic taxonomy that enables the user to configure the data relationships, be it an asset hierarchy, time-series tag data, events, files, 3D data, or sequences. In addition, assets can be annotated and grouped. Data sets (containers for data objects) group and track data by source to support data governance. Once in CDF, one can use the CDF services and tools to build solutions and applications. Connections to PowerBI, Grafana, simulation models, and other apps are ready out-of-the-box and can be orchestrated through CDF (a rough sketch of the resource-type idea follows below).
CDF has a few applications of its own but is mainly the hub for connecting and managing other applications, such as digital twins and advanced analytics. If all of this sounds complex, it is, and so the downside is that it takes considerable time and effort to configure and maintain. Once implemented, it is powerful, but one is dependent on a single vendor for all the Data Hub functions. Despite these challenges, Cognite has grown rapidly, starting in upstream and moving downstream into power and manufacturing. Recently, its originator, Aker BP, sold its interest to Saudi Aramco, so Cognite has plenty of fuel to grow.
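For a feel of how the CDF resource types are traversed in practice, here is a minimal, hypothetical sketch using Cognite’s Python SDK (cognite-sdk). Authentication setup is omitted, method names vary somewhat between SDK versions, and the asset and tag identifiers are invented.

```python
# Rough sketch of navigating CDF resource types with the cognite-sdk.
# Asset/tag identifiers are invented; exact method names vary by SDK version.
from cognite.client import CogniteClient

client = CogniteClient()  # assumes credentials are configured externally

# 1. Resolve an asset (the anchor of the context graph).
pump = client.assets.retrieve(external_id="pump-p101")

# 2. Pull the time series linked to that asset ...
series = client.time_series.list(asset_ids=[pump.id])

# 3. ... and the events (e.g., work orders, trips) linked to the same asset.
events = client.events.list(asset_ids=[pump.id])

# 4. Fetch a day of datapoints for the first linked series.
if series:
    dps = client.time_series.data.retrieve(
        id=series[0].id, start="1d-ago", end="now"
    )
    print(len(dps), "datapoints for", series[0].external_id)
```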
· Finally, let’s look at a Data Hub component company, Element. Element’s Unify is software that builds rich data context at scale with no-code, automated data pipelines. The result is a single federated, contextualized, and persistent source of data from which users can establish a single version of the truth, that is, the data relationships, with a governance engine ensuring data integrity. Unify contextualizes all three data types, including the ability to upload PI AF. In addition, Unify provides out-of-the-box connectors and no-code data pipelines that allow the easy ingestion of metadata from OT systems, IT systems, spreadsheets, and P&ID systems. Related metadata and defined relationships are stored in Element Graph, Element’s implementation of graph technology, making it one of the market's most sophisticated yet easy-to-use Data Hubs.
Unify supports connectivity to AI/ML tools, BI and visualization, IIoT and Edge platforms, legacy systems, data warehouses, and data lakes. Unify runs on AWS and Azure. Element is even building out common data models, such as IEC 61970-301:2020, the common information model (CIM) for electric utilities. Thus, Element is ideal for those users wanting to use a component approach to a new or existing architecture and add sophisticated Data Hub functionality. It may also be the answer to some of the aforementioned vendors’ challenges.
So, as you can see, there is no shortage of approaches to the Data Hub challenge. But you might also be asking, “What about the other database vendors and the hyperscalers?” Without going into detail here, the short answer is that their databases do not adequately handle time-series data, lack contextualization or other required capabilities, or are simply not performant enough for industrial use. They may play a role in the Data Fabric but are not the Data Hub itself. Since most users are looking to buy, not build, it makes sense to choose a fit-for-purpose solution.
Nevertheless, it pays to keep an eye on the hyperscalers as they want to grow the number of users and functions on their platforms. In LNS’s view, it is inevitable that the hyperscalers will eventually provide full Data Hub functionality, either homegrown or through acquisition. If they succeed, their user friends in IT, who are now architecting the Data Hub, will favor them over custom or proprietary solutions from ISVs and automation vendors. Right now, domain experience is the differentiator and the barrier, but it may not last forever.
The business/pricing model is another open issue with which vendors struggle. Given multiple data types, one cannot continue embracing a time-series tag-based pricing model, which even the Data Historian vendors have moved away from for enterprise deals. Note that some SCADA vendors with historian functionality still use tag size in their pricing models. This means the pricing may be a combination of a threshold subscription plus the number of data type pipelines, source connections, and applications supported, not the data volume itself. Databases may still be priced on data volume, but LNS does not yet see a standard pricing model for the Data Hub.
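As a purely illustrative sketch of such a model (every tier and dollar figure below is invented, not any vendor’s price list), the arithmetic might look like this:

```python
# Purely illustrative Data Hub pricing: a base subscription threshold plus
# per-pipeline, per-connection, and per-application charges. All figures
# are invented for illustration, not any vendor's actual list price.
def annual_price(pipelines: int, source_connections: int, applications: int,
                 base_subscription: float = 100_000.0,
                 per_pipeline: float = 15_000.0,
                 per_connection: float = 2_500.0,
                 per_application: float = 10_000.0) -> float:
    return (base_subscription
            + pipelines * per_pipeline
            + source_connections * per_connection
            + applications * per_application)


# Example: 3 data-type pipelines (time series, transactional, unstructured),
# 25 source connections, and 4 consuming applications.
print(f"${annual_price(3, 25, 4):,.0f} per year")   # -> $247,500 per year
```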
Last but not least, Matt asked whether we can enable next-generation enterprise applications that expand the user base beyond process engineers. First, let’s acknowledge that today we have new tools that process engineers want to use, tools that provide greater analytics functionality than Data Historian GUIs such as PI Vision or IP.21 Process Explorer. The Data Hub facilitates these tools. For example, Seeq’s and TrendMiner’s process data analytics can access data at the Data Historian level, at the Edge or Cloud, and through a Data Hub. So, it pays to choose tools that are not limited by evolving architectures.
Second, a Data Hub is ideal for setting up advanced analytics, be they systematic analytics like those in Asset Performance Management (APM) or any number and type of Digital Twins. Beyond that, the ability to relate data from multiple sources and types will foster deeper analysis across the asset lifecycle and the value chain. Right now, it’s a major pain to acquire and analyze data from engineering, operations, reliability, maintenance, inspection, quality, and EHS, not to mention to support sustainability and ESG reporting. Without the ability to relate data, one cannot analyze it, and this is the goal of the Data Hub ... to use the old Burger King slogan, “Have It Your Way.”
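To make the “relate first, analyze second” point concrete, here is a tiny, hypothetical example using pandas: once maintenance work orders and historian summaries share an asset key, a cross-domain question becomes a simple join. All data and column names below are invented.

```python
# Tiny, hypothetical illustration: relating maintenance and process data by asset.
import pandas as pd

# Work orders from the EAM/CMMS (invented data).
work_orders = pd.DataFrame({
    "asset": ["P-101", "P-102", "P-101"],
    "work_order": ["WO-1", "WO-2", "WO-3"],
    "cost": [4200, 1800, 5100],
})

# Daily average vibration per asset from the historian / Data Hub (invented data).
vibration = pd.DataFrame({
    "asset": ["P-101", "P-102"],
    "avg_vibration_mm_s": [7.9, 2.1],
})

# Because both sources share an asset key, the cross-domain question
# "is maintenance spend tracking vibration?" becomes a simple join.
joined = work_orders.merge(vibration, on="asset")
summary = joined.groupby("asset").agg(
    total_cost=("cost", "sum"),
    avg_vibration=("avg_vibration_mm_s", "first"),
)
print(summary)
```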
In summary, Data Historians will be with us for a long time. And yes, there are still markets where their use is underpenetrated, such as discrete manufacturing. And there is no shortage of choices. Still, the stakes have been raised as the OT data fabric now incorporates IIoT sensors, Edge, Cloud, and various applications with different data types. This is the new battleground, and the Data Hub war is on. Clearly, we have a way to go, and no one vendor yet dominates the category the way OSIsoft did with the PI historian.
Finally, in reading this blog, you might be wondering why all the innovation is coming from startups. Why can’t large established firms innovate at the speed of startups? Good question. It would seem that the momentum of past success is the enemy of innovation. But that is a subject for another blog at another time and, as Paul Harvey would say, “The Rest of the Story.”
Epilogue
Since I wrote this blog, not much new has been announced in the market. But look for forthcoming updates from both AspenTech and AVEVA as AspenTech releases aspenOne v14 sometime this November and AVEVA hosts AVEVAWorld on November 14-17, 2022 in San Francisco. Let’s see if there are any changes in product direction. And little birdies tell me to look out for announcements to come soon from our hyperscaler friends at AWS and Microsoft too.
Never a dull moment!
Comments

Chief Harbinger, Futurist: “There is no point in sending all the IIoT time-series data to the Cloud, only to send it back down for analysis.” How does ML fit into that perspective? Sure, model development is local, but that’s on a small subset of the data. Production datasets are too massive to manipulate locally.

Reader: Thanks Joe. Nobody has explained this as clearly as you have, and it highlights the point that there needs to be collaboration between partners to achieve the business goals that companies are trying to reach.

Founder of CrateDB: Couldn’t agree more. A global, central, fast data store spanning edge and cloud, open and flexible, and simple to connect is how modern analytics architectures have to be built. Disclaimer: I am a founder of CrateDB. I can only say that our IIoT use cases are growing massively, combining structured and semi-structured data at huge scale, with real-time SQL and inexpensive storage.

Reader (Energy): The challenge/opportunity is to integrate the existing technologies with the new ones. This must be driven by the business use case (as has been mentioned in other comments). The opportunity to use plant data to directly drive business decisions goes beyond the OT thinking of the past, and it makes sense that new workflows and business use cases are built with modern technology, leveraging the cloud as well as new edge devices that complement the traditional OT world. So, it is not about disruption; it is about hybrid integration (OT, cloud, IT/OT) and business optimization at enterprise scale. We now have several examples of this in production, preserving and augmenting the existing OT.