Data for AI: Best Practice Standards

  • Without data AI cannot do its job
  • Use standard network protocols for wireless sensors and standard software interfaces for data from external systems
  • The recommendation is to deploy WirelessHART for sensors and OPC-UA for industrial software
  • The result is realizing the full benefits of industrial AI

Artificial Intelligence (AI) has already been proven in valuable applications for plants, improving sustainability, reliability, and production. But AI runs on data, on inputs, just like a brain. Some forms of AI need data for the training phase. All forms of AI that do inference or classification, such as prediction, need data for that phase. So, without data there is no AI. That is, without data exchange there is no AI. Data comes from sensors and systems from multiple vendors, so the interfaces to these sensors and systems must be standardized to make data access practical. Use standard protocols and standard software interfaces when interfacing with external systems. Custom coding to proprietary network protocols and proprietary software APIs is impractical, especially for large-scale systems with many vendors involved. Low-layer network protocol standards that do not define message formats or data types are also impractical. Full-fledged data interchange standards must be followed when building the data infrastructure on which successful AI applications can be built.

Since 14 October is #WorldStandardsDay, and since automation is a leading example of standardization, this year I again take the opportunity to celebrate those who develop standards. So what standards apply? Here are my personal thoughts:

Retrieving Historical Training Datasets

In the ‘training’ phase, machine learning uses a statistical algorithm such as regression, principal component analysis (PCA), or support vector machine (SVM) to find correlations in historical training datasets. These correlations are then used to build ‘models’ that compute numerical values, or ‘agents’ that classify, in the subsequent inference phase. Without data the machine learning algorithm cannot be ‘trained’.
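
To make the ‘training’ step concrete, here is a minimal, hypothetical sketch using scikit-learn to fit a PCA-plus-SVM pipeline on a historical dataset; the file name, tag columns, and label are assumptions for illustration only, not a prescribed implementation.

```python
# Minimal sketch of the 'training' phase, assuming scikit-learn and a
# historical dataset exported to CSV (file name and columns are hypothetical).
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Historical training dataset retrieved from the historian / LIMS / CMMS
history = pd.read_csv("training_dataset.csv")          # hypothetical export
X = history[["vibration", "temperature", "pressure"]]  # hypothetical tags
y = history["failure_within_30d"]                      # hypothetical label

# Scale, reduce dimensionality (PCA), then classify (SVM)
model = make_pipeline(StandardScaler(), PCA(n_components=2), SVC())
model.fit(X, y)                                        # the 'training' step
print("training accuracy:", model.score(X, y))
```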

Note that causal AI does not need training datasets because it does not require a training phase: it is based on existing human knowledge and subject matter expertise such as well-known first principles (1P) and cause & effect (C&E) relationships. Therefore the recommendation is to use causal AI solutions when available.

The historical data which the machine learning algorithm is ‘trained’ on must be retrieved from various plant systems such as the DCS, process historian, laboratory information management system (LIMS), or the maintenance log in the computerized maintenance management system (CMMS), before the algorithm can look for correlations in the data – automatic curve fitting. The training dataset is critical for the ‘training’ of the machine learning algorithm. The parameters included in the training dataset may inadvertently include confounding variables (variables which are not directly related but look as though they are), which cause false positives or erroneous values in the inference phase. Or important causal parameters (variables which are directly related) may have been left out or may not be available (not measured), causing events to be missed or erroneous values in the inference phase. This matters especially because operating conditions change over time; if parameters for these conditions are not included in the dataset, it will result in error. Similarly, some data points in the training dataset may be missing or wrong – particularly in the case of manually entered data – again causing events to be missed or erroneous values to be calculated in the inference phase.
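
As a hypothetical sketch of assembling and sanity-checking such a training dataset, the snippet below uses pandas to align historian and LIMS exports on timestamp and to count missing values before training; the file and column names are made up.

```python
# Sketch of assembling a training dataset from multiple sources and checking
# for gaps, assuming pandas; file and tag names are hypothetical.
import pandas as pd

historian = pd.read_csv("historian_export.csv", parse_dates=["timestamp"])
lims = pd.read_csv("lims_export.csv", parse_dates=["timestamp"])

# Align laboratory samples with the nearest process-historian record
dataset = pd.merge_asof(historian.sort_values("timestamp"),
                        lims.sort_values("timestamp"),
                        on="timestamp", tolerance=pd.Timedelta("30min"))

# Flag missing or manually mis-entered points before training
print(dataset.isna().sum())    # count of missing values per parameter
print(dataset.describe())      # spot out-of-range values
```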

Correlation does not mean causation

When such problems are detected, the machine learning algorithm must be retrained on a different dataset. Over time the results should get better. That is, retrieving historical data from third-party systems for machine learning ‘training’ is not a one-time affair; it is repeated from time to time. It is therefore critical that it is easy to retrieve, or “ingest”, data from the various third-party data sources. If it is too difficult, the system will fall into disuse.

Standard Historical Data Software Interfaces

Standards are key to making products from different vendors work together. The simplest example is a bolt and nut, which must conform to the same standard to fit together: an imperial-thread nut does not fit on a metric bolt. Lots of software uses proprietary “application programming interfaces” (API), or just “interfaces” for short, for software apps to exchange data. There are various API styles, including REST APIs. These APIs are generally proprietary, even if their specification is published. And since they are owned and controlled by a single vendor, there are well-known risks such as APIs “breaking” in new revisions of the software. If an API does not have an associated IEC standard number, it is proprietary, and caution is required. First, programming is required to implement the API so that one piece of software can talk to the other. And second, the interface code must be periodically updated for the life of the system to remain compatible with new versions of the software, operating systems, security patches, and so on. Systems that require such upkeep tend not to last. That is, non-standard APIs are costly and incur risk. Multiply this by the large number of interfaces required to cover all the various external systems in the plant.

API can easily go awry

The recommendation is therefore to use standard software interfaces such as IEC62541, known as OPC-UA, which is very well accepted in the industry and supported in leading industrial software and automation systems. OPC-UA has 2 main interfaces for historical data: Historical Data Access (HDA) for historical time-series data, and Historical Access (HA) for historical alarms & events as well as time-series data. As a result, support for OPC-UA in data sources and in the machine learning software makes the training and retraining of the machine learning algorithm easier for the data scientists and engineers.
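
To illustrate what such standard historical access can look like from the data scientist’s side, here is a minimal sketch using the open-source python-opcua client library to read a year of archived values from an OPC-UA server; the endpoint URL and node id are placeholders, not a specific product’s address space.

```python
# Sketch of retrieving historical time-series data over OPC-UA (IEC62541),
# assuming the open-source 'opcua' (python-opcua) client library.
# Endpoint URL and node id are placeholders.
from datetime import datetime, timedelta
from opcua import Client

client = Client("opc.tcp://historian.example.com:4840")  # hypothetical server
client.connect()
try:
    node = client.get_node("ns=2;s=Pump101.Vibration")    # hypothetical tag
    end = datetime.utcnow()
    start = end - timedelta(days=365)
    # Historical read of raw archived values for the training dataset
    history = node.read_raw_history(starttime=start, endtime=end)
    rows = [(dv.SourceTimestamp, dv.Value.Value) for dv in history]
    print(f"retrieved {len(rows)} historical samples")
finally:
    client.disconnect()
```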

OPC makes it easy

When interfacing AI apps and other software to the DCS, SIS, and other critical systems, the recommendation is to use a data diode through an edge gateway to prevent changes to the critical systems thus protecting the robustness of the critical systems.

You will invariably hear data scientists working on machine learning say “we need more data” because their machine learning algorithm cannot find strong correlations in the datasets they have been given. The reason is either that the required causal data is not collected at all, or that the data is collected manually and therefore too infrequently. To provide data scientists with good training datasets, the recommendation is to deploy more sensors to automate data collection. That is, put in sensors in place of manual data collection with portable testers such as for vibration, corrosion, acoustic noise, and temperature, and put in sensors in place of manually reading mechanical gauges for pressure, temperature, level, and flow. These manually collected positions are referred to as “missing measurements”. As a result, more sensors make the work of data scientists and engineers a lot easier.

Note that the machine learning algorithm needs multiple examples of the events you want to capture, so if you want to capture rare events – something that happens only every few years – the machine learning algorithm will require many years of historical data to span multiple instances of the event of interest. That is, deploy sensors now to automate data collection and start building the historical data, such that a good dataset is available by the time you deploy machine learning in the future. If insufficient historical data is available, not including enough instances of the event you want to identify or predict, then classification becomes limited to “normal” and “abnormal” (“anomaly detection”) without the ability to identify specific abnormal conditions. Similarly, when the historical data does not cover the entire operating envelope for the variables you want to compute, there will be errors outside the normal operating range. In short, machine learning requires a broad dataset with many parameters over a long period of time, so deploy sensors early.
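
As a hypothetical illustration of the “anomaly detection” fallback described above, the sketch below trains scikit-learn’s IsolationForest on data from normal operation only and flags abnormal samples; the file and column names are assumptions.

```python
# Sketch of 'anomaly detection' when too few failure events exist in history,
# assuming scikit-learn; the data loading is hypothetical.
import pandas as pd
from sklearn.ensemble import IsolationForest

normal = pd.read_csv("normal_operation.csv")   # hypothetical: normal data only
live = pd.read_csv("recent_data.csv")          # hypothetical: data to classify

detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(normal[["vibration", "temperature", "pressure"]])

# predict() returns +1 for 'normal' and -1 for 'abnormal' (anomaly)
labels = detector.predict(live[["vibration", "temperature", "pressure"]])
print((labels == -1).sum(), "abnormal samples flagged")
```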

Exchanging Real-Time Inference Data and Information

In the inference phase, or ‘runtime’, the models compute numerical values while the agents tell what state something is in or predict its future state. This is the same for both machine learning and causal AI. In the inference phase the computation of numerical values or classification is made based on real-time data input from the sensing system and other automation systems such as the DCS and process historian. Without data the AI cannot do its job.

AI without sensors is like a brain without senses.

Note that for causal AI the agents and models come already embedded in readymade apps. The app vendor specifies what kinds of sensors or other inputs must be connected to the app.
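
To illustrate the machine-learning side of the inference phase, here is a minimal, hypothetical sketch in which a previously trained model is loaded and fed the latest real-time sensor readings; the model file, tag names, and values are placeholders.

```python
# Sketch of the inference ('runtime') phase: a previously trained model
# computes a classification from real-time sensor inputs.
# Assumes joblib/scikit-learn; the model file and values are hypothetical.
import joblib
import pandas as pd

model = joblib.load("pump101_model.joblib")    # model saved after training

# Latest real-time readings from the sensing system (placeholder values)
latest = pd.DataFrame([{"vibration": 4.2, "temperature": 78.5, "pressure": 6.1}])
state = model.predict(latest)[0]
print("predicted state:", state)
```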

Standard Real-Time Data Software Interfaces

For real-time data too, interfacing to external systems by coding to proprietary APIs is impractical, as explained for historical data above, since such “DataOps” is costly and has various challenges. It may work fine for a small proof of concept (PoC) trial in one plant but is too hard to scale across the plant and enterprise, and hard to keep operational. The mindset must be ‘scale first’ – meaning to build on technologies that can scale out. The recommendation, again, is therefore to use standard software interfaces such as IEC62541 (OPC-UA) since it is very well accepted in the industry and supported in leading industrial software. OPC-UA also has 2 main interfaces for real-time data: Data Access (DA) for real-time time-series data, and Alarms & Events (A&E) for real-time alarms & events. Support for OPC-UA in data sources and in the AI software makes integration of real-time data easy.
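
As an illustration of consuming real-time data through a standard interface, here is a minimal sketch of an OPC-UA Data Access subscription using the open-source python-opcua client library; the endpoint URL, node id, and update rate are placeholders.

```python
# Sketch of reading real-time data over OPC-UA Data Access (DA), assuming the
# 'opcua' (python-opcua) client library; endpoint and node id are placeholders.
import time
from opcua import Client

class SubHandler:
    """Called by the client library whenever a subscribed value changes."""
    def datachange_notification(self, node, val, data):
        print(f"new value from {node}: {val}")

client = Client("opc.tcp://dcs-gateway.example.com:4840")  # hypothetical
client.connect()
try:
    node = client.get_node("ns=2;s=Pump101.Vibration")     # hypothetical tag
    sub = client.create_subscription(500, SubHandler())    # 500 ms interval
    sub.subscribe_data_change(node)                        # DA subscription
    time.sleep(10)  # receive updates for a short while in this sketch
finally:
    client.disconnect()
```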

AI software supports OPC-UA

That is, OPC-UA has 2 historical interfaces and 2 real-time interfaces – 4 interfaces in total – to handle all kinds of data and applications.

OPC for all to see

Standard Sensor Network Protocol

Because laying cables for wired sensors and fitting intrusive sensors is very expensive, plants were built with the bare minimum of sensors required for control and safety. That was the practice of the past, given the sensors available at the time. However, wireless networks and non-intrusive sensors have made the total installed cost (TIC) of sensors a lot lower than in the past, so today plants are deploying many more sensors than they did only a decade ago. These are often referred to as industrial IoT (IIoT) sensors or monitoring & optimization (M+O) sensors. Note that sensors which are wireless and non-intrusive are not “as cheap as chips”, but their TIC is low since cutting, drilling, or welding of pipes and laying of cables are not required. That is, wireless and non-intrusive sensors are ideal for collecting the “missing measurements”. They are usually not connected to the DCS; they go straight into a historian, data lake, or data fabric, and are not even shown on the P&ID. IIoT/M+O sensors are “beyond the P&ID”.

Sensorize the missing measurements

There are many wireless sensors using proprietary radio chips or low-layer protocols that do not define message formats and data types. This makes integration of the measurement data into industrial systems hard, especially with a mix of sensors from different vendors covering all the required types of measurement – vibration, corrosion, acoustic noise, temperature, multi-temperature, pressure, differential pressure, radar level, and flow etc. – needed to fully instrument equipment and process units. The recommendation is therefore to instead use wireless sensors based on the IEC62591 (WirelessHART) standard. It is supported by leading industrial wireless sensor manufacturers. Moreover, the measured values are automatically converted to HART-IP, OPC-UA (IEC62541), and other protocols in the gateway for easy integration into AI software and other systems. The configuration of sensors can also be managed centrally without going to the sensor. As a result, work becomes so much easier for the automation engineers that deploy the sensors and industrial AI apps.
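
As a hypothetical illustration of how such gateway integration can look in practice, the sketch below polls a few wireless sensor measurements from a gateway’s embedded OPC-UA server using the open-source python-opcua client library; the gateway address and tag ids are made up.

```python
# Sketch of collecting measurements from a WirelessHART (IEC62591) gateway
# through its embedded OPC-UA server, assuming the 'opcua' client library.
# Gateway address and tag ids are hypothetical.
from opcua import Client

TAGS = {                                         # hypothetical wireless sensors
    "vibration":   "ns=2;s=WHART.Pump101.Vibration",
    "corrosion":   "ns=2;s=WHART.Line7.Corrosion",
    "temperature": "ns=2;s=WHART.HX204.SurfaceTemp",
}

client = Client("opc.tcp://whart-gateway.example.com:4840")
client.connect()
try:
    for name, node_id in TAGS.items():
        value = client.get_node(node_id).get_value()   # one-shot read
        print(f"{name}: {value}")
finally:
    client.disconnect()
```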

Action Plan: Transformation by Automation

Engineers love a challenge, a problem to solve, and they are very creative. These standards make the data wrangling easy, so engineers in the plant’s operational departments can focus on the core problem, be it sustainability, reliability, or production. As a result of the new information from AI apps enabled by the data, the work of sustainability managers, reliability engineers, and production supervisors becomes so much easier. The recommendation is for companies to assign a larger portion of the technology budget to their I&C departments to enable them to deploy the automation required to transform how work is done. Standards are key to digital transformation, industrial transformation, Industry 4.0, or whatever you prefer to call it.

OPC for industry

Lead the way. Schedule a meeting for #WorldStandardsDay on 14 October – or today.

Share this essay with your CTO and I&C manager now.

And remember, always ask the vendor for the product data sheet to make sure the software is proven, and pay close attention to the software screen captures in it to see if it does what is promised without expensive customization.

Well, that’s my personal opinion. If you are interested in digital transformation in the process industries, click “Follow” by my photo to not miss future updates. Click “Like” if you found this useful to you and to make sure you keep receiving updates in your feed and “Share” it with others if you think it would be useful to them. Save the link in case you need to refer in the future.
