#BigIdeas2021 : The New Data Science
We have a promising future of Artificial Intelligence (AI) ahead of us. But to be successful we must first learn to reject the fake visions painted by consultants eager to outdo each other. Most engineers don’t have the same firm grasp of AI that they have of mechanics, electricity, or chemistry. Data science has no first principles or scientific laws; it is very nebulous. So it can be hard to judge whether claims made around analytics are realistic, or you may end up using an overly complex form of AI for a simple analytics task. It must be like the early days of thermodynamics and electromagnetism. So my #BigIdeas2021 is to identify the first principles and scientific laws of the data science discipline and teach them, to increase the success rate and accelerate the adoption of AI. As usual for the annual prediction I will go broader than processing plants and a bit more theoretical. Here are my personal thoughts:
A New Data Science
In this era of Digital Transformation (DX), digitization, digitalization, Industrie 4.0, Industry 4.0, or the Fourth Industrial Revolution (4IR), there are mixed signals around Artificial Intelligence (AI). On the one hand there are grandiose promises of AI by consultants, covered extensively by the media, often raising unrealistic expectations among end-users. On the other hand we hear about AI projects that don’t live up to expectations. But there are also AI success stories. What works? What fails? A new data science can prevent hyperbole, temper expectations, and increase success. With new data science principles and scientific laws, robust data engineering principles follow suit. With that, resources will be channeled the right way and we will benefit more from AI in a new era of data science and AI; “AI 2.0”. The possibilities to make life better are exciting. How do we reach this new maturity level for data science and engineering? How do you know whether the AI vision painted by a consultant is realistic or snake oil? Are there improvement opportunities you are missing because you are unsure of AI capabilities? How do you know which form of AI is best to solve a particular problem? With the science and engineering of data well understood, like physics and chemistry, we will all be in a much better position to judge and evaluate for ourselves what is right.
It must have been frustrating in the days before we knew the laws of conservation of energy and mass, gravitation, reflection, and refraction; and the laws of Avogadro, Boyle, Charles, Coulomb, Dalton, Faraday, Fourier, Gay-Lussac, Hooke, Joule, Kepler, Kirchhoff, Lenz, Newton, Ohm, Planck–Einstein, Stefan–Boltzmann, and more. Now, when somebody proposes a perpetuum mobile we can reject it because it breaks the laws of thermodynamics. Similarly, when somebody proposes uncovering information in a set of data that does not contain that information, we must be able to reject it.
How do I prove a claim around data analytics right or wrong so the company can invest correctly? There are many problems around us which we have learnt to live with because we don’t think there is a better solution. AI in some form can solve problems we could not solve before, but not all. How do you convince yourself AI can or cannot solve a particular problem without trial-and-error? There are many forms of AI: rule-based, machine learning (ML), and deep learning (DL) each in multiple variations. You may think the more advanced ML such as Artificial Neural Network (ANN) or DL is best, and in some use cases they are, but simple is often hard to beat and in many use cases simple rule-based is more robust. Many have an almost superstitious belief in the power of ML and DL to learn and predict. How do you know if a particular AI technique is right or not? It is not either-or; symbolic/knowledge/rule-based analytics, ML, and DL will coexist, each with its applications. They are tools in a toolbox. The key is knowing which to apply to each specific task.
The Science of Data
I have taken an interest in the science of data and information ever since I read Tor Nørretranders’ book The User Illusion. Recently, when lots of remarkable claims started to appear around ML, I looked to data science for ways to determine if the claims are realistic. I didn’t find anything useful. Consultants are outdoing each other presenting wonderful possibilities of what could be achieved with AI analytics. When you see image recognition, voice commands, and AlphaGo it is hard not to believe them. Some claims are achieved. Other claims are unbelievable. Armed with a set of data science first principles and rules we could make an objective judgment on whether claims are valid, thereby quickly refuting unrealistic claims, and soon get to a point where only realistic claims are made in the first place. Here are a couple of everyday claims and my personal thoughts on how data science first principles and rules could help:
Insufficient Data Problem
One of the claims goes “we will just take all your existing data and analyze it to uncover correlations and new insights to make better decisions”. This approach of using the Big Data you already have, collected over years, trapped in system ‘silos’ (such as the control system, business system, and personal spreadsheets in the case of plants) is very attractive because it is “only” software, and hitherto largely unused data would be put to good use. No need to install sensors or supporting hardware infrastructure. However, a few months and lots of money later they “uncover” what you already know about your process and equipment from physics and mechanics, and say they need more data to be able to predict. That is, they discover you need more sensors after all. Yet process, mechanical, and reliability engineers know from experience that additional measurements like vibration, acoustic noise, and temperature are required because those are the early warning signs of problems to come. The existing process sensors are insufficient because by the time a problem is picked up by them, the problem has already gone too far. You need a change in a signal that indicates an event is about to occur. A pump bearing failure is a good example: by the time the bearing failure is visible on the discharge pressure it is already too late, because discharge pressure is a lagging indicator. You need a vibration sensor as a leading indicator, where a change signals the bearing is starting to wear.
Lots of time and money can be saved if advanced sensors to collect the required data are put in from the very beginning. With the right sensors in place the AI analytics can do a fabulous job of providing early warning of failure. So there must be a first principle or scientific law that states that without the right data signaling a change, analytics will not be able to find the information you are looking for. It might sound too simple to be stated as a law, yet it is overlooked all the time, so it needs to be captured so it is not missed. Besides, Newton’s laws of motion and the laws of thermodynamics are also simple, but had to be written down so they could be taught. It might be something along the lines of:
First, an axiom: data is a mix of the information of interest you want and ‘noise’ you do not want. The information must be in the data set or else there is nothing to find. Laws and first principles help by prompting you to check in advance whether you need more data, such as from additional sensors or other data sources, so you can install that ahead of time. The right sensors are a key recommendation for successful AI and analytics.
From there, you must separate the information from the ‘noise’; you must remove the ‘noise’ from the data to get to the information. Data is the ore; information is the nugget of gold. To get to the information you must throw out the ‘noise’ in the data. That is, at the first level, analytics means throwing away the ‘noise’ in the data, what you don’t want, leaving the information you do want. At the next level, analytics means extracting knowledge in the form of correlations and cause-and-effect relationships.
For condition monitoring of equipment like a pump, the data set may include streaming measurements of level and multiple vibrations, pressures, and temperatures from advanced sensors. These are simple time-series variables. Software analytics may include Fourier transforms, standard deviation calculation, and detection of whether limits are exceeded. This provides an early warning of bearing wear, cavitation, strainer plugging, motor winding insulation breakdown, mechanical seal leak, etc. The early warning is the information of interest that remains. Indeed, in the case of bearing wear the information of interest giving an early warning is probably only the vibration, possibly the temperature. All the other variables provide no hints to the bearing problem, but they are leading indicators for other problems. In the specific case of detecting bearing wear the other data points are just ‘noise’. The raw data from the sensors need not be kept; it can be thrown away (in practical reality you may want to historize everything for forensic purposes – I stated up front that this essay is theoretical and a bit academic).
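As a minimal sketch (not any vendor’s implementation), the Fourier-transform-plus-limit analytics described above could look like this; the sample rate, fault frequency, and alarm limit are all made up for illustration:

```python
# Illustrative sketch: extract a vibration "signature" with an FFT and
# apply a simple limit check. All numbers here are made up.
import numpy as np

def dominant_band_amplitude(signal, fs, f_lo, f_hi):
    """Peak FFT amplitude within a frequency band of interest."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return spectrum[band].max()

fs = 1000.0                          # sample rate in Hz (illustrative)
t = np.arange(0, 1.0, 1.0 / fs)
rng = np.random.default_rng(0)
healthy = 0.05 * rng.standard_normal(len(t))        # low-level noise only
worn = healthy + 0.8 * np.sin(2 * np.pi * 120 * t)  # a growing 120 Hz tone

LIMIT = 0.2                          # illustrative alarm limit
for label, sig in [("healthy", healthy), ("worn", worn)]:
    amp = dominant_band_amplitude(sig, fs, 100, 140)
    print(label, "ALARM" if amp > LIMIT else "ok")
```

With the right vibration sensor in place, the "analytics" really is this small: one transform and one limit.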
Images and sound data sets are more complex than simple variables. Image data, although also numbers, is different from simple variables like measurements and counts. An image is a matrix of millions of pixels of multiple bytes each, in a file of several megabytes. Detecting presence or movement is easy. Identifying what is in the image, such as a pied-fur cat curled up against a complex background, is difficult. The information is more inaccessible. It requires more advanced analytics with more CPU power. The algorithm must ignore all the ‘noise’ in the background to reveal that it is a cat.
Analytics means throwing away the ‘noise’ in the data, that which you don’t want
Also, it takes computational work to filter the information out of the data. This computational work is analytics, in the form of computer CPU power. The more noise there is in the data, the more computational work it will take to extract the information from that data. I have seen hefty servers running analytics to get information free from noise, or close to it. The more noise, the messier the data set is, the more inaccessible the information is, and the higher the entropy. Another take on information entropy.
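To make the entropy idea concrete, here is an illustrative sketch using Shannon entropy of a discretized data set as a rough proxy for how ‘messy’ the data is; the data values are made up:

```python
# Illustrative only: Shannon entropy as a rough proxy for how "messy" a
# discretized data set is. A flat (noisy) distribution has higher entropy
# than a peaked (informative) one.
from collections import Counter
import math

def shannon_entropy(samples):
    """Entropy in bits of the empirical distribution of `samples`."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

peaked = [0] * 90 + [1] * 10    # mostly one value: low entropy
flat = list(range(10)) * 10     # uniform over 10 values: high entropy

print(round(shannon_entropy(peaked), 3))   # low: information easy to access
print(round(shannon_entropy(flat), 3))     # high: more work to extract it
```

The uniform distribution comes out at log2(10) ≈ 3.32 bits versus about 0.47 bits for the peaked one, matching the intuition that messier data hides its information more deeply.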
Analytics is the computational work to extract the piece of information in the sea of data
Additionally, with the right sensors in place, the analytics becomes simple; often a simple rule-based algorithm, even as simple as a high or low alarm. Analytics vendors build these rules into readymade apps and templates – no need for custom programming in Python or ‘R’. Because the apps and templates are ready-made, there is no need for datasets to train them on. Think of sensors as a form of analytics which physically isolates a single variable of interest (information) such as measuring vibration or differential pressure, without being affected by other variables (noise) such as ambient temperature or static pressure. Incidentally, chemical sensors are often called analyzers. The right sensors are a key recommendation for successful AI and analytics.
Large Training Data Set / Long Training Period Problem
Another problem with the “we will just take all your existing data and analyze it to uncover correlations and new insights to make better decisions” claim is that even though you may have 10 years of data, that data may still not include sufficient instances of the specific event you want to be able to predict, in order to establish a strong correlation with the information that is a leading indicator of that event, and to reject all the normal-condition noise which should not confuse the result. For instance, you may need a data set which includes perhaps 5 instances of an event (embedded inside a large amount of data from normal operation) to train the model on (plus perhaps another 5 instances to verify the model). For events that happen daily or weekly, even monthly, this may not be a problem. But for events that only happen every few years this becomes a problem, since there will not be enough examples in only 10 years of data. Let’s again use a pump in a plant as an example. Pumps are very reliable. There are several years between failures. Even 10 years of data will not contain sufficient instances of failures to train the model. Moreover, pumps have multiple failure modes, each with different symptoms. You want to be able to distinguish between the various failure modes; diagnostics to provide a recommended action, so-called descriptive and prescriptive analytics. Multiple instances of each failure mode would only accumulate across an even longer time. Without sufficient events to train the model on, you must instead do the opposite: train the model on ‘normal’ operation, such that the model can flag deviation from this normal operation, called ‘anomaly’. However, just flagging ‘anomaly’ is not as good as descriptive or prescriptive analytics. Therefore the machine learning data science approach works best for predicting events which are relatively frequent, such as process upsets.
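A minimal sketch of the ‘train on normal, flag anomaly’ fallback described above, using a simple z-score model (real anomaly detection models are more sophisticated; the readings and threshold here are illustrative):

```python
# Illustrative sketch of anomaly detection: learn what 'normal' looks like,
# then flag deviations. No failure examples are needed for training.
import statistics

def fit_normal(normal_readings):
    """Learn the mean and spread of normal operation."""
    return statistics.mean(normal_readings), statistics.stdev(normal_readings)

def is_anomaly(reading, mean, stdev, z_limit=3.0):
    """Flag readings far outside the learned normal band."""
    return abs(reading - mean) > z_limit * stdev

# Years of 'normal' vibration readings (mm/s), made up for illustration.
normal = [2.0, 2.1, 1.9, 2.2, 2.0, 2.1, 1.8, 2.0, 2.1, 1.9]
mean, stdev = fit_normal(normal)

print(is_anomaly(2.05, mean, stdev))   # within the normal band
print(is_anomaly(4.5, mean, stdev))    # deviation from normal: anomaly
```

Note what this model cannot do: it says "something is off", but not which failure mode, which is exactly why anomaly detection falls short of descriptive or prescriptive analytics.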
You could use data from other pumps in the plant or pumps in other plants to get a larger data set with sufficient instances of each event you want to detect, but in this case the training data is no longer specific to each particular pump in your plant, so you might as well use readymade pre-programmed analytics apps or templates instead, thereby also avoiding the need for model training altogether.
Lots of time and money can be saved if we make sure we have sufficient history before we start. So there must be a first principle or scientific law that states that training data must contain sufficient instances of the event, and with this goes the frequency of these events needed to get sufficient instances in a practical period of time or from a historical data set. Without sufficient history you will not be able to train a model and should instead consider a readymade app or be prepared to accept just ‘anomaly’ detection. Again, this might sound too evident to be stated as a law, yet it is overlooked all the time, so it needs to be captured so it is not missed. It might be something along the lines of:
That is, the number of training exposures in a data set equals frequency multiplied by time in operation. Or, how long a period of historical data you need (or the time it takes to learn) equals the number of exposures required to learn with ‘sufficient’ confidence divided by the frequency of exposure (how often the event occurs; the unit is a frequency, like per second, per hour, per day, or per year). This simple equation makes it clear that training is not a practical approach in low-frequency use cases. Perhaps there is something even more fundamental than that? Laws and first principles help by prompting you to check in advance whether you need more data over a longer time, so you can judge if training a model is the right approach or if you should use a readymade rule-based app or template instead. For events which are infrequent, the recommendation is to use readymade analytics if available as apps or templates.
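The proposed relationship can be sketched in a few lines; the pump failure frequency and the "5 to train plus 5 to verify" figures below are purely illustrative:

```python
# Illustrative sketch of the proposed relationship:
#   exposures = frequency * time,  or  time needed = exposures / frequency.
def exposures_available(event_freq_per_year, years_of_history):
    """How many event instances a historical data set can contain."""
    return event_freq_per_year * years_of_history

def years_needed(exposures_needed, event_freq_per_year):
    """How many years of history are needed for a given number of exposures."""
    return exposures_needed / event_freq_per_year

# Made-up pump example: roughly one failure every 3 years, 10 years of data.
freq = 1 / 3
print(round(exposures_available(freq, 10), 1))  # about 3 instances: too few
print(round(years_needed(5 + 5, freq), 1))      # ~30 years for 5 train + 5 verify
```

The arithmetic makes the point immediately: at one failure every three years, ten instances would take about thirty years of history, so training on this event is not practical.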
And how many exposures are required to learn anyway? What is the first principle or scientific law that states how many samples are required to learn with a certain confidence level? Is there something like the Nyquist criterion? A simple problem with a few variables in ML may only need 5 exposures, but for complex problems like image recognition from various angles, many thousands of sample images are required for each kind of item. An enthusiastic consultant may trivialize the training effort, causing the project to miss its completion date or fail. A pessimist may miss an improvement opportunity. A first principle or scientific law would help.
How can we calculate the effectiveness of algorithms for certain problems to learn in fewer exposures, with target certainty? And how do we calculate expected certainty?
Complex Approach to Simple Task Problem
A problem with the almost superstitious belief in the power of ML to learn and predict is that consultants want to prescribe an ML solution to every analytics task, even though most problems have a simpler rule-based solution. For instance, most equipment and process unit operations are very well understood because they follow first principles and scientific laws of physics and chemistry, and because of this, readymade analytics apps and templates are available. Even an advanced piece of equipment like a gas turbine follows simple and well-understood equations and cause-and-effect relationships. With readymade analytics there is no need for gathering historical data from multiple systems, data cleansing, model building, or training. Sure, there are many analytics use cases where rule-based does not work and ML or DL is required. These include applications that involve human behavior (which is complex), large composites of multiple unit processes and equipment (also complex), and image and speech recognition (very complex).
Lots of time and money can be saved by using the right type of analytics. So there must be a first principle or scientific law that can guide our selection of the type of analytics to use for a particular task. What is the simplest type of analytics that can do the job? What is the most reliable, deterministic type of analytics that can do the job? This would probably have something to do with how accessible or inaccessible the information is in the sea of data. That is, the entropy of information.
The messier the data, the less accessible the information, the higher the entropy, the more computational work required for analytics
For analytics tasks where easily identifiable cause-and-effect relationships exist, rule-based analytics is best. For analytics tasks where easily identifiable mathematical correlation exists, an equation is best. ML and DL are required when there is high or very high entropy of information; the data is messy, disordered, the information hard to access.
For example, for well-known cause-and-effect relationships, such as for equipment failures having certain symptoms, for instance bearing wear manifesting itself as increased vibration and temperature, rule-based AI analytics works great. With the right sensors in place it becomes a low information entropy problem because with the right sensors the information is easily accessible. The sensor pretty much gives you the answer straight away. You just put an alarm on the vibration and temperature and that is sufficient for an early warning of bearing failure. Laws and first principles help by prompting you to use a readymade rule-based app or template instead whenever possible. For analytics of events with well-known cause-and-effect relationships, the recommendation is to use readymade analytics if available as apps or templates.
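A minimal sketch of such a readymade rule, assuming illustrative limits (not any vendor’s or standard’s values) for vibration and bearing temperature:

```python
# Illustrative sketch of a rule-based early warning: with the right sensors
# the "analytics" is little more than a pair of limit checks.
VIB_LIMIT_MM_S = 7.1    # illustrative vibration severity limit
TEMP_LIMIT_C = 80.0     # illustrative bearing temperature limit

def bearing_wear_warning(vibration_mm_s, bearing_temp_c):
    """Rule: high vibration warns of wear; high temperature confirms it."""
    if vibration_mm_s > VIB_LIMIT_MM_S and bearing_temp_c > TEMP_LIMIT_C:
        return "bearing wear likely: schedule maintenance"
    if vibration_mm_s > VIB_LIMIT_MM_S:
        return "vibration high: possible early bearing wear"
    return "ok"

print(bearing_wear_warning(3.0, 55.0))
print(bearing_wear_warning(9.2, 85.0))
```

No historical data set, no training, no Python or ‘R’ custom model: the cause-and-effect knowledge is encoded directly in the rule.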
As another example, for well-known correlations, such as the efficiency of many types of equipment, for instance cooling towers, equations based on first-principles (1P) physics and chemistry work great. Again, with the right sensors in place it becomes a low information entropy problem because with the right sensors the information is easily accessible. Just compute the efficiency from the sensor readings. Laws and first principles again help by prompting you to use a readymade equation-based app or template instead whenever possible. For analytics of correlations with well-known equations, the recommendation is to use readymade analytics if available as apps or templates.
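For instance, cooling tower efficiency is commonly computed as cooling range divided by the difference between hot water inlet and ambient wet bulb temperature, directly from three sensor readings; the temperatures below are made up:

```python
# First-principles sketch: the standard cooling tower efficiency equation,
# computed directly from three sensor readings (illustrative values).
def cooling_tower_efficiency(t_in_c, t_out_c, t_wet_bulb_c):
    """Efficiency (%) = (Tin - Tout) / (Tin - Twb) * 100."""
    return 100.0 * (t_in_c - t_out_c) / (t_in_c - t_wet_bulb_c)

# Hot water in 40 C, cooled water out 30 C, ambient wet bulb 25 C.
print(round(cooling_tower_efficiency(40.0, 30.0, 25.0), 1))  # 66.7 %
```

One equation and three sensors; no model to build and no data set to train on.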
On the other hand, anything that involves human behavior is more complicated, because the human mind is complex and thus has no simple cause-and-effect relations or mathematical correlations. This is where ML algorithms like decision trees, linear regression, logistic regression, naive Bayes, random forest, Support Vector Machine (SVM), and simple Artificial Neural Networks (ANN) come into their own, predicting power usage, ad clicks, call volume, loan repayment, optimum price, product perception, product preferences, spam, spread of disease, or job candidate performance.
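As an illustration of how even the simplest ML learns from labeled examples rather than from rules, here is a one-level decision tree (a ‘stump’) in a few lines; the ad-click data set is made up:

```python
# Illustrative sketch: a one-level decision tree ("stump") that learns a
# threshold on a single feature from labeled examples. Data is made up.
def fit_stump(values, labels):
    """Find the threshold that best separates the labels."""
    best_thr, best_err = None, float("inf")
    for thr in sorted(set(values)):
        err = sum((v > thr) != y for v, y in zip(values, labels))
        if err < best_err:
            best_thr, best_err = thr, err
    return best_thr

# Feature: minutes spent on a page; label: whether the visitor clicked the ad.
minutes = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
clicked = [False, False, False, True, True, True]
thr = fit_stump(minutes, clicked)
print(thr)  # learned threshold: predict a click above it
```

Nobody wrote the rule "clicks happen above 1.5 minutes"; the algorithm found it in the data, which is the essence of ML as opposed to rule-based analytics.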
Similarly in a plant, a process is a composite of a series of simple unit processes such as distillation, condensation, and cracking, each with multiple pieces of equipment which collectively exhibit complex interaction and behavior even though each unit process and piece of equipment is simple and well understood. Therefore ML may be used to predict process upset or product properties such as Reid vapor pressure (RVP). And for an amalgam of unit processes and equipment there will also be many sources of noise that make the desired information harder to extract. That is, a single unit process or piece of equipment may use rule-based analytics while a large composite of multiple unit processes and equipment may need ML of some type. It takes many samples of each kind of upset or product property to be predicted to train an ML algorithm. Laws and first principles help by prompting you to use ML when the information is hidden in a large amount of data noise. For analytics of patterns involving human behavior or complex processes, the recommendation is to use ML.
Images and sound are even more complex. Images are hard to analyze because they are composites of many things, one in front of the other, with each thing made of multiple pieces layered one in front of the other, and they all blend into each other, such as leaves on a tree, hairs on a body, and oddly shaped clouds. Rules for decoding an image (“seeing”) are impossible to describe. Therefore rule-based AI analytics cannot be used for images. Analyzing an image, identifying individual items, is a lot of computational work, as the entropy of information is very high; just a few items in a sea of data in an image file. In an image the background is the data noise. This is where DL with deep ANNs comes into its own. It can take thousands of sample images of each kind of item to train an algorithm.
Sound is similar. Analyzing a voice against background noise is difficult. Rules for decoding sound (“hearing”) are impossible to describe. Laws and first principles help by prompting you to use DL when the information is hidden in a very large amount of data noise. For analytics of sound and images, the recommendation is to use DL.
Mature Data Science
Like I said, we don’t have the scientific laws or first principles of data science yet. But, by establishing scientific laws and first principles for data science it will be better taught and understood. Not every engineer needs to be a data scientist or programmer, but with more data engineers we can put AI analytics to better use in plants, in the office, and at home. Once data science is less nebulous, lots of engineers will be very good at it.
The immediate takeaways are that sensors and software are a system. You cannot do without sensors because you need sufficient data. The degree to which our cars and phones have been sensorized is a good example of how sensors enable new solutions. Algorithms can be trained on frequent events, but training on infrequent events is not practical. Use rule-based or equation-based analytics in the form of apps or templates whenever you can.
Food for thought? Let’s nail down the first principles and laws of data science. And remember, always ask for the product data sheet to make sure the software is proven, and pay close attention to the software screen captures in it to see if it does what is promised without expensive customization. Well, that’s my personal opinion. If you are interested in digital transformation in the process industries, click “Follow” by my photo to not miss future updates. Click “Like” if you found this useful to you and “Share” it with others if you think it would be useful to them. Save the link in case you need to refer to it in the future.