Artificial Intelligence vs. Data Science: Pygmalion vs Archimedes
Karel Macek, PhD, ACC
AI/ML Team Lead @ Rohlik Group - AI solutions, leadership, and a human touch | 16 years in AI & Data | Mathematician | Coach | Boosting people and processes
Like many practitioners, I have been repeatedly asked to explain AI and DS, including their differences, and potentially also to mention Machine Learning, Deep Learning, or Big Data.
The Problem
Tools-Driven Definitions
There is a shockingly large number of Venn diagrams that try to depict the difference by enumerating the tools and approaches that are unique to each field and those that are shared. In principle, I can agree that Machine Learning is part of Artificial Intelligence and Deep Learning is part of Machine Learning.
Source: https://rpafeed.com/difference-between-ai-ml-deep-learning-and-data-science/
Source: https://www.online-tech-tips.com/ms-office-tips/add-a-linear-regression-trendline-to-an-excel-scatter-plot/
Is Linear Regression Sophisticated Enough?
However, the relationship between Data Science and Artificial Intelligence is where I cannot agree. Let me challenge these diagrams with the example of linear and logistic regression. Is it Data Science? Is it Machine Learning, i.e., Artificial Intelligence? We can find these tools in books from all the mentioned fields. However, if I add a trendline to my MS Excel spreadsheet, is it really Artificial Intelligence? No, many people will say. And the argument will be: the model is too simple to be Machine Learning or Artificial Intelligence. If we have a credit-risk scoring system based on logistic regression or even a single threshold, the situation is similar. Is this AI? Probably not.
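To make the trendline example concrete, here is a minimal Python sketch of the closed-form least-squares fit that a spreadsheet trendline is generally understood to compute. The function name fit_trendline and the data points are made up for illustration; this is a sketch of the textbook formula, not a claim about any particular product's internals.

```python
# Ordinary least squares for a single predictor, in closed form.
# The function name and the data points are hypothetical, for illustration only.

def fit_trendline(xs, ys):
    """Return (slope, intercept) minimising the squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]  # made-up points lying exactly on y = 2x + 1

slope, intercept = fit_trendline(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

A dozen lines of arithmetic are all it takes, which is exactly why the "is it sophisticated enough?" test feels so uncomfortable.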
The implicit argument that I have repeatedly met is sophistication. If it is not sufficiently complex, then it cannot be AI. This thinking is dangerous for two reasons:
- If Data Science may use only the less sophisticated methods, then its practitioners must be less smart than those of Artificial Intelligence.
- If Artificial Intelligence or Machine Learning has to adopt complex models so as not to be mere Data Science or even - oh, what humiliation - Data Analytics, the practitioners will tend to focus on the tools instead of the essence of the problem. Ockham's razor goes to the scrap heap.
So what? What shall we do with the poor linear and logistic regression? Where is their home? Where is the border?
Fascination with Tools
Don Schmincke, in his book High Altitude Leadership, names one risk: the fascination with tools. If a group of alpinists is busy admiring their new navigation system, they are in a risky situation. The way of defining or distinguishing AI and Data Science is, in many cases, tools-driven.
From the examples above, we see the trouble. Can we distinguish a carpenter from a builder by discussing hammers and screwdrivers?
Solution
The proposed solution is to switch from tools to the purpose of each field. Then, we can discuss the differences between these purposes. First, we have to focus on the individual identity of each domain as if the other one did not exist.
Purpose of Artificial Intelligence
The purpose of Artificial Intelligence is to make intelligent systems in an automated way to fulfill their objectives with the highest possible quality to serve a specific human need.
Artificial Intelligence is an intelligence that is artificial. It is also the field of creation of machines that exhibit intelligent behavior.
Intelligence comes from the Latin words inter (between) and legere (to read). Reading between the lines means the ability to convert inputs into some additional information that would not exist without intelligence.
Examples of intelligence:
- E-mail → sender's sentiment (here we speak literally about reading between lines)
- Pixels → seeing people without facial masks (see the picture above)
- Opinions of teachers → school's timetable
- Sunset on beach → lyric poem
Artificial comes from the Latin words ars (craft) and facere (to make). Something made by a craft, something crafted, has some important properties that are of interest here. Let's demonstrate them on artificial pearls:
- Mass production is possible (not for natural pearls)
- Symmetry and other geometrical properties are much better (in contrast to natural pearls)
- Without human invention and energy (=craft), they would not exist (natural pearls would)
- This craft would not happen if there were no human need for pearls
We can see the same for Artificial Intelligence, too. The purpose of Artificial Intelligence is to make intelligent systems in an automated way to fulfill their objectives with the highest possible quality to serve a specific human need.
Purpose of Data Science
The purpose of Data Science is to provide an agnostic perspective on studied subjects by processing the recorded data to generate knowledge and insights.
Data Science is a Science that is Data. More specifically, it is a field of human activity that seeks knowledge and insights from processing data.
Science comes from the Latin word scientia, which stands for knowledge. Data comes from the Latin word dare, which stands for to give.
One could ask: is there any non-Data Science? Even philosophy works with concepts such as the human. If humans were not given, philosophical treatises on ethics and anthropology would not exist. Biology, geology, medicine - how could one imagine them without data, without given evidence from the real world?
Data Science exhibits a deliberate agnosticism to theories. A Data Scientist can analyze biological data without being a biologist, geological data without being a geologist, medical data without being a medical doctor.
A perfect example of a Data Science exercise is the in-depth analysis of Covid-19 data written by my ex-colleague Raman Samusevich. He generated the hypothesis that social distancing matters because mothers on maternity leave showed lower spread. My colleague Jan M?rz would say: the data were tortured until they gave up the insight.
The purpose of Data Science is to provide an agnostic perspective on studied subjects by processing the recorded data to generate knowledge and insights.
Relationship between Data Science and Artificial Intelligence
- Option 1: Data Science as a part of Artificial Intelligence. AI creates machines that are intelligent. Tools that (help to) generate insights are examples of those machines. Data Science is Artificial Intelligence.
- Option 2: Artificial Intelligence as a part of Data Science. Data Science uses the data to generate new information (knowledge, insights). Artificial Intelligence is about the ability of a machine to read between the lines, i.e. to take the data and transform them into something that would not exist without intelligence. Artificial Intelligence is an example of Data Science.
- Option 3: Data Science overlaps with Artificial Intelligence. A scatter plot belongs to DS but not to AI. Logic programming belongs to AI but not to DS. Linear regression belongs to both of them.
- Option 4: Autonomous domains with a fruitful exchange. The disciplines are different since they have different purposes. Creating a machine that will act intelligently to fulfill human needs is not the same as analyzing data to learn something new. AI aims at the satisfaction of human needs (AI will do it instead of me). DS aims at the growth of human knowledge (I will know more because of DS). Still, a fruitful exchange is possible. Data Science needs intelligence to interpret the data and generate knowledge. Artificial Intelligence systems, instead of being used stand-alone, can be tweaked to automate and simplify the acquisition of knowledge and insights. In the other direction, Artificial Intelligence can integrate knowledge and insights so that it exhibits intelligent behavior better.
AI and DS without division, yet without mixing.
- Option 5: Perichoresis (Mutual Indwelling). This means strict autonomy and radical sharing. Strict autonomy means the purpose is strictly different: Archimedes' Eureka (Data Science) is not Pygmalion's Galatea (Artificial Intelligence). Radical sharing means, first of all, sharing all the tools. Nobody would ban an AI practitioner from using scatter plots. Data Scientists are equally free to use information inferred from expert systems as one of the features or factors in their analyses. In terms of tools, there is no boundary. However, we can think of even more radical sharing: the sharing of problem instances. Credit risk modeling is a Data Science problem - we want to know what the risk factors are, what the patterns are, how it works. However, it is also an Artificial Intelligence problem - we want to automate the decisions so that they are more objective and less time-consuming. Artificial Intelligence gives everything it has to Data Science except the fact of being Artificial Intelligence (a Galatea-seeking process). Data Science receives. Data Science gives everything it has to Artificial Intelligence except the fact of being Data Science (a Eureka-seeking process).
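The idea of sharing a problem instance can be sketched in a few lines of Python. The example below takes the single-threshold credit rule mentioned earlier and uses the same fitted artefact twice: once to learn something (Eureka), once to automate a decision (Galatea). Everything here is an assumption made for illustration: the HISTORY records are invented, and best_threshold and approve are hypothetical helper names, not part of any real scoring system.

```python
# Hypothetical applicants: (debt_to_income_ratio, defaulted). Invented data.
HISTORY = [
    (0.10, False), (0.15, False), (0.22, False), (0.30, False),
    (0.35, True), (0.41, False), (0.48, True), (0.55, True), (0.63, True),
]

def best_threshold(history):
    """Return the (cut-off, errors) pair with the fewest misclassified records."""
    candidates = sorted(ratio for ratio, _ in history)
    best = None
    for cut in candidates:
        # Rule under test: predict a default when ratio >= cut.
        errors = sum((ratio >= cut) != defaulted for ratio, defaulted in history)
        # Strict comparison keeps the lowest cut-off when two candidates tie.
        if best is None or errors < best[1]:
            best = (cut, errors)
    return best

# Eureka-seeking use (Data Science): what do the records tell us?
threshold, errors = best_threshold(HISTORY)
print(f"Defaults concentrate at ratios >= {threshold} ({errors} exception(s))")

# Galatea-seeking use (Artificial Intelligence): automate the decision.
def approve(ratio, cut=threshold):
    """Approve an application when the ratio is below the learned cut-off."""
    return ratio < cut

print(approve(0.20), approve(0.60))
```

The tool and the data are identical in both halves; only the purpose differs, which is the point of Option 5.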
Having listed these options, I leave it to the readers to pick their preferred one. My personal preference is Option 5 since it allows us to understand AI and DS without division, yet without mixing.