A hitchhiker’s guide to Artificial Intelligence, Machine Learning, and Data Science - 3 insights for non-techie managers (part 1 of 3)

Introduction

When I was attending business school, my classmates often asked me about Artificial Intelligence (AI), Machine Learning (ML) and Data Science (DS).

All of them recognize the impact that AI, ML and DS have had or will have on their industries, and most are keen to leverage these technologies meaningfully. Many of them also realize that they will never be data scientists or machine learning engineers, but they still want to learn more so that they can better engage with their technical counterparts or craft strategies for their companies.

These 3 insights are written for them: the non-techie managers who want to understand and leverage AI/ML/DS in their jobs without necessarily becoming technical experts in those fields. In writing this, I drew upon my 10 years of field experience as a data scientist, as a manager of a data science team and as a vendor building ML/DS solutions for clients.

In coming up with these 3 insights, I focused on principles rather than market trends. The space has evolved quickly in the last 10 years and I believe that we have seen only the tip of the iceberg. My hope is that these 3 insights are evergreen and will serve you well as core guiding principles for a very long time.

Insight 1: The subtleties between data science, machine learning, deep learning and artificial intelligence or why surveys and experiments are still useful

TL;DR: With machine learning and deep learning making the headlines these days, there is a tendency to see these technologies as a Swiss Army knife for every question. The most mature data-driven organizations instead rely on a toolbox of surveys, randomized controlled trials (RCTs), machine learning and deep learning. Machine learning and deep learning exploit patterns in historical data to make predictions about the future (correlation). Surveys allow you to understand why the patterns in the data exist. RCTs establish causality as opposed to correlation. Correlation-based methods are great for indicating future behaviour, whereas surveys and RCTs are very useful for designing interventions that change behaviour.

The terms data science, machine learning, deep learning and artificial intelligence have entered our everyday vernacular. They are often used interchangeably, but understanding the subtle differences between them helps companies decide on hiring strategy and capabilities, build the right teams for product development and testing, and understand how different kinds of analysis can drive the right business outcomes.

Figure: how DS, AI, ML and DL fit together
Figure: the differences between DL, ML and AI


Artificial intelligence (AI)

Wikipedia defines artificial intelligence (AI) as “intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals and is used to describe machines that mimic ‘cognitive’ functions that humans associate with other human minds, such as ‘learning’ and ‘problem solving’”.

An example of an AI application that is not machine learning is a rule-based system, where the domain knowledge of experts is directly encoded into carefully crafted rules that can be applied by computers. Some Robotic Process Automation (RPA) use cases fall into this space, e.g. a software agent that is preprogrammed to extract records from different systems and compile them into a financial report at monthly intervals.
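To make the idea of a rule-based system concrete, here is a minimal Python sketch. The invoice fields, the thresholds and the routing labels are all hypothetical and chosen only for illustration; the point is that every decision comes from a rule a human expert wrote down, and nothing is learned from data.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    vendor: str
    amount: float
    days_overdue: int


def route_invoice(invoice: Invoice) -> str:
    """Apply hand-written expert rules in priority order (hypothetical rules)."""
    if invoice.days_overdue > 60:
        return "escalate_to_collections"
    if invoice.amount >= 10_000:
        return "manager_approval"
    return "auto_approve"


# Made-up records standing in for data pulled from different systems.
invoices = [
    Invoice("Acme", 250.0, 0),
    Invoice("Globex", 12_500.0, 10),
    Invoice("Initech", 900.0, 75),
]

decisions = [route_invoice(inv) for inv in invoices]
print(decisions)
```

If the business changes, someone has to edit the rules by hand. That is the practical difference from a system that updates itself from new data.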

Machine learning (ML)

Wikipedia defines machine learning as the “scientific study of algorithms and statistical models that computer systems use in order to perform a specific task effectively without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task”.
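A toy Python sketch may help show what "building a model from training data" means. It is not how production ML is done: the data is made up, and the "model" is a single cut-off value chosen to misclassify as few training examples as possible. Nobody types the cut-off in; it is derived from the labelled examples.

```python
def fit_threshold(values, labels):
    """Return the cut-off that misclassifies the fewest training examples.

    The learned model predicts 1 when value >= cut-off, otherwise 0.
    """
    best_threshold, best_errors = None, len(values) + 1
    for candidate in sorted(set(values)):
        errors = sum((v >= candidate) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Hypothetical training data: site visits per month and whether the
# customer repurchased (1) or not (0).
visits = [1, 2, 2, 3, 5, 6, 7, 9]
repurchased = [0, 0, 0, 0, 1, 1, 1, 1]

threshold = fit_threshold(visits, repurchased)


def predict(value):
    return int(value >= threshold)


print(threshold, predict(4), predict(8))
```

With different training data the same code would arrive at a different cut-off, which is the sense in which the behaviour is learned rather than explicitly programmed.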

There are 3 broad areas of machine learning:

Supervised learning

Unsupervised learning 

  • Unsupervised learning algorithms take a set of data that contains only inputs (no labelled outcomes) and find structure in the data.
  • An example is clustering, an unsupervised learning technique that finds groups (clusters) of customers based on their behavioral or demographic attributes.
  • Another application is anomaly detection, where the assumption is that anomalous data points separate out naturally because of their distinctive attributes. Anomaly detection is used in fraud detection, where fraudsters change their methods so quickly that it may not be feasible to wait until enough samples are collected to train a supervised learning algorithm.
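The anomaly detection idea can be sketched in a few lines of Python. This is one simple technique among many, not a fraud system: it measures how far each value sits from the median, in units of the median absolute deviation (MAD), and flags anything beyond a cut-off. The transaction amounts and the cut-off of 5 are assumptions made for the example. Note that no labels are used anywhere.

```python
from statistics import median


def flag_anomalies(values, cutoff=5.0):
    """Flag values that lie far from the median, measured in MAD units."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # All typical values are identical; this simple score is undefined.
        return []
    return [v for v in values if abs(v - center) / mad > cutoff]


# Made-up transaction amounts; one of them is clearly out of pattern.
amounts = [42.0, 38.0, 45.0, 40.0, 41.0, 39.0, 44.0, 950.0, 43.0, 37.0]

anomalies = flag_anomalies(amounts)
print(anomalies)
```

The method only says that a point looks unusual. Whether it is actually fraud still needs a human or a further check.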

Reinforcement learning

Figure from the paper “Reinforcement learning-based multi-agent system for network traffic signal control”

Deep learning (DL)

Wikipedia defines deep learning as “part of a broader family of machine learning methods based on artificial neural networks. Learning can be supervised, semi-supervised or unsupervised”.

In my experience, deep learning is typically superior to other machine learning methods for text (e.g. natural language processing, machine translation), audio (speech recognition), image (computer vision) and video data.

Data Science (DS)

Wikipedia defines data science as “a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data”, as a “concept to unify statistics, data analysis, machine learning and their related methods” in order to “understand and analyze actual phenomena” with data, and notes that “it employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science”.

Another way of understanding data science is to recognize how it differs from ML and DL.

Firstly, ML and DL, by and large, exploit the “what” in the data. For example, you may build a machine learning model that accurately predicts whether a customer is likely to repurchase an item from your website, and you may find that certain predictors are highly correlated with the likelihood of purchase. That is as far as you can go with ML or DL: you do not understand why people behaved the way they did or why a certain phenomenon happens. This is where surveys come in. They allow you to understand why certain people or certain things behave in a certain way. Survey design is a science of its own and falls under data science but not under ML or AI.

Secondly, machine learning and deep learning, by and large, rely on correlation to make inferences, and those inferences do not establish causality. For example, your ML model may show that increases in search advertising spend are highly correlated with increased sales. To establish a causal relationship between search advertising and sales, you will need to run a randomized controlled trial (RCT), also known as an A/B test, to rule out factors such as a holiday season effect. Experiments and RCTs fall under data science and not under ML, DL or AI.
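For readers who want to see what analysing an A/B test can look like, here is a minimal Python sketch using a two-proportion z-test, which is one common way to compare conversion rates. All numbers are invented. The sketch assumes users were randomly assigned to the two groups, which is what lets us attribute the difference to the advertising rather than to something like the holiday season, since both groups experience the same season.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare two conversion rates; return (z statistic, two-sided p-value)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: users randomly split into two equal groups.
# Control saw no search ads; treatment saw search ads.
control_buyers, control_users = 500, 10_000      # 5.0% bought
treated_buyers, treated_users = 580, 10_000      # 5.8% bought

z, p_value = two_proportion_z_test(
    control_buyers, control_users, treated_buyers, treated_users
)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value (conventionally below 0.05) suggests the gap between the groups is unlikely to be chance alone. The arithmetic is the easy part; the random assignment is what makes the causal reading defensible.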

Continue reading insight 2 - The interplay between data, people and tech or why data trumps people and tech and insight 3 - Evaluating the models produced by data scientists or why you should revise basic statistics.

Sengmeng . 邱胜铭

AI Singapore | SCS Fellow | ISO SC42

5 years ago

Thanks for sharing Zhihao. Good Article! IMHO, the AI term itself should be segregated and distinct from ML and current advanced techniques and methodologies. We should be advocating the proper definitions of AGI and ANI, and even then, I always tell my audience AI is Augmented (Human) Intelligence so that we see applications of ANI as a collaborative and augmentative tool to us.

David Owen

Board-level strategic development, innovation & growth.

5 years ago

A nice write-up Zhihao - more of this is needed because we are still in a time when these and many other terms are used interchangeably. What is also intriguing from a philosophical perspective is exactly what intelligence is - we can't define human intelligence (what is consciousness) so how to define it for machines....

Rama Al Jayyousi

Communication, Marketing & Business Development

5 years ago

Who needa AI when you're on the road to know where.... When r u coming back

Very well-written, Zhihao. I think this will be a useful reference for managers and especially executives. The point on differentiating between correlation and causality is fundamental. There is a third kind of dependency that is, unfortunately, much less well-known: counterfactual dependencies. Put simply, it is any implication statement such as "If smart beta did not have superior returns, there would not be so many factor ETFs available." I think the latter is likely to gain in importance and clearly differentiate itself from causal dependencies in the next few decades as the world becomes more transparent and non-Nashian.
