Demystifying AI: # 2
Machine Learning:
As discussed in article #1, machine learning (ML) is a subset of AI. It is the science of getting computers to learn from the data presented to them, that is, from experience, without being explicitly programmed. ML is one approach to AI, and it is based on statistics; hence it is also called the statistical or probabilistic approach.
So, how do you make a computer learn from data? Here is one way practitioners do it. Split the data into two parts in an 80:20 proportion. The 80% portion is called the training dataset and the remaining 20% is called the validation dataset.
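To make the 80:20 split concrete, here is a minimal Python sketch. The function name split_80_20, the fixed seed and the toy data are my own assumptions for illustration; real projects usually rely on a library helper for this.

```python
import random

def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows, then cut it into 80% and 20% parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is repeatable
    cut = len(shuffled) * 80 // 100         # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy data: ten (x, y) pairs
data = [(x, 2 * x + 1) for x in range(10)]
train, validation = split_80_20(data)
print(len(train), len(validation))
```

Shuffling before cutting matters: if the rows are sorted in some way, an unshuffled split would put all the "late" rows in the validation set.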
Training dataset
The training dataset is fed to the ‘learning algorithm’, also known as an ‘untrained model’. The learning algorithm generates a hypothesis (a formula, a logic, a grouping): a target function “f” that best maps the input variable “x” to the output variable “y”, represented as y = f(x).
If you want to dig deeper, here is an article on towardsdatascience.com that highlights the popular algorithms used in machine learning. (https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11)
Validation dataset
The hypothesis (the logic or function) is fine-tuned with the help of the validation dataset. This dataset provides an unbiased evaluation of a model fit on the training dataset while tuning the model's hyperparameters. After the hyperparameters are tuned, what you get is the ‘trained model’. The trained model is then used for inference, that is, for prediction: ‘new’ data, where only the input x is known, is fed to the trained model, and the model predicts the output y.
Computers can learn in different ways, and that gives rise to different kinds of machine learning: supervised learning, unsupervised learning and reinforcement learning.
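The train, tune and predict flow described above can be sketched in a few lines of Python. As a stand-in for a real learning algorithm I use a simple k-nearest-neighbours average, where k is the hyperparameter tuned on the validation dataset. The data, the candidate values of k and all function names are assumptions made for illustration only.

```python
def knn_predict(train, x, k):
    """Predict y for x as the mean y of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def validation_error(train, validation, k):
    """Mean squared error of the k-NN predictions on the validation set."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in validation]
    return sum(errors) / len(errors)

# Hypothetical toy data that follows y = 2x + 1
train = [(x, 2 * x + 1) for x in (0, 1, 2, 3, 5, 6, 8, 9)]
validation = [(4, 9), (7, 15)]

# Tune the hyperparameter k: keep the value with the lowest validation error
best_k = min((1, 2, 3, 4), key=lambda k: validation_error(train, validation, k))

def trained_model(x):
    """The 'trained model': the training data plus the tuned hyperparameter."""
    return knn_predict(train, x, best_k)

# Inference: only the input x is known for the new data point
print(best_k, trained_model(4.5))
```

Note that the validation rows are never used to make predictions; they are only used to choose between candidate settings.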
I like this Machine Learning Bubble chart (courtesy Mactores Data Science team) because it shows the various kinds of machine learning and the various use cases where it is used. It captures the various machine learning models in a nice, compact graphic.
Here is a brief overview of these machine learning models.
Supervised Learning
In supervised learning, the data that is fed to the algorithm is labeled: the input and the output (the labels) are both known. The computer ingests this data and produces a logic, a formula. There are two kinds of supervised learning: regression and classification.
Regression
In a regression problem, we are trying to predict results within a continuous output, meaning we are trying to map input variables to some continuous function. Let me explain with an example. If somebody is asked to predict the revenue generated by a $500 advertising budget, that person wouldn't know where to start. But if the person is given data (examples of the revenue that previous advertising budgets have generated), he or she can generate a hypothesis (a model).
The line drawn in the graph that fits the various points is the continuous output, the hypothesis. If the person maps an input variable (a $1000 advertising budget) onto this line, he or she can predict that the budget might deliver approximately $22,500 in revenue.
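Fitting that line can be sketched with ordinary least squares in plain Python. The (budget, revenue) pairs below are made-up numbers, not the data behind the graph, so the prediction will not match the $22,500 figure; fit_line and predict_revenue are hypothetical names.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history of (advertising budget, revenue) in dollars
history = [(200, 5200), (400, 9100), (600, 13800), (800, 18100), (1200, 26900)]
slope, intercept = fit_line(history)

def predict_revenue(budget):
    """Map a budget onto the fitted line to get a revenue estimate."""
    return slope * budget + intercept

print(round(predict_revenue(1000)))
```

The fitted slope and intercept are the ‘hypothesis’ here: once you have them, any budget can be mapped to an estimated revenue.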
Classification
In a classification problem, we are trying to map input variables into discrete categories. Some examples: Is this email spam or not? Discrete categories: Spam and No spam. Is this pet a dog or a cat? Discrete categories: Dog and Cat. Is it this or that? Is it yes or no? These are all examples of binary classification.
We also have multi-class classification, where the data can be put into more than two classes. For example:
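Here is a small sketch of binary classification, assuming a single made-up feature (the number of links in an email) and a handful of invented labeled examples. It learns one cut-off value that best separates the two classes; real spam filters use many more features and far more capable algorithms.

```python
def learn_threshold(examples):
    """Pick the cut-off on one numeric feature that misclassifies the
    fewest labeled examples (label True means spam)."""
    values = sorted({x for x, _ in examples})
    # Candidate cut-offs: midpoints between neighbouring feature values
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((x > t) != is_spam for x, is_spam in examples)

    return min(candidates, key=errors)

# Hypothetical labeled emails: (number of links, is_spam)
emails = [(0, False), (1, False), (2, False), (6, True), (8, True), (9, True)]
threshold = learn_threshold(emails)

def classify(links):
    """Assign one of the two discrete categories."""
    return "Spam" if links > threshold else "No spam"

print(threshold, classify(7), classify(1))
```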
Coffee cup size - Small, Medium or Large.
T-shirt size - XS, S, M, L, XL
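Multi-class classification can be sketched the same way. The example below assumes one invented feature (chest measurement in cm) and three of the sizes, and uses a nearest-centroid rule: each class is summarized by the average of its examples, and a new item gets the label of the closest average. The data and function names are assumptions for illustration.

```python
from collections import defaultdict

def fit_centroids(examples):
    """Average the feature value of each class to get its centroid."""
    grouped = defaultdict(list)
    for value, label in examples:
        grouped[label].append(value)
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}

def classify(centroids, value):
    """Return the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Hypothetical labeled data: (chest measurement in cm, T-shirt size)
shirts = [(84, "S"), (88, "S"), (94, "M"), (98, "M"), (104, "L"), (108, "L")]
centroids = fit_centroids(shirts)

print(classify(centroids, 97))
```

Adding a new class such as XL only requires adding labeled examples for it; the rule itself stays the same.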
In summary: Classification separates the data, regression fits data!
(credit: Deep Math machine learning.ai)
As this article is getting longer than I had expected, we will cover Unsupervised learning and Reinforcement learning in the next article.