Machine Learning - How would you explain it to an 8-year-old school-going child…

Machine learning is a hot topic these days; everyone seems to be talking about it. Even people who know nothing about the so-called science of data (i.e. data science) often refer to it, and this creates curiosity in everyone's mind about what data science really is. Is it science? Is it magic? What is it?

With the more glamorous applications of machine learning and artificial intelligence being talked about these days - self-driving cars, or the detection of coronavirus-affected patients as China did recently - a lot of people are keen to know what machine learning is. In this article I will try to explain machine learning in terms as simple as I would use with a school-going child. The idea is to explain it to people who have absolutely no idea about data science.

So what is Machine Learning?

Machine learning is when you feed lots of data into a computer program and choose a model to “fit” the data, which allows the computer (without your help) to learn from data and come up with predictions.

(In this article I am not going to explain the mathematics behind the computer algorithms that generate these predictions - we can probably explain that to the kids when they grow up.)

Machine learning is aptly named, because once you choose the model to use and tune it (a.k.a. improve it through adjustments), the machine will use the model to learn the patterns in your data. Then you can input new conditions (observations) and it will predict the outcome! This sounds like magic, especially to kids, but it isn't. In reality it is a series of mathematically and statistically coded algorithms that do this. But let the kids think of it as magic at this stage…

Types of Learning

  • Supervised machine learning - where the data you put into the model is "labelled." Labelled simply means that the outcome of the observation (a.k.a. the row of data) is known. For example, if your model is trying to predict whether your friends will go golfing or not, you might have variables like the temperature, the day of the week, etc. If your data is labelled, you would also have a variable with a value of 1 if your friends actually went golfing or 0 if they did not.
  • Unsupervised machine learning - the opposite of supervised learning when it comes to labelled data. With unsupervised learning, you do not know whether your friends went golfing or not; it is up to the computer to find patterns via a model to guess what happened or predict what will happen.
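To make the labelled/unlabelled distinction concrete, here is a tiny sketch of the golfing example. All the numbers and column names are hypothetical, invented for illustration:

```python
# A toy "will my friends go golfing?" dataset (hypothetical numbers).
# Supervised learning: each row carries a label (1 = went golfing, 0 = did not).
labelled_rows = [
    {"temperature": 72, "is_weekend": 1, "went_golfing": 1},
    {"temperature": 55, "is_weekend": 0, "went_golfing": 0},
    {"temperature": 80, "is_weekend": 1, "went_golfing": 1},
]

# Unsupervised learning: the same measurements, but no outcome column.
unlabelled_rows = [
    {"temperature": 68, "is_weekend": 0},
    {"temperature": 75, "is_weekend": 1},
]

# The labels are the "answers" a supervised model learns from.
labels = [row["went_golfing"] for row in labelled_rows]
print(labels)  # -> [1, 0, 1]
```

The only difference between the two datasets is that one column of answers is missing, and that is exactly what separates supervised from unsupervised learning.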

Supervised Machine Learning Models

  • Logistic Regression - is used when you have a classification problem. This means that your target variable (a.k.a. the variable you are interested in predicting) is made up of categories. These categories could be yes/no, or something like a number between 1 and 10 representing customer satisfaction.
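A logistic regression squeezes a weighted sum of the inputs through the sigmoid function so the result can be read as a probability. Here is a minimal sketch for the yes/no golfing case; the weight and bias values are hypothetical (in practice a training algorithm fits them from labelled data):

```python
import math

def sigmoid(z):
    # Squashes any number into the range (0, 1), so it reads as a probability.
    return 1.0 / (1.0 + math.exp(-z))

def predict_golfing(temperature, weight=0.1, bias=-7.0):
    # Hypothetical hand-picked weight and bias -- training would fit these.
    probability = sigmoid(weight * temperature + bias)
    return 1 if probability >= 0.5 else 0

print(predict_golfing(72))  # warm day  -> 1 (yes)
print(predict_golfing(55))  # cool day  -> 0 (no)
```

The 0.5 cutoff is just the conventional decision threshold; you can move it if one kind of mistake is more costly than the other.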
  • Linear Regression - is the first and simplest machine learning model, and much easier to use and understand. It uses the concept of the 'best fit line' taught in elementary school when using only one variable. This best fit line helps to make predictions. It is similar to logistic regression, but it is used when your target variable is continuous, which means it can take on essentially any numerical value. An example would be the selling price of a house in the market.

Linear regression is also very interpretable. The model equation contains coefficients for each variable, and these coefficients indicate how much the target variable changes for each one-unit change in the independent variable (the x-variable). With the house prices example, this means that you could look at your regression equation and say something like "oh, this tells me that for every increase of 1 sq ft in house size (the x-variable), the selling price (the target variable) increases by $25."

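The best fit line can be computed with a closed formula (ordinary least squares). The sketch below uses made-up house data deliberately built so that every extra square foot adds exactly $25, and recovers that slope:

```python
def best_fit_line(xs, ys):
    # Ordinary least squares for one variable: returns (slope, intercept).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house size (sq ft) vs selling price ($),
# constructed as price = 25000 + 25 * size.
sizes = [1000, 1500, 2000, 2500]
prices = [25000 + 25 * s for s in sizes]

slope, intercept = best_fit_line(sizes, prices)
print(slope)  # -> 25.0: each extra sq ft adds $25, just as in the text
```

Once you have the slope and intercept, a prediction is just `intercept + slope * size`.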
  • K Nearest Neighbours (KNN) - the KNN model can be used for both regression and classification problems. The "K" in the name refers to the number of closest neighbouring data points that the model looks at to determine what the prediction value should be. You can choose K and play around with the values to see which one gives the best predictions. All of the data points among the K nearest neighbours get a "vote" on what the target variable value should be for the new data point, and whichever value receives the most votes is the value KNN predicts. For example, if 2 of the 3 nearest neighbours are class 1 and 1 is class 2, the model predicts class 1 for the new data point. If the model is trying to predict a numerical value instead of a category, then all of the "votes" are numerical values that are averaged to get a prediction.
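The whole KNN idea fits in a few lines: sort the training points by distance to the new point, keep the K closest, and let them vote. The points and labels below are hypothetical:

```python
from collections import Counter

def knn_predict(points, labels, query, k=3):
    # Sort the training points by squared distance to the query point.
    by_distance = sorted(
        zip(points, labels),
        key=lambda pl: (pl[0][0] - query[0]) ** 2 + (pl[0][1] - query[1]) ** 2,
    )
    # The k nearest neighbours vote; the majority class wins.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Two hypothetical clumps: class 1 near the origin, class 2 far away.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8)]
labels = [1, 1, 1, 2, 2]

print(knn_predict(points, labels, query=(2, 2), k=3))  # -> 1
print(knn_predict(points, labels, query=(8, 9), k=3))  # -> 2
```

For a regression problem you would replace the `Counter` vote with an average of the neighbours' numerical values.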
  • Support Vector Machines (SVMs) - this type of model tries to create a boundary (a line, in the 2D case) such that the majority of one class falls on one side and the majority of the other class falls on the other side. The SVM is responsible for finding the decision boundary that separates the classes while maximizing the margin. Margins are the perpendicular distances between the boundary (i.e. the hyperplane) and the data points closest to it.
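The geometry behind this can be shown directly. The sketch below uses a hypothetical, already-fitted boundary (the line x + y - 10 = 0); real SVM training would find the `w` and `b` that maximize the margin, which is beyond this sketch:

```python
import math

def signed_distance(point, w, b):
    # Perpendicular distance from a point to the line w.x + b = 0;
    # the sign tells you which side of the boundary the point is on.
    return (w[0] * point[0] + w[1] * point[1] + b) / math.hypot(w[0], w[1])

# Hypothetical, already-fitted boundary: x + y - 10 = 0.
w, b = (1.0, 1.0), -10.0

def classify(point):
    return 1 if signed_distance(point, w, b) >= 0 else -1

print(classify((8, 8)))  # above the line -> class 1
print(classify((2, 1)))  # below the line -> class -1

# The margin is set by the point closest to the boundary (a support vector).
margin = min(abs(signed_distance(p, w, b)) for p in [(8, 8), (2, 1), (6, 6)])
```

Among many lines that separate the two classes, the SVM picks the one for which this `margin` value is largest.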
  • Decision Trees & Random Forests - a decision tree is a set of simple if/else conditions, like a flowchart, that grows into a tree. You start at the top, ask questions about your observation (a row in your dataset), and follow the tree down until you reach an outcome, which is your predicted y value.
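Because a decision tree really is just nested if/else questions, it can be written out by hand. The thresholds below are hypothetical, chosen only to illustrate the flow from the root down to a leaf:

```python
def will_go_golfing(temperature, is_weekend):
    # A tiny hand-written decision tree (hypothetical thresholds):
    # each question sends you down one branch until you reach a leaf.
    if temperature >= 60:
        if is_weekend:
            return 1   # warm weekend  -> golfing
        return 0       # warm weekday  -> no golfing
    return 0           # too cold      -> no golfing

print(will_go_golfing(temperature=75, is_weekend=True))   # -> 1
print(will_go_golfing(temperature=75, is_weekend=False))  # -> 0
```

A real training algorithm would pick the questions and thresholds automatically by looking at which splits separate the labelled data best.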

The problem with decision trees is that they tend to memorize the outcomes in your training data, i.e. they overfit the data. To solve this problem, random forest models are used.

With a random forest, the program generates a bunch of decision trees, each of which looks a little different due to the randomness involved in where the model makes its decision splits. Then the outcomes of all of these trees are averaged to get a final prediction. This method lets you make smaller trees and reduce the variance of your model while keeping its accuracy, which is why it is very popular.

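The forest idea can be sketched with deliberately tiny one-split "trees" (stumps). Everything here is hypothetical and simplified: each stump trains on a random resample of the rows (bagging) and splits a randomly chosen feature at its mean, then the stumps vote:

```python
import random

def train_stump(rows):
    # A deliberately tiny "tree": pick a random feature, split at its mean,
    # and record the majority label on each side of the split.
    feature = random.choice(["temperature", "is_weekend"])
    threshold = sum(r[feature] for r in rows) / len(rows)
    above = [r["label"] for r in rows if r[feature] > threshold]
    below = [r["label"] for r in rows if r[feature] <= threshold]
    vote = lambda ls: round(sum(ls) / len(ls)) if ls else 0
    return feature, threshold, vote(above), vote(below)

def forest_predict(stumps, row):
    # Average the votes of all the little trees.
    votes = [(above if row[f] > t else below) for f, t, above, below in stumps]
    return round(sum(votes) / len(votes))

random.seed(0)
rows = [
    {"temperature": 75, "is_weekend": 1, "label": 1},
    {"temperature": 80, "is_weekend": 1, "label": 1},
    {"temperature": 50, "is_weekend": 0, "label": 0},
    {"temperature": 55, "is_weekend": 0, "label": 0},
]
# Train each stump on a random resample of the rows (bagging).
stumps = [train_stump(random.choices(rows, k=len(rows))) for _ in range(7)]
print(forest_predict(stumps, {"temperature": 78, "is_weekend": 1}))
```

Real random forests use full-depth trees and cleverer split criteria, but the two sources of randomness shown here (resampled rows, randomly chosen split features) are exactly what makes each tree in the forest look a little different.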

Unsupervised Machine Learning Models

Unsupervised machine learning models are the ones where the data is not labelled, i.e. we do not know the outcomes in advance.

K Means Clustering - this model solves the clustering problem. It is also referred to as Lloyd's algorithm. (In simple words, we try to divide the data points, or observations, into different subgroups on the basis of some similarity or dissimilarity. These subgroups are called clusters.) Take the simplest example from our own planet: we are all humans, with certain attributes and behaviours that are common to us, differentiate us from animals, and make us a unique creature on this earth. And yet we still have so much diversity in geolocation, the languages we speak, the food we eat, the clothes we wear, the way we talk and walk, and so on.

The K Means algorithm first chooses K data points as the initial centres of the K clusters. Then it repeats the following two steps until the centres stop moving:

  1. Assign a data point to the nearest cluster center
  2. Create a new center by taking the mean of all of the data points that are now in this cluster.
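The two steps above can be sketched in a few lines. This toy version works on 1-D numbers and naively uses the first K points as starting centres (real implementations pick them more carefully, e.g. at random or with k-means++):

```python
def kmeans(points, k, iterations=10):
    # Naive start: use the first k points as the initial cluster centres.
    centres = points[:k]
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centres[i]) ** 2)
            clusters[nearest].append(p)
        # Step 2: move each centre to the mean of the points assigned to it.
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres

# Two obvious groups: small numbers and large numbers.
print(sorted(kmeans([1, 2, 3, 10, 11, 12], k=2)))  # -> [2.0, 11.0]
```

After a couple of rounds the centres settle at the middle of each clump, and further iterations change nothing; that is the convergence K Means relies on.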

DBSCAN Clustering - Density-Based Spatial Clustering of Applications with Noise. The DBSCAN clustering model differs from K Means in that it does not require you to input a value for K, and it can also find clusters of any shape. Instead of specifying the number of clusters, you input the minimum number of data points you want in a cluster and the radius around a data point to search within, and DBSCAN will find the clusters for you! You can then adjust those values until you get clusters that make sense for your dataset. This model works better than K Means when clusters have irregular shapes or when the data contains noise points that belong to no cluster.
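DBSCAN's central question is "does this point sit in a dense neighbourhood?" A point with at least `min_points` neighbours within the radius is a core point and seeds a cluster. The sketch below shows just that density test on hypothetical 2-D points (the full algorithm then chains core points together into clusters):

```python
def neighbours_within(points, point, radius):
    # All points within `radius` of `point` (including the point itself).
    return [q for q in points
            if (q[0] - point[0]) ** 2 + (q[1] - point[1]) ** 2 <= radius ** 2]

def is_core_point(points, point, radius, min_points):
    # DBSCAN's key question: does this point sit in a dense region?
    return len(neighbours_within(points, point, radius)) >= min_points

# Four points clumped together, one isolated point (noise).
points = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9)]

print(is_core_point(points, (1, 1), radius=1.5, min_points=4))  # -> True
print(is_core_point(points, (9, 9), radius=1.5, min_points=4))  # -> False
```

Points like `(9, 9)` that are neither core points nor near one get labelled as noise, which is how DBSCAN tolerates outliers that would drag a K Means centre off target.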

Neural Networks - the basic idea behind a neural network is to simulate (copy in a simplified but reasonably faithful way) lots of densely interconnected brain cells inside a computer, so you can get it to learn things, recognize patterns, and make decisions in a humanlike way. Neural networks are the coolest and most mysterious models, in my opinion. They are called neural networks because they are modelled after how the neurons in our brains work. These models find patterns in the dataset, sometimes patterns that humans might never recognize.

Neural networks work well with complex data like images and audio. They are behind lots of software functionality that we see all the time these days, from facial recognition (stop being creepy, Facebook) to text classification. Neural networks can be used with data that is labelled (i.e. supervised learning applications) as well as data that is unlabelled (unsupervised learning).
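A single artificial "brain cell" is surprisingly small: it takes a weighted sum of its inputs, adds a bias, and squashes the result through an activation function. The weights below are hypothetical; a real network learns millions of them from data, and wires thousands of these cells together in layers:

```python
import math

def neuron(inputs, weights, bias):
    # One artificial neuron: weighted sum of the inputs plus a bias,
    # squashed by an activation function (here, the sigmoid).
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights -- training would adjust these to reduce errors.
output = neuron(inputs=[0.5, 0.8], weights=[1.2, -0.7], bias=0.1)
print(round(output, 3))
```

Stack many of these neurons into layers, feed each layer's outputs into the next, and you have the network; "learning" is nudging all the weights so the final outputs match the labelled answers more often.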

