Supervise The UnSupervised Learning (Part 1)

What is Unsupervised Learning?

It’s too cliché to define unsupervised learning the way many blogs on the internet do. Let me start with a very simple example by taking you back to your childhood, when every one of us hated exams. In the first situation, imagine exam papers and an answer key that decides your grades. In the second situation, imagine only exam papers without an answer key: how would you grade yourself?

Mapping this example onto machine learning makes it a bit simpler. The first situation is the traditional one, where the dataset has labels (in this case, the answer key); that is supervised learning. The second situation is unsupervised learning, where the data has no labels and no known outcomes, since the algorithm only analyzes the input.

Unlike supervised machine learning, unsupervised machine learning methods cannot be directly applied to a regression or a classification problem because you have no idea what the values for the output data might be, making it impossible for you to train the algorithm the way you normally would. Unsupervised learning can instead be used for discovering the underlying structure of the data.

Why Un when we have Supervised?

The best time to use unsupervised machine learning is when you don’t have data on desired outcomes, like determining a target market for an entirely new product that your business has never sold before. However, if you are trying to get a better understanding of your existing consumer base, supervised learning is the optimal technique.

Unsupervised machine learning purports to uncover previously unknown patterns in data, but most of the time these patterns are poor approximations of what supervised machine learning can achieve. Additionally, since you don’t know what the outcomes should be, there is no way to determine how accurate they are, making supervised machine learning more applicable to real-world problems.

And now, the typical…

Types of Unsupervised Learning

Some of the algorithms used in unsupervised learning include clustering methods, dimensionality-reduction methods, and a long list of related techniques.

I know what you are thinking: what are all these methods you have never even heard of? The very famous and generally talked-about ones are K-means, PCA, and SVD.

Clustering

A very common confusion is the difference between classification and clustering, and what clustering actually means. The basic distinction is that classification is supervised learning while clustering is unsupervised learning.

Classification is the process of learning a model that distinguishes between predetermined classes of data; it is a two-step process comprising a learning step and a classification step. Clustering is a technique for organizing a group of data into clusters such that objects inside a cluster have high similarity to one another, while objects from two different clusters are dissimilar.

K-means

K-means is one of the simplest unsupervised learning algorithms that solve the clustering problem. The procedure follows a simple and easy way to classify a given dataset into a certain number of clusters. The results of the K-means clustering algorithm are:

  1. The centroids of the K clusters, which can be used to label new data
  2. Labels for the training data

Whatttt is thatttt!!! Centroids? Labels?

Every book and blog explains K-means that way. But why do it traditionally? Everyone loves pizza, right? So imagine Domino's (my favorite) wants to open several stores in different areas of the city. The difficulties they face are:

  • They need to analyze the areas from which pizza is ordered frequently.
  • They need to understand how many pizza stores have to be opened to cover delivery in the area.
  • They need to figure out the locations of the pizza stores within all these areas in order to keep the distance between the store and the delivery points to a minimum.

Resolving these challenges includes a lot of analysis and mathematics.

The algorithm comprises three steps:

Step 1: Initialization

The first thing K-means does is randomly choose K examples (data points) from the dataset as initial centroids, because it does not yet know where the center of each cluster is.

Step 2: Cluster Assignment

Then, every data point is assigned to its closest (most similar) centroid, and the points assigned to the same centroid form a cluster. If Euclidean distance is used, the boundary between two neighboring clusters is the perpendicular bisector of the straight line drawn between their two centroids.

Step 3: Move the centroid

Now the new clusters need centers. A centroid’s new value is the mean of all the examples in its cluster.

We keep repeating steps 2 and 3 until the centroids stop moving; in other words, until the K-means algorithm has converged.
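
To make the three steps concrete, here is a minimal NumPy sketch of the algorithm (my own illustration, not a production implementation; the function name, the seed, and the assumption that no cluster ever becomes empty are all mine):

import numpy as np

def k_means(points, k, num_iters=100, seed=0):
    # points is an (n, d) array; k is the number of clusters.
    rng = np.random.default_rng(seed)
    # Step 1: Initialization - pick k random examples as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(num_iters):
        # Step 2: Cluster assignment - each point joins its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: Move the centroid - the new centroid is the mean of its cluster
        # (assumes no cluster is empty; a robust version would handle that case).
        new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # centroids stopped moving: converged
            break
        centroids = new_centroids
    return centroids, labels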

Complexity

K-means is a fast and efficient method: the complexity of one iteration is O(k·n·d), where k is the number of clusters, n the number of examples, and d the cost of computing the Euclidean distance between two points.

How do we choose the number of clusters k?

If the right value is not clear, we try different values of k, evaluate each of them (running several trials per value, since the initialization is random), and choose the best one. This can be written simply in code:

def choose_best_clustering(points, k, num_trials):
    # Run K-means several times with random initializations and keep
    # the clustering with the lowest dissimilarity (defined below).
    best = k_means(points, k)
    for t in range(num_trials):
        C = k_means(points, k)
        if dissimilarity(points, C) < dissimilarity(points, best):
            best = C
    return best

The dissimilarity of a clustering is the sum of the variabilities of its k clusters.

The variability of a cluster is the sum of the Euclidean distances between the centroid and each example in the cluster.
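
Continuing the NumPy sketch above, these two quantities might look as follows (the names and the (centroids, labels) representation of a clustering are my own assumptions):

def variability(cluster_points, centroid):
    # Sum of Euclidean distances between the centroid and each example in the cluster.
    return np.linalg.norm(cluster_points - centroid, axis=1).sum()

def dissimilarity(points, clustering):
    # Sum of the variabilities of all k clusters; `clustering` is the
    # (centroids, labels) pair returned by k_means above.
    centroids, labels = clustering
    return sum(variability(points[labels == j], c) for j, c in enumerate(centroids))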

What if the centroids chosen are unlucky?

Poorly chosen random initial centroids can make the algorithm take longer to converge or get stuck in a local optimum, which may result in a bad clustering.


There are two solutions (a code sketch follows the list):

  1. Distribute them over the space.
  2. Try different sets of random centroids, and choose the best set.
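
In practice, libraries already implement both remedies. For example, scikit-learn's KMeans defaults to the k-means++ initialization (which spreads the initial centroids over the space) and takes an n_init parameter that retries several random initializations and keeps the best run. A small sketch (the toy data and parameter values are arbitrary choices of mine):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)  # toy data: 200 points in 2-D
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # the final centroids
print(km.inertia_)          # sum of squared distances, close in spirit to dissimilarity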

PCA

Before defining PCA, let’s first see what dimensionality reduction actually is.

As the name suggests, it is the process of reducing the number of dimensions of a given problem. High dimensionality increases computational complexity, increases the risk of overfitting, and makes the data sparser. Dimensionality reduction therefore projects the data into a lower-dimensional space to limit these phenomena.

There are many ways to achieve dimensionality reduction; the most common are:

  1. Feature Elimination- It is what it sounds like: reducing the feature space by eliminating features. The disadvantage is that any information carried by the dropped variables is lost.
  2. Feature Extraction- doesn’t run into this problem. Say one has ten independent variables; feature extraction creates ten “new” independent variables, where each “new” variable is a combination of the ten “old” ones. The new variables are constructed in a specific way and ordered by how well they predict the dependent variable (see the sketch after this list).
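
A toy contrast between the two approaches (my own illustration; the random linear map stands in for the carefully chosen combinations that PCA, below, computes):

import numpy as np

X = np.random.rand(100, 10)   # 100 samples, 10 features
X_elim = X[:, :3]             # feature elimination: simply drop seven columns
W = np.random.rand(10, 3)     # feature extraction: new variables are combinations
X_extr = X @ W                # of all old ones (PCA chooses this map optimally)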

Principal Component Analysis (PCA) is used for dimensionality reduction and is a feature extraction technique, i.e., it speeds up a slow program by keeping only the derived features that are useful, which also reduces the risk of overfitting.

That is the usual definition, but the mathematical view gives a broader picture of how the values are actually calculated.

Mathematics behind PCA

Let the dataset to which PCA is applied be represented as an m×n matrix X (m examples, n features).

Y = PCA(X) results in the projection of X.

Step-1 Compute the column-wise mean of X

              M = mean(X)

Step-2 Center the values in each column by subtracting the column means

              S = X - M

Step-3 Find the covariance of the centered matrix S (the covariance matrix contains the covariance score of every column with every other column, including itself)

              C = cov(S)

Step-4 Calculate the eigenvalues and eigenvectors of the covariance matrix; these reveal the underlying patterns (the directions of greatest variance) in the data

              values, vectors = eig(C)

Step-5 Perform the reorientation. To project the data onto the new axes, multiply the centered data by the eigenvectors, which define the directions of the new axes.

Finally, the scores calculated in the step above can be plotted and fed into a predictive model. The plots give a sense of how closely correlated two variables are.
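
Putting the five steps together, a minimal NumPy sketch (my own illustration of the procedure above; note that it is the centered data S, not the original X, that gets projected):

import numpy as np

def pca(X, num_components):
    M = X.mean(axis=0)                    # Step 1: column-wise mean
    S = X - M                             # Step 2: center the data
    C = np.cov(S, rowvar=False)           # Step 3: covariance matrix (n x n)
    values, vectors = np.linalg.eigh(C)   # Step 4: eigenpairs (eigh: C is symmetric)
    order = np.argsort(values)[::-1]      # sort axes by explained variance
    vectors = vectors[:, order[:num_components]]
    return S @ vectors                    # Step 5: project onto the new axes

X = np.random.rand(50, 5)
Y = pca(X, 2)   # the "scores" to plot or feed into a model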

When to use PCA?

  1. Do you want to reduce the number of variables, but aren’t able to identify variables to remove from consideration entirely?
  2. Do you want to ensure your variables are independent of one another?
  3. Are you comfortable making your independent variables less interpretable?

If the answer is “yes” to all three questions, then PCA is a good method to use. If the answer to question 3 is “no”, one should not use PCA.


End Note

I hope this article is useful for beginners in the field of machine learning and helps them start to understand the world of unsupervised learning.

In the next article we will do some coding related to the algorithms we learned here. If you face any difficulty understanding the concepts, feel free to write in the comments section.

Did you find this article helpful? Please share your opinions / thoughts in the comments section below.

Regards!!!