One Minute Overview of Gaussian Mixture Models (GMM)
The #52weeksofdatascience newsletter covers everything from Linear Regression to Neural Networks and beyond. So, if you like Data Science and Machine Learning, don't forget to subscribe!
Level 1 - One Minute Overview for Data & Analytics Executives and Curious Minds
Category: Unsupervised Learning (i.e., does not require labelled target data)
Sub-category: Clustering (i.e., a grouping of objects / data points)
Main Idea: GMM is a distribution-based algorithm, which differentiates it from other clustering algorithms such as K-Means (centroid-based), HAC (connectivity-based) or DBSCAN (density-based).
GMM assumes that the data contains a specified number of Gaussian distributions, each with its own mean (μ) and variance (σ²) or, in more than one dimension, covariance (Cov). As a result, the algorithm produces a probability of each point belonging to each cluster instead of the hard assignment used by algorithms such as K-Means.
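To make the idea of soft assignment concrete, here is a minimal sketch in plain Python. It assumes a one-dimensional model with two clusters whose parameters are already known; the helper names (`gaussian_pdf`, `soft_assign`) and the parameter values are made up for illustration and do not come from the notebook.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def soft_assign(x, weights, means, variances):
    """Return the probability that x belongs to each cluster (the values sum to 1)."""
    weighted = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(weighted)
    return [p / total for p in weighted]

# Hypothetical two-cluster model: equal sizes, means 0 and 4, unit variances.
weights, means, variances = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

for x in (0.0, 2.0, 3.5):
    probs = soft_assign(x, weights, means, variances)
    print(x, [round(p, 3) for p in probs])
```

A point sitting exactly halfway between the two means (x = 2.0 here) is split 50/50, whereas a hard-assignment algorithm would have to put it in one cluster or the other.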
To understand how GMM works in practice, we need to look at the Expectation-Maximization (EM) algorithm. EM iteratively calculates and recalculates the parameters of each cluster (distribution), i.e., its mean, variance/covariance, and size (the share of the data it accounts for).
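For readers who prefer code to words, the EM loop can be sketched roughly as follows. This is a simplified one-dimensional version in plain Python, not the implementation from the notebook: the function name `fit_gmm_1d`, the quantile-based initialisation, the fixed number of iterations and the synthetic data are all assumptions made for the example.

```python
import math
import random

def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def fit_gmm_1d(data, k, n_iter=50):
    """Fit a k-component 1-D Gaussian mixture with a fixed number of EM iterations."""
    n = len(data)
    ordered = sorted(data)
    # Initialisation (an assumption): means spread evenly over the sorted data,
    # every cluster starts with the overall variance and an equal size.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: probability of each point belonging to each cluster.
        resp = []
        for x in data:
            weighted = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(weighted)
            resp.append([p / total for p in weighted])

        # M-step: recalculate mean, variance and size from those probabilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # small floor keeps the variance positive
            weights[j] = nj / n

    return weights, means, variances

# Synthetic data: two groups of 200 points centred on 0 and 5.
rng = random.Random(42)
data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]

weights, means, variances = fit_gmm_1d(data, k=2)
print("sizes:", [round(w, 2) for w in weights])
print("means:", [round(m, 2) for m in means])
print("variances:", [round(v, 2) for v in variances])
```

In practice you would more likely reach for a library implementation that handles multiple dimensions, full covariance matrices and a convergence check; the sketch only shows the alternation between the two steps.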
Instead of taking you through complicated maths, I have created the gif image below to illustrate how GMM adjusts its parameters (μ, σ², Cov) with each iteration.
Everyday use cases: GMM is beneficial when your data has overlapping clusters, where a probabilistic view makes more sense than drawing strict boundaries. E.g., you can imagine different products with similar features that partially belong to multiple clusters.
Level 2 - for Aspiring Data Scientists
I have written an in-depth article, published on Towards Data Science, explaining the inner workings of the GMM algorithm.
Level 3 - for Data Science and Analytics Professionals
You can find a Jupyter Notebook with the complete Python code in my GitHub repository. Use it as a guide to creating your own GMM clustering!