One Minute Overview of Gaussian Mixture Models (GMM)
Gaussian Mixture Models (GMM). Image by author.

The #52weeksofdatascience newsletter covers everything from Linear Regression to Neural Networks and beyond. So, if you like Data Science and Machine Learning, don't forget to subscribe!

Level 1 - One Minute Overview for Data & Analytics Executives and Curious Minds

Category: Unsupervised Learning (i.e., does not require labelled target data)

Sub-category: Clustering (i.e., a grouping of objects / data points)

Main Idea: GMM is a distribution-based algorithm, which sets it apart from other clustering algorithms such as K-Means (centroid-based), HAC (connectivity-based), or DBSCAN (density-based).

GMM assumes that the data contains a specified number of Gaussian distributions, each with its own mean (μ) and variance (σ²) or, for multi-dimensional data, covariance (Cov). As a result, the algorithm produces a probability of each point belonging to each cluster (soft assignment) instead of the hard assignment used by many other clustering algorithms.
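To make the idea of soft assignment concrete, here is a minimal sketch in plain Python for one-dimensional data. The cluster parameters are hand-picked, hypothetical values rather than fitted ones, and the function names are my own for illustration. In practice you would typically rely on a library such as scikit-learn instead.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def membership_probabilities(x, weights, means, variances):
    """Probability that point x belongs to each cluster (soft assignment)."""
    # Weight each cluster's density by the cluster's share of the data ...
    weighted = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    # ... then normalise so the probabilities sum to 1.
    total = sum(weighted)
    return [value / total for value in weighted]

# Hypothetical, hand-picked parameters for two overlapping clusters.
weights = [0.5, 0.5]      # relative cluster sizes
means = [0.0, 4.0]        # cluster centres
variances = [1.0, 1.0]    # cluster spreads

for x in (0.0, 2.0, 4.0):
    probs = membership_probabilities(x, weights, means, variances)
    print(x, [round(p, 3) for p in probs])
```

A point halfway between the two centres gets an equal probability for both clusters, while a point sitting on one centre belongs almost entirely to that cluster. A hard-assignment algorithm would have to pick one cluster in both cases.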

To understand how GMM works in practice, we need to look at the Expectation-Maximization (EM) algorithm. EM iteratively calculates and recalculates the parameters of each cluster (distribution), i.e., its mean, variance/covariance, and size (the share of the data that the cluster accounts for).
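The EM loop can be sketched for one-dimensional data in plain Python as follows. This is an illustrative simplification, not the code from my notebook: the function name `fit_gmm_1d`, the quantile-based starting means, the fixed number of iterations, and the synthetic data are all assumptions made to keep the example short.

```python
import math
import random

def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def fit_gmm_1d(data, k, iterations=100):
    """Fit k 1-D Gaussians with EM; returns (weights, means, variances)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: evenly spaced quantiles as starting means,
    # the overall variance as every starting variance, equal cluster sizes.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: probability of each point belonging to each cluster.
        resp = []
        for x in data:
            weighted = [w * gaussian_pdf(x, m, v)
                        for w, m, v in zip(weights, means, variances)]
            total = sum(weighted)
            resp.append([value / total for value in weighted])

        # M-step: recalculate mean, variance and size of each cluster,
        # weighting every point by its membership probability.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against a collapsing cluster
            weights[j] = nj / n

    return weights, means, variances

# Synthetic data: two groups drawn from Gaussians centred at 0 and 6.
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]

weights, means, variances = fit_gmm_1d(data, k=2)
print("sizes:", [round(w, 2) for w in weights])
print("means:", [round(m, 2) for m in means])
print("variances:", [round(v, 2) for v in variances])
```

With two well-separated groups like these, the fitted means should land close to the centres used to generate the data. Real implementations usually stop once the log-likelihood stops improving rather than after a fixed number of iterations.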

Instead of taking you through complicated maths, I have created the GIF below to illustrate how GMM adjusts its parameters (μ, σ², Cov) with each iteration.

GMM algorithm in action. Image by author.

Everyday use cases: GMM is beneficial when your data has overlapping clusters, where a probabilistic view is more appropriate than drawing strict boundaries. For example, different products with similar features may each partially belong to multiple clusters.

Level 2 - for Aspiring Data Scientists

I have written an in-depth article, published on Towards Data Science, explaining the inner workings of the GMM algorithm.

Level 3 - for Data Science and Analytics Professionals

You can find a Jupyter Notebook with complete Python code in my GitHub repository. Use it as a guide to creating your own GMM clustering!
