Fuzzy Wuzzy Matching
Scrabble: the original natural language processing (NLP)

Fuzzy Wuzzy Matching

There's a lot of discussion around the possibilities of technology that use AI and machine learning algorithms, not just in the data science space, but in the broader business world as well.

No alt text provided for this image

Machine learning, over the last three decades, evolved from models like fraud detection (Amazon is a particularly famous use case for this), to social media algorithms, to speech and image recognition. The most recent phase that focuses on mimicking human interactions in particular (a caricature of a talking robot comes to mind for many) is what peaked the broader public interest in these types of algorithms because it's not something buried deep in the data model, but instead it's something that's much more tangible. There's a great article on the high-level prospects of machine learning with Michael I. Jordan (check out the YouTube lecture at the end)!

Power BI Weekly Updates

Unfortunately, though, talking robots are also the most complicated models. Humans aren't robots, and it's hard to use robots to replicate our behavior and thinking processes. However, there's a lot lower hanging fruit for pattern recognition algorithms that we often miss from a higher level, like the options for fixing spelling! Some of my favorite functionalities in Power BI are the ones that use the fuzzy matching algorithm. Two recent videos in my Power BI Weekly series in the Linkedin Learning library show three use cases for the fuzzy matching algorithm.

Power BI Dataflows

Week 14 shows how to apply fuzzy matching to an existing table in Power BI Dataflows through Power Query Online.

No alt text provided for this image

We can use the cluster values algorithm within a query entity of a Dataflow to perform two different transformation types.

  1. Add a new column
  2. Group an existing table into a new table shape

If you're new to Power BI Dataflows and want to learn more, check out an entire course I built focusing on these online ETL frameworks!

No alt text provided for this image

Power BI Desktop

Within Power BI Desktop, we can use fuzzy matching to merge two tables within the Power Query Editor together before we load the data.

No alt text provided for this image

We can use fuzzy matching to merge data together with any of the Power Query join types (inner, outer, left, etc.), which work much like the joins with the same names in SQL queries.

Fuzzy Matching Outcomes

On a high-level, fuzzy matching sounds like a carnival game. But instead of matching physical items, we're using it to match words. It works by using natural language processing (NLP) to process text strings like words, and then clustering to group the similar text pieces together. The great news is that because the fuzzy matching algorithm is built directly into different functionalities within Power BI, we don't actually have to write any code ourselves.

The fuzzy matching for merging between tables has default parameters, and we can also tune the algorithm by adjusting the following input parameters:

  1. Similarity threshold
  2. Ignore case (as in whether or not the letters match by capitalization of characters)
  3. Match by combining text parts
  4. Maximum number of matches (optional)
  5. Use of a transformation table (optional)

No alt text provided for this image

If you would like to learn more about other built-in algorithms built-in text algorithms in Power BI, check out the sentiment score model from earlier in the Power BI Weekly series. You can also check out an entire course below on more topics in this space that I created for the library! It includes models like KMeans clustering, anomaly and outlier detection, and forecasting (ARIMA and TBATS), among others.

No alt text provided for this image

In next week's edition of the newsletter, I'll talk about one of my favorite topics in Power BI: maps!

-HW

Vincent Tepedino Penha

Sr Financial Analyst at The Church of Jesus Christ of Latter-day Saints

2 年

This was great! Thank you for showing us how to use fuzzy wuzzy matching in Power BI. It's truly a powerful tool.

回复
Chris Mousley

I talk with businesses and Governments about Smarter Planning & Supply Challenges

2 年

Thank you for the fuzzy word highlighting. Helen Wall The capability of this in domains such as user experience for searching and aggregate product comparisons with dirty inputs are something. Can use tomorrow

回复
POOJA JAIN

Storyteller | Linkedin Top Voice 2024 | Senior Data Engineer@ Globant | Linkedin Learning Instructor | 2xGCP & AWS Certified | LICAP'2022

2 年

Amazing.. Really insightful and informative newsletter! Helen Wall ????

要查看或添加评论,请登录

社区洞察

其他会员也浏览了