Fuzzy Wuzzy Matching
There's a lot of discussion around the possibilities of technology that use AI and machine learning algorithms, not just in the data science space, but in the broader business world as well.
Machine learning, over the last three decades, evolved from models like fraud detection (Amazon is a particularly famous use case for this), to social media algorithms, to speech and image recognition. The most recent phase that focuses on mimicking human interactions in particular (a caricature of a talking robot comes to mind for many) is what peaked the broader public interest in these types of algorithms because it's not something buried deep in the data model, but instead it's something that's much more tangible. There's a great article on the high-level prospects of machine learning with Michael I. Jordan (check out the YouTube lecture at the end)!
Power BI Weekly Updates
Unfortunately, though, talking robots are also the most complicated models. Humans aren't robots, and it's hard to use robots to replicate our behavior and thinking processes. However, there's a lot lower hanging fruit for pattern recognition algorithms that we often miss from a higher level, like the options for fixing spelling! Some of my favorite functionalities in Power BI are the ones that use the fuzzy matching algorithm. Two recent videos in my Power BI Weekly series in the Linkedin Learning library show three use cases for the fuzzy matching algorithm.
Power BI Dataflows
Week 14 shows how to apply fuzzy matching to an existing table in Power BI Dataflows through Power Query Online.
We can use the cluster values algorithm within a query entity of a Dataflow to perform two different transformation types.
If you're new to Power BI Dataflows and want to learn more, check out an entire course I built focusing on these online ETL frameworks!
领英推荐
Power BI Desktop
Within Power BI Desktop, we can use fuzzy matching to merge two tables within the Power Query Editor together before we load the data.
We can use fuzzy matching to merge data together with any of the Power Query join types (inner, outer, left, etc.), which work much like the joins with the same names in SQL queries.
Fuzzy Matching Outcomes
On a high-level, fuzzy matching sounds like a carnival game. But instead of matching physical items, we're using it to match words. It works by using natural language processing (NLP) to process text strings like words, and then clustering to group the similar text pieces together. The great news is that because the fuzzy matching algorithm is built directly into different functionalities within Power BI, we don't actually have to write any code ourselves.
The fuzzy matching for merging between tables has default parameters, and we can also tune the algorithm by adjusting the following input parameters:
If you would like to learn more about other built-in algorithms built-in text algorithms in Power BI, check out the sentiment score model from earlier in the Power BI Weekly series. You can also check out an entire course below on more topics in this space that I created for the library! It includes models like KMeans clustering, anomaly and outlier detection, and forecasting (ARIMA and TBATS), among others.
In next week's edition of the newsletter, I'll talk about one of my favorite topics in Power BI: maps!
-HW
Sr Financial Analyst at The Church of Jesus Christ of Latter-day Saints
2 年This was great! Thank you for showing us how to use fuzzy wuzzy matching in Power BI. It's truly a powerful tool.
I talk with businesses and Governments about Smarter Planning & Supply Challenges
2 年Thank you for the fuzzy word highlighting. Helen Wall The capability of this in domains such as user experience for searching and aggregate product comparisons with dirty inputs are something. Can use tomorrow
Storyteller | Linkedin Top Voice 2024 | Senior Data Engineer@ Globant | Linkedin Learning Instructor | 2xGCP & AWS Certified | LICAP'2022
2 年Amazing.. Really insightful and informative newsletter! Helen Wall ????