Market Basket Analysis using Power BI and R

As you may know, according to Wikipedia, R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and performing data analysis. Power BI is a business analytics service provided by Microsoft. It provides interactive visualizations with self-service business intelligence capabilities, so end users can create reports and dashboards by themselves, without having to depend on information technology staff or database administrators.

Today I want to talk about integrating these two important tools to build a Market Basket Analysis solution. To do so, I have used one of the most useful data mining patterns, called "Association Rules". Association rule mining is the data mining process of finding the rules that may govern associations and causal relationships between sets of items. So, given transactions with multiple items, it tries to find the rules that govern how or why such items are often bought together.

Consider the following set of transaction data, with transactions numbered 1 to 5; each transaction shows the items bought together. You can see that Diaper is bought with Beer in three transactions. Similarly, Bread is bought with Milk in three transactions, making them both frequent item-sets. Here we have:

Support (s): Fraction of transactions that contain the item-set

Support(Bread => Milk) = 3/5 = 0.6 = 60%

Confidence (c): For a rule A => B, confidence shows the percentage of transactions containing A in which B is also bought: the number of transactions with both A and B, divided by the number of transactions containing A.

Confidence(Bread => Milk) = 3/4 = 0.75 = 75%

Lift: Lift gives the correlation between A and B in the rule A => B. It shows how much the presence of item-set A affects the presence of item-set B, relative to what independence would predict.

Lift(A => B) = Support(A ∪ B) / (Support(A) × Support(B)), where A ∪ B is the combined item-set, so the numerator is the joint probability of A and B.

Lift(Bread => Milk) = 0.6 / (0.8 × 0.8) = 0.6 / 0.64 = 0.9375

So, the higher the lift, the higher the chance of A and B occurring together: a lift greater than 1 means the two item-sets appear together more often than expected under independence, while a lift below 1 (as with Bread and Milk here) means they appear together slightly less often than independence would predict.
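The three metrics are easy to verify by hand. Below is a short Python sketch (the pipeline later in this article is in R, but plain Python keeps the arithmetic self-contained) that recomputes support, confidence, and lift for {Bread} => {Milk}. The five transactions are an assumption, reconstructed to be consistent with the counts quoted above.

```python
# Five example transactions (item contents assumed from the counts in the text:
# Bread+Milk together in 3 baskets, Bread in 4, Milk in 4).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
n = len(transactions)

def support(items):
    """Fraction of transactions that contain every item in `items`."""
    return sum(items <= t for t in transactions) / n

supp_both = support({"Bread", "Milk"})                       # 3/5 = 0.60
confidence = supp_both / support({"Bread"})                  # 0.6 / 0.8 = 0.75
lift = supp_both / (support({"Bread"}) * support({"Milk"}))  # 0.6 / 0.64 = 0.9375
```

Swapping in other item pairs lets you check any rule against the table the same way.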

Goal of Association Rule Mining

When you apply Association Rule Mining to a given set of transactions T, your goal is to find all rules with:

  1. Support greater than or equal to min_support
  2. Confidence greater than or equal to min_confidence

APRIORI Algorithm

Association Rule Mining is viewed as a two-step approach:

  1. Frequent Item-set Generation: Find all frequent item-sets with support greater than or equal to a predetermined min_support threshold.
  2. Rule Generation: List all association rules derivable from the frequent item-sets, calculate support and confidence for each rule, and prune the rules that fail the min_support and min_confidence thresholds.

Frequent Item-set Generation is the most computationally expensive step because it requires a full database scan.
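The two steps above can be sketched in a few lines. This is an illustrative brute-force, level-wise implementation in Python, not the optimized Apriori inside the arules package, and it reuses the same assumed five-transaction example:

```python
from itertools import combinations

# The five example transactions (contents assumed from the counts quoted earlier).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise frequent item-set generation."""
    n = len(transactions)
    candidates = [frozenset([i]) for i in set().union(*transactions)]
    frequent = {}
    while candidates:
        # One pass over the data counts the support of every candidate.
        supports = {c: sum(c <= t for t in transactions) / n for c in candidates}
        survivors = {c: s for c, s in supports.items() if s >= min_support}
        frequent.update(survivors)
        # Join step: (k+1)-candidates are unions of two surviving k-sets.
        keys = list(survivors)
        candidates = list({a | b for a in keys for b in keys
                           if len(a | b) == len(a) + 1})
    return frequent

def generate_rules(frequent, min_confidence):
    """Step 2: split each frequent item-set into antecedent => consequent."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]  # subsets are frequent too (downward closure)
                if conf >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), supp, conf))
    return rules

freq = frequent_itemsets(transactions, min_support=0.6)
rules = generate_rules(freq, min_confidence=0.7)
```

The full database scan happens inside the `while` loop, once per level, which is why step 1 dominates the cost.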

Now let's begin the MAGIC

First we need to have a data-set in R and run the apriori algorithm on it to get some meaningful rules back. Install R on your operating system; I recommend the RStudio IDE for better interaction.

install.packages("Matrix")
install.packages("RODBC")
install.packages("arules")
install.packages("methods")

Then load the installed libraries:

library(Matrix)
library(RODBC)
library(arules)
library(methods)

We want to read our data from a SQL database. The data has to be in transaction format, meaning each row must contain all the items bought in one transaction in a single column, comma-delimited. To do so, you can use the STUFF function in SQL Server to prepare the data in the desired format.
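As a sketch of that preparation step, here is the common STUFF + FOR XML PATH pattern. The table and column names (dbo.SalesLines, TransactionID, ItemName) are hypothetical; adapt them to your own schema:

```sql
-- Hypothetical schema: one row per item sold, keyed by transaction.
SELECT  t.TransactionID,
        STUFF((SELECT ',' + s.ItemName
               FROM   dbo.SalesLines AS s
               WHERE  s.TransactionID = t.TransactionID
               FOR XML PATH('')), 1, 1, '') AS NameValues
FROM    dbo.SalesLines AS t
GROUP BY t.TransactionID;
```

The alias NameValues matches the column the R code below reads from the query result.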

Go to the ODBC Data Source Administrator and create a new System DSN connection to your database server and database; don't forget to set the default database. Now let's get back to R and read data from our source using the DSN name that we created.

con <- odbcConnect("****")                                   # open the System DSN connection
raw_data <- sqlQuery(con, "SELECT **** FROM dbo.********")   # read the prepared transaction rows
itemsn <- strsplit(as.character(raw_data$NameValues), ",")   # split each basket into items
Digi <- as(itemsn, "transactions")                           # coerce to arules transactions

In this part we have started a connection using odbcConnect and DSN name , then read data using sqlQuery and our query , then split data using strsplit , after that convert it to "transaction" form using "as" function.

Then we need to apply the apriori algorithm with two parameters, support and confidence. Set these two parameters so that you get back a usable number of rules; you can explore your data to find the best values for them.

m1 <- apriori(Digi, parameter = list(support = 0.002, confidence = 0.1, minlen = 2))

Then we need to save the output table in order to use it in Power BI.

output <- as(m1, "data.frame")                           # rules as a data frame
trash_1 <- sqlQuery(con, "TRUNCATE TABLE dbo.*******")   # clear the destination table
sqlSave(con, dat = output, tablename = "******", append = TRUE)  # write the rules back

First we have converted the rules into data-frame format, then truncated the destination table, and finally saved the data to it using the sqlSave function.

We can save all the R code in a .R file and execute it with Rscript from PowerShell, or even schedule it in a SQL Server Agent job to run on a daily basis.

Visualization

Now let's move to Power BI for the visualization. Open Power BI and select your destination table as the data source; you will get a table like this:

I've masked the rule names for security reasons, because they are real data from a big retail company located in Iran. Now you need to create some new columns to extract the source and target items from each rule; I've prepared some easy DAX for that. Close and apply, go to the Data view, and create two new columns using these two DAX formulas.

Source = SUBSTITUTE(SUBSTITUTE(LEFT(ArulesResult[rules],FIND("=>",ArulesResult[rules],1)-1) , "{" , "") , "}" , "")

Target = SUBSTITUTE(SUBSTITUTE(RIGHT(ArulesResult[rules] , LEN(ArulesResult[rules]) - FIND("=>",ArulesResult[rules],1) - 1) , "{" , "") , "}" , "")

With these two formulas you can extract "Milk" and "Bread" from the rule {Milk} => {Bread}.
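To see what the two formulas are doing, here is the same parsing logic as a tiny Python function, assuming the typical text format the arules data frame produces for its rules column, e.g. "{Bread} => {Milk}":

```python
# Mirror of the two DAX columns: split on "=>", then strip spaces and braces.
def split_rule(rule):
    lhs, rhs = rule.split("=>")
    strip_braces = lambda part: part.strip().strip("{}")
    return strip_braces(lhs), strip_braces(rhs)
```

Multi-item antecedents such as "{Beer,Diaper} => {Milk}" come through as the comma-joined string "Beer,Diaper", just as the DAX version leaves them.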

Now go back to the Report tab. In the Visualizations pane, click the ellipsis (...), choose "Import from marketplace", search for "Force-Directed Graph", and import it into your visuals.

Use the Source column as the source, the Target column as the target, and lift as the weight, since lift is the most trusted value for judging how important and reliable a link is.

You can also add two slicers to filter the Source and Target items. That'll be all.

Enjoy

I've removed item names again for security issues.

Sorry ;)

Sources: RADACAD, DataCamp, and my personal experience.

