Machine Learning 9: 'Sequential Rule Mining'

Sequential rule mining is a data mining technique that consists of discovering rules in sequences. It has many applications, for example analysing the behaviour of customers in a supermarket, users on a website, or passengers at an airport.

Discovering sequential patterns in sequences

An important data mining problem is to design algorithms for discovering hidden patterns in sequences. To do so, we can draw on a number of sequential rule mining algorithms and pattern mining techniques. There has been a lot of research on this topic in the field of data mining, and various algorithms have been proposed.

A sequential pattern is a subsequence that appears in several sequences of a dataset. For example, the sequential pattern <{a}{c}{e}> appears in the first two sequences of our dataset. This pattern is quite interesting: it indicates that customers who bought {a} often bought {c} afterwards, followed by {e}.


Such a pattern is said to have a support of two sequences because it appears in two sequences from the dataset. Several algorithms have been proposed for finding all sequential patterns in a dataset, such as GSP (an Apriori-based approach), SPADE and PrefixSpan. These algorithms take as input a sequence dataset and a minimum support threshold (min-sup). They then output all sequential patterns having a support no less than min-sup. Those patterns are said to be the frequent sequential patterns.
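To make the idea of support concrete, here is a minimal Python sketch of subsequence matching and support counting (the toy sequence database and the function names are invented for illustration; they are not the dataset or code from the article):

```python
def contains(sequence, pattern):
    """Check whether `pattern` (a list of itemsets) occurs as a
    subsequence of `sequence` (also a list of itemsets), in order."""
    i = 0
    for itemset in sequence:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)

def support(database, pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)

# Hypothetical sequence database: each sequence is an ordered list of itemsets.
database = [
    [{"a"}, {"b", "c"}, {"c"}, {"e"}],
    [{"a"}, {"c"}, {"d"}, {"e"}],
    [{"b"}, {"d"}, {"e"}],
]

pattern = [{"a"}, {"c"}, {"e"}]
print(support(database, pattern))  # <{a}{c}{e}> appears in 2 sequences
```

A frequent-pattern miner such as PrefixSpan enumerates candidate patterns and keeps only those whose support, computed as above, reaches min-sup.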



Association Analysis

There are a couple of terms used in association analysis that are important to understand. Association rules are normally written like this: {Diapers} -> {Beer}, which means that there is a strong relationship between customers who purchased diapers and beer in the same transaction. In the above example, {Diapers} is the antecedent and {Beer} is the consequent. Both antecedents and consequents can have multiple items; in other words, {Diapers, Gum} -> {Beer, Chips} is a valid rule.

Support is the relative frequency with which the rule's items appear in the data. In many instances, you may want to look for high support in order to make sure it is a useful relationship. However, there may be instances where low support is useful, if you are trying to find "hidden" relationships.

Confidence is a measure of the reliability of the rule. A confidence of 0.5 in the above example would mean that in 50% of the cases where Diapers and Gum were purchased, the purchase also included Beer and Chips. For product recommendation, 50% confidence may be perfectly acceptable, but in a medical situation this level may not be high enough.

Lift is the ratio of the observed support to that expected if the antecedent and consequent were independent. The basic rule of thumb is that a lift value close to 1 means the antecedent and consequent are independent. Lift values > 1 are generally more "interesting" and could be indicative of a useful rule pattern.
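As a sketch of how the three measures fit together, the following Python snippet computes support, confidence and lift for the rule {Diapers, Gum} -> {Beer, Chips} over an invented list of transactions (the function name and the data are illustrative assumptions, not the article's code):

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence and lift for antecedent -> consequent."""
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)     # antecedent count
    n_b = sum(consequent <= t for t in transactions)     # consequent count
    n_ab = sum((antecedent | consequent) <= t for t in transactions)
    support = n_ab / n
    confidence = n_ab / n_a               # P(consequent | antecedent)
    lift = confidence / (n_b / n)         # confidence / support(consequent)
    return support, confidence, lift

# Hypothetical market-basket transactions.
transactions = [
    {"diapers", "gum", "beer", "chips"},
    {"diapers", "gum", "beer"},
    {"diapers", "beer", "chips"},
    {"diapers", "gum"},
    {"milk", "bread"},
]

s, c, l = rule_metrics(transactions, {"diapers", "gum"}, {"beer", "chips"})
print(s, c, l)  # 0.2, 1/3, 5/6 for this toy data
```

Here lift is below 1, so on this toy data the rule would not be considered interesting.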

 

The Apriori algorithm is based on conditional probabilities and helps us determine the likelihood of items being bought together based on a priori data.





There are three important parameters: support, confidence and lift.

Suppose there is a set of transactions containing the rule item1 -> item2. The support for item1 is defined as n(item1) / n(total transactions). Confidence, on the other hand, is defined as n(item1 & item2) / n(item1). So confidence tells us the strength of the association, while support tells us the relevance of the rule: we don't want to include rules about items that are seldom bought, or in other words, have low support. Lift is the confidence of the rule divided by the support of item2. The higher the lift, the more significant the rule found by the Apriori algorithm.

The figures below describe the process in a more intuitive manner.


But please note that the Apriori algorithm is computationally expensive: it generates many candidate itemsets and makes repeated passes over the transaction database.
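To see where that cost comes from, here is a minimal, illustrative Apriori implementation for mining frequent itemsets (a sketch over invented toy data, not the article's code). Note the full pass over the transaction database at every level k:

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return frequent itemsets with absolute support >= min_sup."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]   # level 1: single items
    frequent = {}
    k = 1
    while level:
        # one full database scan per level -- the expensive part
        counts = {c: sum(c <= t for t in transactions) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(survivors)
        # candidate generation: join frequent k-itemsets into (k+1)-itemsets,
        # pruning any candidate with an infrequent subset (Apriori property)
        candidates = set()
        for a, b in combinations(survivors, 2):
            cand = a | b
            if len(cand) == k + 1 and all(
                frozenset(sub) in survivors for sub in combinations(cand, k)
            ):
                candidates.add(cand)
        level = list(candidates)
        k += 1
    return frequent

# Hypothetical transactions.
transactions = [
    {"diapers", "beer", "gum"},
    {"diapers", "beer"},
    {"diapers", "chips"},
    {"beer", "chips"},
]
freq = apriori(transactions, min_sup=2)
```

FP-Growth avoids these repeated scans by compressing the database into an FP-tree, which is why it is usually faster in practice.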

More Resources to explore

§ Introduction to Market Basket Analysis in Python

§ Machine learning and Data Mining - Association Analysis with Python

§ Apriori Algorithm (Python 3.0)

§ Association rules and frequent itemset

§ Association Rules and the Apriori Algorithm: A Tutorial

§ How to Create Data Visualization for Association Rules in Data Mining


------------

Exercise:

------------

As for the practice for this week, you have to apply association rule mining algorithms to this Kaggle competition.

§ Association Rules Mining/Market Basket Analysis

