BxD Primer Series: Apriori Pattern Search Algorithm

Hey there 👋

Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on the Apriori Pattern Search Algorithm. Let’s get started:

The What:

Apriori is an older-generation algorithm than ECLAT and FP-Growth for mining frequent itemsets and association rules from a transaction database. It uses the "apriori property" to prune the search space of candidate itemsets, which lets it handle large datasets.

The apriori property states that if an itemset is frequent, then all of its subsets must also be frequent. Equivalently, any itemset that contains an infrequent subset cannot itself be frequent, so such candidates can be pruned without ever counting their support. This reduces the number of candidate itemsets that need to be generated and checked, making the algorithm efficient. Apriori is widely used in fields such as market basket analysis, recommendation systems, and anomaly detection.
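A minimal illustration of this pruning rule in pure Python (the itemsets here are hypothetical, just to show the check):

```python
from itertools import combinations

def can_be_frequent(candidate, frequent_itemsets):
    """A candidate k-itemset can only be frequent if every one of its
    (k-1)-subsets is already known to be frequent (the apriori property)."""
    k = len(candidate)
    return all(frozenset(sub) in frequent_itemsets
               for sub in combinations(candidate, k - 1))

# Suppose {A, B} and {A, C} are frequent 2-itemsets, but {B, C} is not.
frequent_2 = {frozenset("AB"), frozenset("AC")}

# {A, B, C} contains the infrequent subset {B, C}, so it is pruned
# before its support is ever counted.
print(can_be_frequent(frozenset("ABC"), frequent_2))  # False
```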

Compared to the ECLAT algorithm, Apriori is often slower because of its candidate generation step, but it is more flexible in allowing user-specified minimum support thresholds and in generating association rules directly.

The How:

The Apriori algorithm uses a breadth-first search strategy and a horizontal representation of the transaction database to efficiently mine frequent itemsets and association rules from large datasets. It takes minimum support and confidence thresholds as input.

Breadth-first: A search strategy where all nodes at a given level are visited before moving on to the next level. This is in contrast to the depth-first approach, which explores the deepest path of a graph or tree before backtracking to explore other paths.

Horizontal Data Layout: A representation where each row corresponds to a transaction and each column corresponds to an item. This is in contrast to the vertical data layout, where each row corresponds to an item and each column corresponds to a transaction.
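The two layouts can be sketched in Python like this (the four transactions are illustrative, not from the article):

```python
# Four illustrative transactions.
transactions = [
    {"A", "B"},        # T1
    {"A", "C"},        # T2
    {"A", "B", "C"},   # T3
    {"B"},             # T4
]

# Horizontal layout (used by Apriori): one row per transaction.
horizontal = {f"T{i + 1}": items for i, items in enumerate(transactions)}

# Vertical layout (used by ECLAT): one row per item, listing the
# IDs of the transactions that contain it.
vertical = {}
for tid, items in horizontal.items():
    for item in items:
        vertical.setdefault(item, set()).add(tid)

print(sorted(vertical["A"]))  # ['T1', 'T2', 'T3']
```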


Now, here is how Apriori Algorithm finds “frequent itemsets” and “association rules”:

  1. Start by counting the support of each individual item in the dataset, i.e., the number of transactions that contain the item. Items that meet the minimum support threshold are considered frequent 1-itemsets.
  2. Generate candidate 2-itemsets by joining pairs of frequent 1-itemsets. For example, if {A} and {B} are frequent 1-itemsets, then {A, B} is a candidate 2-itemset. The support of each candidate 2-itemset is then counted.
  3. Prune the candidate 2-itemsets that do not meet the minimum support threshold, leaving only the frequent 2-itemsets.
  4. Generate candidate 3-itemsets by joining pairs of frequent 2-itemsets that share the same prefix. For example, if {A, B} and {A, C} are frequent 2-itemsets, then {A, B, C} is a candidate 3-itemset. The support of each candidate 3-itemset is then counted.
  5. Prune the candidate 3-itemsets that do not meet the minimum support threshold, leaving only the frequent 3-itemsets.
  6. Continue this process until no more frequent itemsets can be found.
  7. Use the frequent itemsets to generate association rules. An association rule is a statement of the form "if A, then B", where A and B are sets of items. The support of the rule is the fraction of transactions that contain both A and B, and the confidence of the rule is the fraction of transactions that contain B among those that contain A.
  8. Prune association rules that do not meet the minimum support and confidence thresholds.
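The steps above can be sketched in pure Python. This is an illustrative, unoptimized implementation; the transaction data and thresholds are made up for the example:

```python
from itertools import combinations

def apriori(transactions, min_support, min_confidence):
    """Level-wise Apriori sketch: returns (frequent itemsets with their
    support, association rules with their confidence)."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Step 1: frequent 1-itemsets.
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    all_frequent = {}
    k = 1
    while level:
        all_frequent.update({fs: support(fs) for fs in level})
        # Steps 2/4: join pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Apriori pruning: every k-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # Steps 3/5: keep only candidates meeting the minimum support.
        level = {c for c in candidates if support(c) >= min_support}
        k += 1

    # Steps 7/8: split each frequent itemset into "if A, then B" rules and
    # keep those meeting the minimum confidence. (Rule support equals the
    # itemset's support, which already meets the minimum support threshold.)
    rules = []
    for itemset in all_frequent:
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = all_frequent[itemset] / all_frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(consequent), conf))
    return all_frequent, rules

transactions = [frozenset(t) for t in
                [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B", "C"}]]
itemsets, rules = apriori(transactions, min_support=0.5, min_confidence=0.6)
```

With this toy data, the three items and all three 2-itemsets come out frequent, while {A, B, C} (support 0.25) is pruned.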

The eventual output of the Apriori algorithm is a list of frequent itemsets and their corresponding support values, as well as a list of association rules and their corresponding confidence values.

Selecting Minimum Support and Confidence Thresholds:

Minimum support and confidence thresholds are numbers between 0 and 1, usually expressed as percentages. They can be chosen using both visual and qualitative approaches, which were already covered in the ECLAT algorithm edition (check here).

The Why:

Some reasons to use Apriori pattern search algorithm:

  1. Apriori uses a breadth-first, level-wise search that is exhaustive: it is guaranteed to find every frequent itemset that meets the thresholds.
  2. It can handle sparse datasets efficiently, which is a strength of the horizontal representation.
  3. It is easy to implement and understand. Resources are readily available as it is an old algorithm.
  4. It can handle datasets with a larger number of transactions as compared to algorithms that use vertical representation.

The Why-Not:

Some reasons to not use Apriori pattern search algorithm:

  1. Apriori generates and tests many candidate itemsets that turn out to be infrequent, which is a disadvantage of the breadth-first approach.
  2. It can be slower than algorithms that use a depth-first approach, such as ECLAT and FP-Growth.
  3. It becomes computationally expensive when there are a large number of items and association rules.
  4. Apriori may not be effective at finding frequent itemsets with low support, as it requires scanning the entire dataset multiple times to count the support of each candidate itemset.

Time for you to support:

  1. Reply to this article with your question
  2. Forward/Share to a friend who can benefit from this
  3. Chat on Substack with BxD (here)
  4. Engage with BxD on LinkedIn (here)

In the next post, we will cover one more pattern search model: FP-Growth.

Post that, we will start with dimensionality reduction models such as PCA, LSA, SVD, LDA, t-SNE.

Let us know your feedback!

Until then,

Have a great time!

#businessxdata #bxd #apriori #pattern #search #primer
