BxD Primer Series: Apriori Pattern Search Algorithm
Hey there!
Welcome to the BxD Primer Series, where we cover topics such as Machine Learning models, Neural Nets, GPT, Ensemble models, and Hyper-automation in a 'one-post-one-topic' format. Today's post is on the Apriori Pattern Search Algorithm. Let's get started:
The What:
Apriori is an older-generation algorithm than ECLAT and FP-Growth for mining frequent itemsets and association rules from a transaction database. It uses the "apriori property" to prune the search space of candidate itemsets, which lets it handle large datasets.
The apriori property of frequent itemsets states that if an itemset is frequent, then all of its subsets must also be frequent. The Apriori algorithm uses this property to prune the search space of candidate itemsets: any itemset that contains an infrequent subset cannot itself be frequent, so it can be discarded without counting its support. This reduces the number of candidate itemsets that need to be generated and checked, making the algorithm efficient. Apriori is widely used in fields such as market basket analysis, recommendation systems, and anomaly detection.
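The pruning step can be sketched in a few lines of Python. This is an illustrative snippet, not from the original post; the function name and the toy itemsets are our own:

```python
from itertools import combinations

def prune_candidates(candidates, frequent_prev):
    """Apriori pruning: drop any k-itemset that has an
    infrequent (k-1)-subset, without counting its support."""
    kept = []
    for c in candidates:
        k = len(c)
        if all(frozenset(s) in frequent_prev for s in combinations(c, k - 1)):
            kept.append(c)
    return kept

# Hypothetical example: {A,B} and {A,C} are frequent but {B,C} is not,
# so the candidate {A,B,C} is pruned before any database scan.
frequent_2 = {frozenset("AB"), frozenset("AC")}
print(prune_candidates([frozenset("ABC")], frequent_2))  # → []
```

Note that the candidate is rejected purely from bookkeeping on the previous level; no pass over the transactions is needed.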
Compared to the ECLAT algorithm, Apriori is often slower due to its candidate generation step, but is more flexible in allowing user-specified minimum support thresholds and generating association rules.
The How:
The Apriori algorithm utilizes a breadth-first search strategy and a horizontal representation of the transaction database to efficiently mine frequent itemsets and association rules from large datasets. It takes minimum support and confidence thresholds as input.
Breadth-first: A search strategy where all nodes at a given level are visited before moving on to the next level. This is in contrast to the depth-first approach, which explores the deepest path of a graph or tree before backtracking to explore other paths.
Horizontal Data Layout: A representation where each row corresponds to a transaction and each column (or entry) corresponds to an item in that transaction. This is in contrast to the vertical data layout used by ECLAT, where each row corresponds to an item and lists the transactions containing it.
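The difference between the two layouts is easiest to see on a toy transaction database. The items and transaction ids below are made up for illustration:

```python
# Hypothetical mini transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
]

# Horizontal layout (used by Apriori): one row per transaction.
horizontal = {tid: items for tid, items in enumerate(transactions)}

# Vertical layout (used by ECLAT): one row per item,
# listing the ids of transactions that contain it.
vertical = {}
for tid, items in enumerate(transactions):
    for item in items:
        vertical.setdefault(item, set()).add(tid)

print(vertical["bread"])  # ids of transactions containing bread
```

In the vertical layout, the support of an itemset is just the size of the intersection of its items' transaction-id sets, which is why ECLAT favors it.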
Now, here is how the Apriori algorithm finds "frequent itemsets" and "association rules": starting from single items, it counts the support of each candidate itemset over the transaction database and keeps those meeting the minimum support threshold. Frequent itemsets of size k-1 are then joined to form candidates of size k, any candidate with an infrequent subset is pruned using the apriori property, and the process repeats until no new frequent itemsets are found. Finally, association rules are derived from the frequent itemsets and filtered by the minimum confidence threshold.
The eventual output of the Apriori algorithm is a list of frequent itemsets with their corresponding support values, and a list of association rules with their corresponding confidence values.
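The whole procedure fits in a short, self-contained Python sketch. This is a minimal illustration under our own naming, not a production implementation:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in items if support(s) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

def rules(frequent, min_confidence):
    """Association rules A -> B with confidence = support(A∪B) / support(A)."""
    out = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    out.append((antecedent, itemset - antecedent, conf))
    return out
```

Running `apriori` on a small basket dataset and then `rules` on its output yields exactly the two lists described above: itemsets with supports, and rules with confidences.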
Selecting Minimum Support and Confidence Thresholds:
Minimum support and confidence thresholds are numbers between 0 and 1, usually expressed as percentages. They are decided using both visual and qualitative approaches, which we already covered in the ECLAT algorithm edition (check here).
The Why:
Some reasons to use the Apriori pattern search algorithm:
- It is simple to understand and implement, and its outputs (itemsets and rules) are directly interpretable.
- The apriori property prunes the candidate space aggressively, keeping the search tractable on large datasets.
- It supports user-specified minimum support and confidence thresholds and generates association rules directly.
The Why-Not:
Some reasons to not use the Apriori pattern search algorithm:
- It requires repeated scans of the transaction database, one per itemset size.
- The candidate-generation step can explode on dense datasets or low support thresholds, making it slower than ECLAT or FP-Growth.
- Memory usage grows with the number of candidate itemsets held at each level.
Time for you to support:
In the coming posts, we will cover one more pattern search model: FP-Growth.
After that, we will start with dimensionality reduction models such as PCA, LSA, SVD, LDA, and t-SNE.
Let us know your feedback!
Until then,
Have a great time!