Identifying rules that describe specific patterns within data is an essential skill. To understand why pattern discovery matters, let’s first look at what “patterns” are.
Patterns are sets of items, subsequences, or substructures that occur together frequently in a data set; in other words, they are strongly correlated. Patterns usually represent intrinsic and important properties of the data.
Pattern discovery is the process of uncovering and mining patterns from massive data sets. For example:
- You may want to understand which kinds of products are often purchased together
- You may want to understand unexpected associations
- You may want to understand the sequences of warnings that precede an equipment failure, so that you can schedule preventative maintenance
Pattern mining forms the foundation for many other tasks: association, correlation, and causality analysis; mining sequential and structural patterns; and pattern analysis in spatiotemporal, multimedia, and stream data.
Even classification can become more accurate when it uses discriminative pattern-based analysis. And in cluster analysis, pattern-based subspace clustering could be an important direction.
Let’s look at frequent patterns and association rules. For example, suppose you have five transactions:
Transaction A: Eggs, bread, watermelon, beer
Transaction B: Beer, peanuts, bread
Transaction C: Diapers, wipes, apple sauce
Transaction D: Beer, bread, butter, toilet paper
Transaction E: Bread, cheese, apples
Transaction A contains eggs, bread, watermelon, and beer. Together these form an itemset, which is simply a set of items. This particular one is a 4-itemset because it contains four items. Every itemset has a support, which measures how often it occurs in the transaction data set. Take the 1-itemset “beer”: how many transactions contain it? In our example, beer occurs in three of the five transactions (A, B, and D). So the absolute support is 3, and the relative support is 3 over 5, or 60%.
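To make the support calculation concrete, here is a minimal Python sketch over the five transactions above. The function names `absolute_support` and `relative_support` are my own choices for illustration, not part of any library.

```python
# The five example transactions, each stored as a set of items.
transactions = {
    "A": {"eggs", "bread", "watermelon", "beer"},
    "B": {"beer", "peanuts", "bread"},
    "C": {"diapers", "wipes", "apple sauce"},
    "D": {"beer", "bread", "butter", "toilet paper"},
    "E": {"bread", "cheese", "apples"},
}


def absolute_support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def relative_support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return absolute_support(itemset, transactions) / len(transactions)


print(absolute_support({"beer"}, transactions))  # 3
print(relative_support({"beer"}, transactions))  # 0.6
```

Storing each transaction as a set keeps the check simple: `itemset <= items` is true when every item of the itemset appears in the transaction, so the same two functions also work for larger itemsets such as `{"beer", "bread"}`.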
Now we can decide whether an itemset X is frequent: X is frequent if its support meets a minimum support threshold. For example, if we set the minimum support threshold to 50%, the frequent 1-itemsets in this data set are beer and bread. Beer appears in 3 transactions, so its absolute support is 3 and its relative support is 3 over 5, or 60%. Bread appears in 4 transactions, so its relative support is 4 over 5, or 80%. Every other item appears only once (20%), so it falls below the threshold. You will also note that in all transactions with “beer” we also find “bread”. Can we assume that people who buy beer also buy bread?
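The threshold check can be sketched in a few lines of Python as well. This is only an illustration under my own assumptions: the helper name `frequent_1_itemsets` and the constant `MIN_SUPPORT` are hypothetical, and "passing" the threshold is taken to mean a relative support greater than or equal to it.

```python
from collections import Counter

# The same five example transactions (A to E), as a list of sets.
transactions = [
    {"eggs", "bread", "watermelon", "beer"},
    {"beer", "peanuts", "bread"},
    {"diapers", "wipes", "apple sauce"},
    {"beer", "bread", "butter", "toilet paper"},
    {"bread", "cheese", "apples"},
]

MIN_SUPPORT = 0.5  # assumed threshold: 50% of all transactions


def frequent_1_itemsets(transactions, min_support):
    """Return {item: relative support} for items that meet the threshold."""
    counts = Counter(item for items in transactions for item in items)
    total = len(transactions)
    return {
        item: count / total
        for item, count in counts.items()
        if count / total >= min_support
    }


frequent = frequent_1_itemsets(transactions, MIN_SUPPORT)
print(sorted(frequent.items()))  # [('beer', 0.6), ('bread', 0.8)]

# How many of the transactions that contain beer also contain bread?
with_beer = [items for items in transactions if "beer" in items]
beer_and_bread = [items for items in with_beer if "bread" in items]
print(len(beer_and_bread), "of", len(with_beer))  # 3 of 3
```

On these five transactions only beer and bread reach 50%, and every transaction with beer also contains bread. Whether that co-occurrence is strong enough to act on is exactly the question an association rule tries to answer.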
More about association rules in the next blog post…