Understanding Market Basket Analysis

Market basket analysis a machine learning approach that attempts to find relationships among a group of items in a data set. For example, a famous use of this method was when retailers discovered an association between beer and diapers.

Upon closer examination, the retailers found that when men came to purchase diapers for their babies they would often buy beer in the same trip. With this knowledge, the retailers placed beer and diapers next to each other in the store and this further increased sales.

In addition, many of the recommendation systems we experience when shopping online use market basket analysis results to suggest additional products to us. As such, market basket analysis is an intimate part of our lives with us even knowing.

In this post, we will look at some of the details of market basket analysis such as association rules, apriori, and the role of support and confidence.

Association Rules

The heart of market basket analysis are association rules. Association rules explain patterns of relationship among items. Below is an example

{rice, seaweed} -> {soy sauce}

Everything in curly braces { } is an itemset, which is some form of data that occurs often in the dataset based on criteria. Rice and seaweed are our itemset on the left and soy sauce is our itemset on the right. The arrow -> indicates what comes first as we read from left to right. If we put this association rule in simple English it would say “if someone buys rice and seaweed then they will buy soy sauce”.

The practical application of this rule is to place rice, seaweed and soy sauce near each other in order to reinforce this rule when people come to shop.

The Algorithm

Market basket analysis uses an apriori algorithm. This algorithm is useful for unsupervised learning that does not require any training and thus no predictions. The Apriori algorithm is especially useful with large datasets but it employs simple procedures to find useful relationships among the items.

The shortcut that this algorithm uses is the “apriori property” which states that all suggsets of a frequent itemset must also be frequent. What this means in simple English is that the items in an itemset need to be common in the overall dataset. This simple rule saves a tremendous amount of computational time.

Support and Confidence

Two key pieces of information that can further refine the work of the Apriori algorithm is support and confidence. Support is a measure of the frequency of an itemset ranging from 0 (no support) to 1 (highest support). High support indicates the importance of the itemset in the data and contributes to the itemset being used to generate association rule(s).

Returning to our rice, seaweed, and soy sauce example. We can say that the support for soy sauce is 0.4. This means that soy sauce appears in 40% of the purchases in the dataset which is pretty high.

Confidence is a measure of the accuracy of an association rule which is measured from 0 to 1. The higher the confidence the more accurate the association rule. If we say that our rice, seaweed, and soy sauce rule has a confidence of 0.8 we are saying that when rice and seaweed are purchased together, 80% of the time soy sauce is purchased as well.

Support and confidence can be used to influence the apriori algorithm by setting cutoff values to be searched for. For example, if we set a minimum support of 0.5 and a confidence of 0.65 we are telling the computer to only report to us association rules that are above these cutoff points. This helps to remove useless rules that are obvious or useless.

Conclusion

Market basket analysis is a useful tool for mining information from large datasets. The rules are easy to understanding. In addition, market basket analysis can be used in many fields beyond shopping and can include relationships within DNA, and other forms of human behavior. As such, care must be made so that unsound conclusions are not drawn from random patterns in the data

1 thought on “Understanding Market Basket Analysis

  1. Pingback: Understanding Market Basket Analysis | Educatio...

Leave a Reply