Traditionally, and still in many areas of modern science, one of the biggest obstacles in any experiment or analysis is obtaining a data set large enough to meet the sample-size requirements of most statistical procedures. In recent years, and especially in the world of digital marketing, the opposite is often true. Many clients are inundated with data – so much, in fact, that it can be difficult to know what to look for or where to start. Given the computational costs associated with truly large data sets (gigabytes upon gigabytes and terabytes upon terabytes), we want to be strategic in the way we examine the data.
If you’re not entirely sure what to look for, a great place to begin is with an Association & Affinity analysis. This is an extremely flexible type of analysis that uses basic data mining techniques to establish relationships between site metrics. Traditional marketing pros might know it better as “market basket” analysis, a term that comes from the simple example of buying items at a grocery store. With transaction-level data, we can determine which items are most likely to be purchased together. For example, if we know that 71% of transactions that include milk also include bread, then we can start to make some tactical decisions about when we put those two items on sale, where we physically place the items in the store, etc.
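To make the milk-and-bread figure concrete, here is a minimal Python sketch of how such a percentage could be computed from transaction-level data. The transactions, item names, and the `confidence` helper are invented for illustration; they are not data or code from any real store.

```python
# Toy transactions; the items and counts are made up for illustration.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "cereal"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]


def confidence(transactions, antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        # No transaction contains the antecedent, so there is nothing to measure.
        return 0.0
    with_both = [t for t in with_antecedent if consequent in t]
    return len(with_both) / len(with_antecedent)


milk_to_bread = confidence(transactions, "milk", "bread")
print(f"{milk_to_bread:.0%} of milk transactions also include bread")
```

In this toy set, three of the four transactions that include milk also include bread, so the script reports 75%. The same counting logic applies unchanged to millions of rows, although at that scale a database or a dedicated analytics tool would do the counting.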
A client contacted Adobe Consulting to ask for help in determining whether a customer’s origin could help predict the location to which they’re traveling. Additionally, the client was interested in knowing whether there were any seasonal or brand factors to take into consideration. What we found was striking, surprising, and significant.
With the help of the Insight Consulting team, we looked at several years of transaction-level data, which amounted to a little over 10 million transactions. Needless to say, this is more than Excel is able to handle, and a perfect opportunity to use some simple but very useful data mining techniques. One of the great qualities of Insight is its ability to handle large data sets, so we used Insight to do all the number crunching. As mentioned previously, there were several variables that we took into consideration for our analysis:
- Origin – the place a customer was physically located when they made the rental reservation
- Destination – the place a customer picked up the car for the rental reservation
- Brand – this particular travel agency was a conglomerate of several brands, which functioned separately, but were still part of the same company
- Date – the day, month, and year for the time of travel
We incorporated three distinct statistical measures of association: support, confidence, and lift ratio. Using these variables and measures, we created a dashboard that allowed analysts on the client side to interact with and understand the analysis. The end result is a set of tables and heat maps, shown below, that identify the origin-destination pairs that are most likely to occur together.
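For readers who have not met the three measures before, the following Python sketch shows one common way to define them for origin-destination pairs. The reservations are hypothetical, the `association_measures` function is an illustrative helper, and none of this reflects how Insight computes the values internally. Here support is the share of all reservations that have a given origin and destination, confidence is the share of reservations from that origin that go to that destination, and lift is confidence divided by the destination's overall share.

```python
from collections import Counter

# Hypothetical (origin, destination) reservations; not the client's data.
reservations = [
    ("Chicago", "Orlando"),
    ("Chicago", "Orlando"),
    ("Chicago", "Denver"),
    ("Boston", "Orlando"),
    ("Boston", "Denver"),
    ("Boston", "Denver"),
    ("Seattle", "Denver"),
    ("Seattle", "Orlando"),
]


def association_measures(pairs):
    """Return support, confidence, and lift for every observed (origin, destination) pair."""
    n = len(pairs)
    pair_counts = Counter(pairs)
    origin_counts = Counter(origin for origin, _ in pairs)
    dest_counts = Counter(dest for _, dest in pairs)

    measures = {}
    for (origin, dest), count in pair_counts.items():
        support = count / n                         # P(origin and destination)
        confidence = count / origin_counts[origin]  # P(destination | origin)
        lift = confidence / (dest_counts[dest] / n) # confidence / P(destination)
        measures[(origin, dest)] = {
            "support": support,
            "confidence": confidence,
            "lift": lift,
        }
    return measures


measures = association_measures(reservations)

# List pairs from strongest to weakest association.
for (origin, dest), m in sorted(measures.items(), key=lambda kv: -kv[1]["lift"]):
    print(f"{origin} -> {dest}: support={m['support']:.2f}, "
          f"confidence={m['confidence']:.2f}, lift={m['lift']:.2f}")
```

A lift above 1 suggests that the destination is chosen more often from that origin than it is overall, a lift near 1 suggests no particular association, and a lift below 1 suggests the pair occurs less often than expected. Adding brand or travel month to the grouping key would be one way to explore the brand and seasonal questions.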