The Dark Side of A/B Testing: Don’t Make These Two Mistakes!
by Matt Belkin
posted on 11-02-2006
All right folks, gather around…get nice and close…I want to tell you a cautionary tale about A/B testing. That’s right, the old standby of web analytics. A/B testing, or “split-run testing” as it is sometimes called, is one of the most pervasive and widely used methodologies behind web site improvement. And rightfully so – the concept is simple.
Say, for example, you want to test a new version of your Home Page. Well, direct some of your traffic to your current page (the A page) and some to your new page (the B page)…look at the differences in performance…and voila, you have your winner! Did the new page B outperform page A? If so, great – let’s direct all of our traffic to that page and watch our sales go through the roof. Or perhaps page B didn’t fare so well – so scrap it, and try again. Easy.
I would estimate this methodology has driven hundreds of millions, if not billions, in incremental revenue gains for Web sites. So why stop now? It’s powerful, simple, and accurate…or is it? While I’d agree that A/B testing is powerful and simple, there is a dark side to A/B testing that, if unchecked, can significantly reduce your revenue gains, or even (shudder) produce losses. What is this dark side? Read on and I’ll tell you a story about A/B testing gone wrong.
THE FIRST MISTAKE
Once upon a time there was a large retailer that wanted to redesign their Home Page. They used two KPIs, Home Page Conversion Rate and Revenue per Home Page Visit, to measure the success of the page.
Before the test, the page had a Home Page Conversion rate of 3.0% and Revenue per Home Page visit of $20. Most of the Home Page consisted of big images of each product category, and not much else. The Marketing group wanted more room for special offers and the current design gave them none. Furthermore, as they continued adding categories, the Home Page was becoming long and unwieldy. The design team felt it was difficult for customers to navigate. So they created a new Home Page with much more room for promotions, and eliminated some of the less popular categories. They were pretty pleased with the design and knew the next step was to A/B test it.
They decided upon a 90/10 split test – with only 10% of traffic going to this new Home Page – so they could minimize risk if it should perform poorly (note: while many folks think of A/B testing as a 50%/50% split of traffic, an industry best practice is to expose only a fraction of your traffic to the new design – like 10% – so as to minimize risk).
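For readers who like to see the mechanics, here is a minimal Python sketch of one way a 90/10 split could be assigned. It is not how any particular testing tool works: the helper name `assign_variant`, the `b_share` parameter, and the choice of SHA-256 hashing are all assumptions made for illustration. The idea is simply that hashing a visitor ID gives each visitor a stable bucket, so the same person keeps seeing the same page.

```python
import hashlib


def assign_variant(visitor_id: str, b_share: float = 0.10) -> str:
    """Return 'B' for roughly b_share of visitors, 'A' for the rest.

    Hypothetical helper: hashing the ID keeps the assignment stable
    across visits without storing any state.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    # First 8 hex digits -> integer in [0, 2**32), scaled into [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "B" if bucket < b_share else "A"


if __name__ == "__main__":
    ids = [f"visitor-{n}" for n in range(10_000)]  # made-up visitor IDs
    share_b = sum(assign_variant(v) == "B" for v in ids) / len(ids)
    print(f"Share of visitors sent to page B: {share_b:.1%}")
```

Raising `b_share` later (say, to 0.5) keeps every visitor who already saw page B on page B, because their bucket value never changes.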
Within the first 24 hours, the results were encouraging. The new page was showing a 4.0% Home Page Conversion Rate, and Revenue per Home Page visit of $25. Within the first week, the performance held up.
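To put those numbers in perspective, it helps to express them as relative lift over the original page. The short Python sketch below does that arithmetic with the figures from the story; `relative_lift` is just an illustrative helper name, not part of any tool.

```python
def relative_lift(baseline: float, variant: float) -> float:
    """Relative change of the variant KPI versus the baseline KPI."""
    return (variant - baseline) / baseline


if __name__ == "__main__":
    # Figures from the story: page A first, page B second.
    print(f"Conversion lift: {relative_lift(0.03, 0.04):.1%}")
    print(f"Revenue per visit lift: {relative_lift(20.0, 25.0):.1%}")
```

That works out to roughly a 33% lift in conversion and a 25% lift in revenue per visit, which explains why the team was so encouraged.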
They began talking about what they would change next. The Marketing team felt the A/B test clearly demonstrated the effectiveness of their special offers. The Design team felt the A/B test demonstrated the effectiveness of fewer categories. They argued back and forth, but the data couldn’t settle it: the results were consistent with either explanation, or with both at once. In other words, the test was inconclusive.
So what happened? Why did the test fail? Well, it failed because they changed more than one element on the page. When you conduct an A/B test, the best practice is to change only one element on the page. Changing more than one element in an A/B test makes it impossible to determine which change drove better performance and which did not. And while you might be tempted to just embrace all the changes since they led to an overall positive result, that is a short-sighted way to make a decision of such importance to your revenue stream. Why? Because it will not advance your understanding of your customers or their behavior, and will not keep you from repeating mistakes in the future.
Yes, it takes discipline to conduct your A/B tests this way. But it pays off in solid, actionable business intelligence that helps you improve your overall success.
Sidenote: If you want to test multiple elements at the same time, you need to conduct multi-variate testing (MVT) rather than A/B testing. Most multi-element tests introduce both positive and negative performers. It’s the nature of the beast. This is one reason why MVT is so powerful: because it provides a sense of which elements perform better and which perform worse, it helps you continuously improve your performance. But it does take a lot longer to set up than an A/B test, and usually requires specialized help to do it right. If you’re interested in MVT, Omniture SiteCatalyst integrates with several major MVT vendors – please feel free to send us an email via the Contact link in the footer if you’d like more information.
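To get a feel for why MVT takes more setup, consider what the retailer’s two changes would look like as a full-factorial design, one of the simpler MVT layouts. The Python sketch below only enumerates the page versions you would need to build. The element names and the helper `full_factorial` are hypothetical, and a real MVT tool may well use a different design to cut down the number of versions.

```python
from itertools import product


def full_factorial(elements: dict[str, list[str]]) -> list[dict[str, str]]:
    """List every combination of element variations (one dict per page version)."""
    names = list(elements)
    return [dict(zip(names, combo)) for combo in product(*elements.values())]


if __name__ == "__main__":
    # Hypothetical encoding of the two changes from the first story.
    versions = full_factorial({
        "promo_space": ["none", "expanded"],
        "category_list": ["full", "trimmed"],
    })
    for number, version in enumerate(versions, start=1):
        print(number, version)
```

Two elements with two variations each already means four page versions; every extra element doubles that again, which is where the longer setup time comes from.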
Bottom line: when you are doing A/B testing, discipline your team to focus on one element, one test…remember that: one element, one test…and you can avoid costly mistakes like the one I just highlighted.
THE SECOND MISTAKE
Now most industry veterans will readily acknowledge the “one element, one test” mantra…that’s neither earth-shattering nor new. But far fewer know about this second mistake, and many more people stumble over it. To illustrate, let’s continue with our hapless retailer.
The retailer designs a new Home Page, changing only one element on the page. Good so far. But after the first 24 hours, the results are discouraging. Conversion has dropped to 2.0% and revenue per Home Page visit has dropped to $10. They decide to wait it out, but after 7 days, the results are still discouraging. They yank the page and conclude the new design was flawed.
While this all sounds pretty logical, it’s not. When running an A/B test, it’s critical to dig below your top-level KPIs – such as conversion and revenue per visit – and examine the visitor mix. New visitors typically respond much more favorably to a new page design than repeat visitors. Loyal visitors tend to respond worst of all. This actually shouldn’t be surprising to anyone. Imagine that your local grocery store suddenly moved the deli from the left front side of the store to the back right. If it were your first visit, you wouldn’t know the difference and would orient yourself appropriately. But if you had been a frequent customer – say shopping there twice a week – you would be pretty confused. In fact, after going to the left front side of the store – you might be so frustrated you leave. I know that sounds extreme, but in an online marketplace, it’s much easier to shop elsewhere than in the material world.
This is not to say you shouldn’t redesign the store, or your site. If you have good reasons to move the deli (like you are adding a new high-margin DVD-rental kiosk where the deli once was), then by all means move it. Just realize that it may take an extended period of time for your most loyal visitors to warm up to it.
Taking a step back, there is arguably an inverse relationship between visitor value and positive response to alternative page design. This is particularly the case when you consider visitor value as not just a function of lifetime revenue, but a function of Recency and Frequency as well. In other words, taking the Direct Marketing concept of a standard RFM model, your upper quartile (those with the highest scores in Recency, Frequency, and Monetary values) will quite possibly have the worst reaction to your new design.
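If RFM scoring is new to you, the Python sketch below shows one simple way to turn raw numbers into quartile scores. Treat it as an illustration under stated assumptions rather than a standard: the field names (`days_since_last_visit`, `visits_in_period`, `revenue`), the helper names, and the rank-based 1 to 4 scoring are all mine, and ties are simply broken by input order.

```python
def quartile_scores(values, higher_is_better=True):
    """Return a 1-4 score per value, where 4 marks the best quartile."""
    # Sort positions from worst to best, then cut the ranking into four bands.
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, index in enumerate(order):
        scores[index] = 1 + (rank * 4) // len(values)
    return scores


def rfm_scores(customers):
    """Map each customer id to an (R, F, M) tuple of 1-4 scores."""
    # Fewer days since the last visit is better, so recency is inverted.
    recency = quartile_scores(
        [c["days_since_last_visit"] for c in customers], higher_is_better=False)
    frequency = quartile_scores([c["visits_in_period"] for c in customers])
    monetary = quartile_scores([c["revenue"] for c in customers])
    return {c["id"]: (r, f, m)
            for c, r, f, m in zip(customers, recency, frequency, monetary)}


if __name__ == "__main__":
    # Made-up customers, purely for illustration.
    customers = [
        {"id": "c1", "days_since_last_visit": 2, "visits_in_period": 12, "revenue": 900.0},
        {"id": "c2", "days_since_last_visit": 15, "visits_in_period": 6, "revenue": 400.0},
        {"id": "c3", "days_since_last_visit": 60, "visits_in_period": 3, "revenue": 150.0},
        {"id": "c4", "days_since_last_visit": 200, "visits_in_period": 1, "revenue": 40.0},
    ]
    for customer_id, scores in rfm_scores(customers).items():
        print(customer_id, scores)
```

Customers scoring 4 on all three dimensions are the upper quartile described above, the group most likely to balk at a new design.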
So what does all this mean to you? Well, first off, dive deeply into those A/B test results, and use multi-dimensional segmentation to help you understand exactly who is reacting well to your changes, and who hates them. Segment by new and repeat visitors and analyze the performance of each group. Better yet, segment performance by RFM quartiles. It’s really the only way to know if you’re hitting the mark with your targeted customers. If you’re new to RFM, some key starting metrics would be “Days Since Last Purchase or Visit”; Visits within a period of time; and Order Value/Revenues.
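Here is a rough Python sketch of what segmenting the two KPIs by new and repeat visitors could look like. The record layout (`variant`, `segment`, `ordered`, `revenue`) and the function name `segment_kpis` are assumptions for illustration, and the sample visits are made up, so do not read anything into the numbers themselves.

```python
from collections import defaultdict


def segment_kpis(visits):
    """Conversion rate and revenue per visit for each (variant, segment) pair.

    Each visit is a dict with 'variant', 'segment', 'ordered', and 'revenue'.
    """
    totals = defaultdict(lambda: {"visits": 0, "orders": 0, "revenue": 0.0})
    for visit in visits:
        cell = totals[(visit["variant"], visit["segment"])]
        cell["visits"] += 1
        cell["orders"] += 1 if visit["ordered"] else 0
        cell["revenue"] += visit["revenue"]
    return {
        key: {
            "visits": cell["visits"],
            "conversion_rate": cell["orders"] / cell["visits"],
            "revenue_per_visit": cell["revenue"] / cell["visits"],
        }
        for key, cell in totals.items()
    }


if __name__ == "__main__":
    # Made-up visits, purely to show the shape of the output.
    sample = [
        {"variant": "B", "segment": "new", "ordered": True, "revenue": 40.0},
        {"variant": "B", "segment": "new", "ordered": False, "revenue": 0.0},
        {"variant": "B", "segment": "repeat", "ordered": False, "revenue": 0.0},
        {"variant": "B", "segment": "repeat", "ordered": False, "revenue": 0.0},
        {"variant": "A", "segment": "repeat", "ordered": True, "revenue": 30.0},
    ]
    for key, kpis in sorted(segment_kpis(sample).items()):
        print(key, kpis)
```

Swapping the `segment` value from new/repeat to an RFM quartile label gives you the deeper cut without changing the function.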
Frankly, this is where a product like Omniture Discover can be your best friend because multi-dimensional segmentation is so critical to your analysis. You just can’t drill into the data using traditional web analytics tools – they simply aren’t built to support the multiple levels you need. You need a way to see all visitors -> visitors to Home Page -> visitors to Home Page who saw page B -> visitors to Home Page who saw page B and ordered -> visitors to Home Page who saw page B, ordered, and were new -> visitors to Home Page who saw page B, didn’t order, and were new, etc.
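To make that drill path concrete, here is a small Python sketch that applies one filter after another and reports how many visitors survive each level. It says nothing about how Omniture Discover works internally; the `drill_down` helper and the visitor fields (`saw_home`, `variant`, `ordered`, `is_new`) are hypothetical, and the visitors are made up.

```python
def drill_down(visitors, steps):
    """Apply named filters in order; return (label, count) after each level."""
    remaining = list(visitors)
    path = [("all visitors", len(remaining))]
    for label, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        path.append((label, len(remaining)))
    return path


if __name__ == "__main__":
    # Made-up visitors, purely for illustration.
    visitors = [
        {"saw_home": True, "variant": "B", "ordered": True, "is_new": True},
        {"saw_home": True, "variant": "B", "ordered": False, "is_new": True},
        {"saw_home": True, "variant": "A", "ordered": True, "is_new": False},
        {"saw_home": False, "variant": None, "ordered": False, "is_new": True},
    ]
    steps = [
        ("visited Home Page", lambda v: v["saw_home"]),
        ("saw page B", lambda v: v["variant"] == "B"),
        ("ordered", lambda v: v["ordered"]),
        ("were new", lambda v: v["is_new"]),
    ]
    for label, count in drill_down(visitors, steps):
        print(f"{label}: {count}")
```

Changing the last two filters to "didn’t order" and "were new" gives the other branch of the path described above.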
Armed with that kind of deep-level insight, you can’t help but make all the right decisions — some of which may have seemed completely counter-intuitive at first. (The surprise is half the fun!)
THE MORAL OF THE STORY
Hopefully this little parable helped expand your understanding of A/B testing. My goal is to empower you to maximize your revenue potential and avoid the bad decisions that can result from using only high-level data. If you will discipline yourself to change and test only one element at a time, and will take the time to analyze your results segment by segment, you will avoid the two most common, and most costly, errors people make in A/B testing.
If you’d like assistance setting up or analyzing your A/B tests, feel free to contact the Omniture Best Practices Group – we’re more than happy to help!
Want to chat about this topic? Discuss your own experiences with A/B and Multi-variate testing with your peers in our Website Conversion and Optimization forum.