MVT – Why Full Factorial vs. Partial Factorial Misses the Entire Point

One of my first introductions to the larger world of testing was a chance to serve on a panel about multivariate testing. I remember how divergent the opinions were and how deep the misconceptions about the entire process ran. Just about everyone I talked to had the same preconceived notions of how to use multivariate testing, and even worse, almost all of those notions were shaped by the need to support a sales pitch. Now, as I work with more and more organizations, I see the same bad ideas replicating, and groups continue to miss the true value of multivariate testing. MVT holds all these promises, but when done for the wrong reasons it multiplies the worst of testing instead of facilitating the best of it. Even worse, groups then confuse the issue by focusing on the method of the test and not the fundamental mindset that created it. Many groups then get into debates about the “value” of the different multivariate methods out there, which is nothing more than a fool’s errand, since any method is going to fail when it is used with the wrong mindset.

Too many times people get caught up in the “advantages” or “disadvantages” of the various forms of multivariate analysis. Full factorial testing has many advantages: fewer rules, better insight into interactions across tested elements, and the ability to test non-uniform concept arrays. Partial factorial testing has many advantages too: speed, forced conformity to better testing rules, and more efficient use of resources. What does not matter is which one allows you to throw things at a wall and get an answer. When you are busy trying to answer the wrong question, you can fail with any tool. It is only when you are trying to succeed that the differences between tools matter.
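For anyone who has not seen the two designs side by side, here is a minimal sketch of the difference. It assumes three hypothetical two-level factors (A, B, and C are my placeholder names) and uses one common way of building a half fraction, which is not necessarily what any particular testing tool does under the hood.

```python
from itertools import product

# Hypothetical two-level factors, with levels coded -1 and +1.
FACTORS = ["A", "B", "C"]


def full_factorial(factors):
    """Every combination of levels: 2**k runs for k two-level factors."""
    return [
        dict(zip(factors, levels))
        for levels in product((-1, 1), repeat=len(factors))
    ]


def half_fraction(factors):
    """Keep only the runs whose coded levels multiply to +1.

    For three factors this is the textbook half fraction with the
    defining relation I = ABC.
    """
    runs = []
    for run in full_factorial(factors):
        sign = 1
        for level in run.values():
            sign *= level
        if sign == 1:
            runs.append(run)
    return runs


full = full_factorial(FACTORS)
half = half_fraction(FACTORS)
print(len(full), "runs in the full factorial")
print(len(half), "runs in the half fraction")
```

The half fraction needs half the experiences, which is where the speed comes from. The price is that each factor's effect can no longer be separated from the interaction of the other two, which is why the full factorial gives better insight into interactions.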

For most groups, the fundamental use of multivariate testing is to combine multiple badly conceived A/B tests, throwing them all together to quickly find a combination that increases results. So many groups want to try out a combination of ideas that they think an MVT campaign is the solution. You can use the test that way; it is statistically valid and will guarantee a result, but at what cost? The challenge is that you will be wasting resources and time, and you are guaranteed to get a suboptimal outcome from this flawed way of thinking. Any form of multivariate testing that is used as nothing more than a massive collection of individual tests is always going to be inefficient, since you are replicating the imperfections of those individual tests in a way that magnifies them. If your goal is simply that individual outcome, and it is for way too many programs and especially agencies, then you will never get any true value from multivariate testing until you change your mindset.

The concept of just trying to find a combination misses a fundamental truth: you are spending a massive amount of resources creating all these permutations and offers without any understanding of the efficiency of each resource.

  1. All the ideas come from preconceptions and hypotheses about what works
  2. Every new variant adds cost, both in its creation and in the data acquisition needed for it to be meaningful

If we instead focus on multivariate testing as a means to filter our resources rather than simply combine them, then we are able to achieve efficiency. If we want to limit our resources and apply them only where we will get the most return, then we must always view multivariate testing as a tool to learn and be efficient, not one to just throw things out and see what works.

The classic example of a multivariate test is testing a button. Let us say I have a medium orange purchase button currently on my site. I might think that red would be better than orange, and my UX person thinks that “Buy Now” copy will perform better because he saw it on a few competitor sites. I launch the test, adding a slightly larger button as well, and I get a predicted best combination of a large orange “Buy Now” button. I pat myself on the back and move forward. The reality is that each of those factors (size, color, and copy) has a massive number of feasible alternatives, and all I did was look at a very limited, biased set of them.

Let me propose a better way. Look at that same test, but instead of preconceiving the outcome, look for the value of each factor. Suppose we ran the same test and found out that size matters more than color, despite what we thought going in. If we spend as few resources as possible to reach that understanding, then we have left the maximum amount of resources available to apply to the winning factor or element. Having learned that size matters, we can shift our resources away from less influential elements and apply them to as many different feasible alternatives for the execution of the winning factor as possible. Instead of being limited to testing 3-4 sizes, we know the value of size and can create as many different alternatives as possible. Not only have we used fewer resources, but they have been applied to the most influential part of our experience.
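To make "the value of each factor" concrete, here is a rough sketch of ranking factors by their main effect, meaning the average result at one level minus the average result at the other. The numbers are invented purely for illustration and do not come from any real test; the factor names follow the button example, and the coding of -1 for the current version and +1 for the alternative is my own assumption.

```python
# Invented numbers for illustration only: conversions per 10,000 visitors
# for each cell of a 2x2x2 button test. Each key is (size, color, copy),
# coded -1 for the current version and +1 for the alternative.
FACTORS = ("size", "color", "copy")
RESULTS = {
    (-1, -1, -1): 87,
    (-1, -1, +1): 89,
    (-1, +1, -1): 91,
    (-1, +1, +1): 93,
    (+1, -1, -1): 107,
    (+1, -1, +1): 109,
    (+1, +1, -1): 111,
    (+1, +1, +1): 113,
}


def main_effect(results, index):
    """Average result at the +1 level minus the average at the -1 level."""
    high = [y for levels, y in results.items() if levels[index] == 1]
    low = [y for levels, y in results.items() if levels[index] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


effects = {name: main_effect(RESULTS, i) for i, name in enumerate(FACTORS)}

# Rank factors by how much they move the result, in either direction.
ranked = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)

for name in ranked:
    print(name, effects[name])
```

With these made-up numbers, size comes out far ahead of color and copy, so size is where the next round of alternatives should go. In a real test you would also need enough data to be confident that the ranking is not just noise.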

Even better, I have now learned that size matters most, and I have an outcome that is different from and greater than the one I would have had before. In fact, I have shifted the system so that the absolute worst thing that can happen is that I end up with the same alternative I would have had before, but for less time and fewer resources. I have also added a much higher upside, since an alternative that I would not previously have included can come out the winner. I have also tested more alternatives of the important factor, so I am not limiting my output to the single input of popular opinion. I have leveraged multivariate testing as a way to learn what matters and to focus my future efforts on that. I no longer have to create alternatives for factors that have no influence, and can instead focus resources on testing as many different feasible alternatives as I can for the things that do influence behavior.

The less you spend to reach a conclusion, the greater the ROI. The faster you move, the faster you can get to the next value as well, which also increases the outcome of your program. What is more important is to focus on the use of multivariate testing as a learning tool ONLY, one that tells us where to apply resources. It frees us up to test as many feasible alternatives as possible on the most valuable or influential factor, while eliminating the equivalent waste on factors that do not have the same impact. The goal is to get the outcome; getting overly caught up in doing it in one massive step, as opposed to smaller, easier steps, is fool’s gold.

You CAN leverage multivariate tests in a large number of ways, and let me tell you that there are enough 15×8 tests out there to show that it is a statistically valid approach. The question is never what you can do, but what you SHOULD do. Just because I can test a massive number of permutations does not mean that I am being efficient or getting the return on my efforts that I should. You can’t just ignore the context of the output to feel better about your results. You will get a result no matter what you do; the trick is constantly getting better results for fewer resources.

If you are stuck in the realm of trying to show results from a single test, or are not thinking of your testing program as a learning and optimization machine, then you aren’t going to get the results you need no matter what you do. Multivariate tests are useful only in the context of your program; if you are stuck thinking in terms of just the outcome of that specific test, you will never achieve the results that you want.

If you shift to thinking about it in the context of a larger program, then multivariate tests are just one of many tools you have at your disposal to achieve those goals. Don’t let the promises and sales pitches of a few divert your attention away from what matters. And if you are focusing on what matters, then the question of which type of multivariate test you use becomes almost completely moot.