Most Winning A/B Test Results are Illusory

Whitepaper about errors in A/B testing, written for Qubit.

Covered at qz.com and Hacker News.

Introduction

Marketers have begun to question the value of A/B testing, asking: ‘Where is my 20% uplift? Why doesn’t it ever seem to appear in the bottom line?’ Their A/B tests report an uplift of 20%, yet the increase never seems to translate into increased profits. So what’s going on?

In this article I’ll show that badly performed A/B tests can produce winning results that are more likely to be false than true. At best, this leads to the needless modification of websites; at worst, to modifications that damage profits.
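To make that claim concrete, here is a minimal simulation sketch. The numbers in it (the share of tested changes that genuinely work, the power of a typical test, and the significance level) are illustrative assumptions, not figures from the whitepaper. It shows how, when most tested variants do nothing and tests are underpowered, the ordinary 5% false-positive rate alone can produce more ‘wins’ than the real effects do:

```python
import random

# Illustrative sketch only: prevalence, power and alpha below are assumptions
# chosen for demonstration, not figures from the whitepaper.
random.seed(42)

n_tests = 10_000          # number of A/B tests run
p_real_effect = 0.10      # assume only 10% of tested changes truly work
alpha = 0.05              # significance threshold (false-positive rate)
power = 0.30              # assumed power of a typical under-sized test

false_wins = 0
true_wins = 0
for _ in range(n_tests):
    truly_effective = random.random() < p_real_effect
    if truly_effective:
        # a genuinely effective variant is detected with probability = power
        if random.random() < power:
            true_wins += 1
    else:
        # an ineffective variant still "wins" with probability = alpha
        if random.random() < alpha:
            false_wins += 1

total_wins = true_wins + false_wins
print(f"winning tests: {total_wins}")
print(f"fraction of wins that are false: {false_wins / total_wins:.0%}")
```

Under these assumed numbers, roughly 60% of the ‘winning’ tests are false positives: exactly the situation in which reported uplifts never show up in the bottom line.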

Statisticians have known for almost a hundred years how to ensure that experimenters don’t get misled by their experiments [1]. Using this methodology has resulted in pharmaceutical drugs that work, in bridges that stay up and in the knowledge that smoking causes cancer. I’ll show how these methods ensure equally robust results when applied to A/B testing.

To do this, I’ll introduce three simple concepts which come as second nature to statisticians but have been forgotten by many web A/B testing specialists. The names of these concepts are ‘statistical power’, ‘multiple testing’ and ‘regression to the mean’. Armed with them, you will be able to cut through the misinformation and confusion that plague this industry.

Read the rest here: