
Data Scientist TJO in Tokyo

Data science, statistics or machine learning in broken English

Pitfall of "regression to the mean" in growth hacking

statistics business marketing

(The original Japanese version of this post is here.)


In several marketing teams I've worked with, and from quite a few marketers at other companies, I've heard complaints like this: "We're working hard to improve and optimize our services, and we really did see the KPI rise back again. Sure, the KPI rose one day and fell the next. Anyway, we've worked much harder!!! But... why? Why is the KPI only about half of what it was a year ago???"


The problem is serious: while you believe you've been working hard, the situation never improves. In extreme cases, even though you keep working harder than ever, the KPI goes down.


f:id:TJO:20140121161954p:plain


Hard to believe? No, no, it's not rare. I've seen many situations like the one described above, even though I know everyone involved was working extremely hard. This experience convinced me that this is a systematic problem in all kinds of (digital) marketing. As a matter of common sense, people generally believe that "practice makes perfect" or "effort pays off". In the web services industry, agility may be added to that list: if you want to succeed, you must act immediately whenever you see any drop in a KPI.


However, econometrics suggests an astonishing possibility: the up-and-down behavior of the KPI that you've seen may be just an illusion, unrelated to any of your efforts.


Illusion: the KPI rises when you improve and it drops when you don't do anything


Imagine your task is to monitor a KPI with some BI tools and to growth-hack your web service. For example, every morning you share the latest data with your colleagues and, if needed, decide how to improve the service and have your colleagues implement it. As usual, you and your colleagues have plenty of options and experience for improving it.


One day you saw that the KPI had dropped compared with the day before, as shown below.

f:id:TJO:20140121230853p:plain

Of course some improvement was needed in this situation, and you made one immediately. As a result, the KPI rose and you were happy.

f:id:TJO:20140121231156p:plain

But after a whole day in which you did nothing, the KPI started to drop again. You had to take action to improve it again!

f:id:TJO:20140121232047p:plain

You repeated the same routine every day for a while... yes, I know you really worked hard. OK, let's look at the whole picture of the KPI over 180 days.

f:id:TJO:20140121232523p:plain

OMG, it just fluctuated around a constant mean??? Even though you worked hard every time you saw it drop, and you really did see it rise back every time??? This kind of phenomenon can be interpreted as "regression to the mean". Wikipedia describes it as follows:

In statistics, regression toward (or to) the mean is the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second measurement—and, paradoxically, if it is extreme on its second measurement, it will tend to have been closer to the average on its first.
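To see why the "act after a drop, then watch it rise" routine feels so convincing, here is a minimal Python sketch (not part of the original post). It assumes, hypothetically, that the daily KPI is pure independent noise around a fixed mean (100, with standard deviation 10; both values are arbitrary), so any "action" has zero effect by construction. It then counts how often the KPI rises on the day after a drop.

```python
import random


def rise_after_drop_rate(n_days: int, mean: float = 100.0, sd: float = 10.0,
                         seed: int = 0) -> float:
    """Fraction of 'drop' days that are followed by a rise the next day.

    Hypothetical model: each day's KPI is an independent draw from
    Normal(mean, sd), so nothing we do can affect it.
    """
    rng = random.Random(seed)
    kpi = [rng.gauss(mean, sd) for _ in range(n_days)]
    drops = 0  # days on which the KPI fell versus the previous day
    rises = 0  # ...and then rose the next day ("our fix worked!")
    for t in range(1, n_days - 1):
        if kpi[t] < kpi[t - 1]:
            drops += 1
            if kpi[t + 1] > kpi[t]:
                rises += 1
    return rises / drops


rate = rise_after_drop_rate(100_000)
print(f"KPI rose the day after a drop in {rate:.1%} of cases")
```

For independent, identically distributed values this probability is exactly 2/3 (versus 1/2 for an arbitrary day), so about two out of three "fixes" appear to work even though, in this model, they do nothing at all.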


This phenomenon often occurs in marketing data too, even when the fluctuation looks too complicated to be interpreted as regression to the mean, as below. This figure shows a simulated time series that actually regresses to the mean*1.
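According to the footnote, the simulated series is ARIMA(1,0,1), i.e. an ARMA(1,1) process with no differencing. The post does not give its parameters, so the sketch below uses hypothetical values (mean 100, AR coefficient 0.8, MA coefficient 0.3, noise standard deviation 5) and only the Python standard library. Because the AR coefficient is below 1 in absolute value, the series is stationary: it can wander away for days at a time, but it keeps being pulled back toward the same mean.

```python
import random


def simulate_arma11(n: int, mu: float = 100.0, phi: float = 0.8,
                    theta: float = 0.3, sigma: float = 5.0,
                    seed: int = 0, burn_in: int = 500) -> list:
    """Simulate an ARIMA(1,0,1) (= ARMA(1,1)) series around a constant mean.

    x_t - mu = phi * (x_{t-1} - mu) + e_t + theta * e_{t-1},  e_t ~ N(0, sigma^2)
    All parameter values are hypothetical; |phi| < 1 keeps the series stationary.
    """
    rng = random.Random(seed)
    dev = 0.0     # x_{t-1} - mu
    prev_e = 0.0  # e_{t-1}
    out = []
    for t in range(burn_in + n):
        e = rng.gauss(0.0, sigma)
        dev = phi * dev + e + theta * prev_e
        prev_e = e
        if t >= burn_in:  # drop the start-up transient
            out.append(mu + dev)
    return out


def lag1_autocorr(xs: list) -> float:
    """Sample lag-1 autocorrelation: how strongly today resembles yesterday."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


kpi = simulate_arma11(180)  # 180 "days", as in the figures above
print(f"min {min(kpi):.1f}, max {max(kpi):.1f}, mean {sum(kpi) / len(kpi):.1f}")
print(f"lag-1 autocorrelation: {lag1_autocorr(kpi):.2f}")
```

The high day-to-day autocorrelation is what makes this kind of series so deceptive: a dip tends to last a while, so whatever you did during the dip gets the credit when the series drifts back toward its mean.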

f:id:TJO:20140121234831p:plain

In any case, if regression to the mean occurs as in the examples above, you have to conclude: YOUR GROWTH HACKING ACTIONS NEVER WORKED. This is one of the worst pitfalls in growth hacking.


You have to focus on "the baseline" and/or "the global trend"


Regression to the mean indicates that our growth hacking never works or has almost no effect on the service. If we unfortunately find it happening in our service, what should we do? The answer is very simple, and anyone can easily put it into practice. First, focus on the baseline and how it behaves.


f:id:TJO:20140122164554p:plain


Truly effective growth hacking should raise the baseline CONTINUOUSLY, not just momentarily. In this case, only 2 of your growth hacking actions really worked, raising the baseline in a stepwise manner.
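One simple way to watch the baseline instead of the day-to-day values is a trailing moving average (a rolling median or a centered window would work as well). The minimal sketch below assumes a hypothetical KPI made of noise around a baseline of 100 that steps up by 10 on day 60 and again on day 120, mimicking the two effective actions in the figure; the 14-day window is also an arbitrary choice.

```python
import random


def trailing_mean(xs: list, window: int) -> list:
    """Mean of each full `window`-day span; out[i] covers days i .. i + window - 1."""
    running = sum(xs[:window])
    out = [running / window]
    for t in range(window, len(xs)):
        running += xs[t] - xs[t - window]  # slide the window by one day
        out.append(running / window)
    return out


def step_kpi(n_days: int = 180, steps=((60, 10.0), (120, 10.0)),
             base: float = 100.0, sd: float = 5.0, seed: int = 0) -> list:
    """Hypothetical KPI: daily noise around a baseline that jumps at each step day."""
    rng = random.Random(seed)
    kpi = []
    for day in range(n_days):
        level = base + sum(size for start, size in steps if day >= start)
        kpi.append(level + rng.gauss(0.0, sd))
    return kpi


window = 14
kpi = step_kpi()
baseline = trailing_mean(kpi, window)
# Baseline over the 14 days ending on day 59, 119 and 179 (just before each step, and the end).
for end_day in (59, 119, 179):
    print(end_day, round(baseline[end_day - window + 1], 1))
```

The daily values keep bouncing up and down by several points throughout, while the smoothed baseline moves only at the two steps; that stepwise movement, not the daily bounce, is the signal that an action worked.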


When the effect of your growth hacking accumulates gradually but continuously, the KPI looks like the figure below.


f:id:TJO:20140122165300p:plain


If the KPI shows this kind of uptrend, your growth hacking probably really worked. In cases like those above, you can conclude that your growth hacking made a real difference, and you can be confident in growth hacking for this web service.
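A rough way to check for an uptrend like the one above is to fit a straight line to the KPI by ordinary least squares and look at its slope. The sketch below is a minimal illustration on hypothetical data: noise around a flat baseline of 100 versus the same noise around a baseline that grows by 0.1 per day. With strongly autocorrelated data (such as the ARIMA example), the usual OLS standard errors are too optimistic, so treat the slope as a quick check rather than a formal test.

```python
import random


def ols_slope(ys: list) -> float:
    """Least-squares slope of ys against the day index 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx


def noisy_kpi(n_days: int = 180, slope: float = 0.0, base: float = 100.0,
              sd: float = 5.0, seed: int = 0) -> list:
    """Hypothetical KPI: a linear baseline plus independent daily noise."""
    rng = random.Random(seed)
    return [base + slope * day + rng.gauss(0.0, sd) for day in range(n_days)]


flat = noisy_kpi(slope=0.0)      # regression-to-the-mean world: no real effect
growing = noisy_kpi(slope=0.1)   # gradual, continuous improvement
print(f"flat:    {ols_slope(flat):+.3f} per day")
print(f"growing: {ols_slope(growing):+.3f} per day")
```

Both series share the same seed and hence the same daily noise, so they look equally "bumpy" day to day; only the fitted slope over the whole period separates real, continuous improvement from fluctuation.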


Some people may say that this phenomenon is so well known that everybody already understands it. So why do I discuss it in this post? Because in growth hacking, people love powerful data management systems and BI tools that report KGIs and/or KPIs very quickly, so that decisions can be made as soon as possible. In such an environment, we can get the latest KPI values whenever we want, and it is easy to react to them and take immediate growth hacking action. At those moments we often ignore any long-term perspective, because our attention is easily captured by the growth hacking at hand, and we know it can really improve KPIs.


In this post, I showed a long-term strategy for avoiding this pitfall. It's very simple but very important, not only for growth hacking but also for improving operations in any other field, even ordinary sales management.

*1: Specifically, this is an ARIMA(1,0,1) time series.