2014-02-06

(The original posts, in Japanese, are here and here.)

In Japan, in my own experience, there seems to be a dichotomy between "analytics" and "data science". It is often said that real business problems require rapid analysis and rapid action, so people usually prefer simple, quick analytic work to data science work, which is time-consuming and requires a lot of expertise. Consequently, quite a few companies like to hire "analysts" as analytics experts and have them run a quick analysis for each business project.

For example, some former colleagues of mine loved the kind of simple analytics that merely describes which UI components look good for KPIs. Imagine you have to prioritize the UI/UX components of a web service, and you have a data frame with a conversion (CV) flag and UI component flags taking 0/1 values, as below.

a1	a2	a3	a4	a5	a6	a7	cv
1	1	1	0	1	1	0	Yes
0	1	0	1	0	0	0	No
0	0	0	1	1	1	0	Yes
1	0	0	1	1	1	0	Yes
0	0	1	1	0	0	1	No
...	...	...	...	...	...	...	...

Lovers of simple analytics often compute and conclude as below.

a1	a2	a3	a4	a5	a6	a7	CV
40.1%	58.3%	47.9%	94.2%	30.7%	5.6%	50.0%	No
60.5%	41.7%	49.4%	43.6%	68.4%	92.7%	49.3%	Yes
20.3%	-16.6%	1.5%	-50.6%	37.7%	87.1%	-0.7%	Yes - No

"Ranked by this difference, a1, a3, a5 and a6 increase CV (because their differences are positive), but a2, a4 and a7 decrease CV (because theirs are negative)."

This is the result of a very, very simple analysis: for each explanatory variable, they just compute the ratio (percentage) of the flag within the Yes group and within the No group, and show you the difference. Yes, it looks somewhat plausible... but is it really OK?
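To make the computation concrete, here is a minimal Python sketch of the "simple analytics" described above. It is only an illustration under my own assumptions: it uses just the five rows visible in the sample table (not the full data set, so the numbers will not match the percentages quoted above), and the names `ROWS`, `COLUMNS` and `flag_rates` are hypothetical helpers, not part of the original analysis.

```python
# Toy data: ONLY the five rows visible in the sample table above,
# not the full data set used for the percentages in the post.
COLUMNS = ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]
ROWS = [
    ([1, 1, 1, 0, 1, 1, 0], "Yes"),
    ([0, 1, 0, 1, 0, 0, 0], "No"),
    ([0, 0, 0, 1, 1, 1, 0], "Yes"),
    ([1, 0, 0, 1, 1, 1, 0], "Yes"),
    ([0, 0, 1, 1, 0, 0, 1], "No"),
]


def flag_rates(rows, label):
    """Percentage of rows with each flag set to 1, among rows whose cv == label."""
    group = [flags for flags, cv in rows if cv == label]
    # zip(*group) transposes the rows into one tuple per column (a1..a7).
    return [100.0 * sum(col) / len(group) for col in zip(*group)]


rate_no = flag_rates(ROWS, "No")
rate_yes = flag_rates(ROWS, "Yes")

# The "Yes - No" row: a positive value is read as "this component increases CV".
diff = [y - n for y, n in zip(rate_yes, rate_no)]

for name, n, y, d in zip(COLUMNS, rate_no, rate_yes, diff):
    print(f"{name}: No={n:5.1f}%  Yes={y:5.1f}%  Yes-No={d:+6.1f}%")
```

Note that each column is treated completely independently here: nothing in this computation looks at how the flags co-occur, which is exactly why it is so fast and why its conclusion deserves some suspicion.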

Data Scientist TJO in Tokyo

Data science, statistics or machine learning in broken English

Simple analytics work fast, but cannot avoid third-party effects
