2015-11-27

Can multivariate modeling predict taste of wine? Beyond human intuition and univariate reductionism

Taste of Wine vs. Data Science from Takashi J OZAKI

At a meetup the other day, I talked about a brand-new relationship between the taste of wine (i.e. professional tasting) and data science. The talk was inspired by the book "Wine Science: The Application of Science in Winemaking". Below is its Japanese edition.

新しいワインの科学

Author: Jamie Goode, Ayumi Kajiyama
Publisher: 河出書房新社 (Kawade Shobo Shinsha)
Release date: 2014/11/12
Format: Book (単行本)

For readers who can't read Japanese, I have summarized the content of the talk in this post. Just for your information, I am a super wine lover myself :) and I'm also very interested in how data science can explain the taste of wine.

To run the analyses below, I prepared an R workspace in my GitHub repository. Please download it and import it into your R environment.

(Note: All quotes from Goode's book here are back-translated from the Japanese edition and may contain quite a few differences from the original version.)

2015-08-18

Bayesian modeling with R and Stan (5): Time series with seasonality

R statistics BUGS / Stan Bayesian

In the previous post, we successfully estimated a model with a nonlinear trend by using Stan.

But please remember that this is a time series dataset. Does it include any other kind of nonlinear component? Yes, we have to be careful about seasonality. In fact, when I generated the dataset, I added a seasonal component with a 7-day cycle.

Then we have to change the model as follows.

$CV_t = Q_t + \sum{trend_t} + season_{mod(t,7)}$

$trend_t - trend_{t-1} = trend_{t-1} - trend_{t-2} + \epsilon_t$

$\displaystyle \sum^{7}_k season_k \sim \mathcal{N} (0, \sigma_{season})$

$Q_t = a x_{1t} + b x_{2t} + c x_{3t} + d + \epsilon_t$
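
To make the structure of this model concrete, here is a minimal R sketch that simulates one series from the four equations above. It is not the code that generated the sample dataset. Every numeric value (the coefficients, the noise scales, the length n) is an assumption chosen for illustration, and reading the summed trend term as a cumulative sum is also my assumption.

```r
# Sketch only: every numeric value below is an assumption, not taken from the post.
set.seed(1)
n <- 98  # 14 full weeks

# Trend: trend_t - trend_{t-1} = trend_{t-1} - trend_{t-2} + eps_t
trend <- numeric(n)
for (t in 3:n) {
  trend[t] <- 2 * trend[t - 1] - trend[t - 2] + rnorm(1, mean = 0, sd = 0.01)
}
cum_trend <- cumsum(trend)  # assumed reading of the summed trend term

# Seasonality: 7 effects whose sum is drawn from N(0, sigma_season)
sigma_season <- 0.1
s <- rnorm(6, mean = 0, sd = 1)
s <- c(s, rnorm(1, mean = 0, sd = sigma_season) - sum(s))
season <- s[(seq_len(n) %% 7) + 1]  # mod(t, 7) is 0..6, R indexes from 1

# Regression part Q_t with hypothetical coefficients a, b, c and intercept d
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
beta <- c(a = 1.0, b = 0.5, c = 2.0, d = 3.0)
q <- beta[["a"]] * x1 + beta[["b"]] * x2 + beta[["c"]] * x3 + beta[["d"]] +
  rnorm(n, mean = 0, sd = 0.2)

# Observed series: regression part + accumulated trend + weekly pattern
cv <- q + cum_trend + season
```

Plotting cv with plot(cv, type='l') should show a weekly sawtooth riding on a slowly bending trend, which is the kind of pattern the model is meant to separate.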

2015-08-18

Bayesian modeling with R and Stan (4): Time series with a nonlinear trend

R BUGS / Stan statistics Bayesian

The previous post reviewed how to estimate simple hierarchical Bayesian models. You can see more complicated cases in a great textbook, "The BUGS Book".

Personally, though, I find hierarchical Bayesian modeling most useful for time series analysis. I think the {dlm} CRAN package is popular for this purpose, but for more complicated models Stan is a powerful alternative.

To see a simple example of a complicated time series analysis with Stan, first download a sample dataset from GitHub and import it as "d" into your RStudio workspace. It contains 3 independent variables (x1, x2 and x3) and 1 dependent variable (y). Here I assume y is the daily number of conversions and x1-x3 are the daily budgets for distinct ads. You can get an overview of them just by plotting.

> par(mfrow=c(4,1),mar=c(1,6,1,1))
> plot(d$y,type='l',lwd=3,col='red')
> plot(d$x1,type='l',lwd=1.5)
> plot(d$x2,type='l',lwd=1.5)
> plot(d$x3,type='l',lwd=1.5)

(Figure: time series plots of y, x1, x2 and x3)

It appears to include some nonlinear trend. Ordinary multiple linear regression cannot fit such a time series.

> d.lm<-lm(y~.,d)
> matplot(cbind(d$y,predict(d.lm,d[,-4])),type='l',lty=1,lwd=3,col=c(1,2))

(Figure: observed y overlaid with the fitted values from the linear regression)
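
One way to check numerically that a linear regression misses a trend is to look at the autocorrelation of its residuals. The sketch below does this on synthetic data, not on the sample dataset "d". The data frame sim, the sine-shaped trend and all coefficients are assumptions made up for illustration.

```r
# Sketch on synthetic data: all values here are assumptions, not the post's dataset.
set.seed(42)
n <- 100
sim <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))

# A smooth nonlinear trend that the regressors cannot explain
trend <- 3 * sin(seq(0, pi, length.out = n))
sim$y <- 1.5 * sim$x1 + 0.8 * sim$x2 + 1.2 * sim$x3 + 2 + trend +
  rnorm(n, mean = 0, sd = 0.2)

# Same kind of fit as in the post: y regressed on all other columns
sim.lm <- lm(y ~ ., sim)
res <- residuals(sim.lm)

# Lag-1 autocorrelation of the residuals; near 0 would mean no leftover structure
r1 <- cor(res[-1], res[-n])
print(r1)
```

A value of r1 close to 1 means that neighboring residuals move together, i.e. the trend is left in the residuals instead of being explained by the model.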

In traditional econometrics, such a trend would be treated as, for example, a unit root process or a trend process. But here we tackle it with just Bayesian modeling using Stan.

Data Scientist TJO in Tokyo

Data science, statistics or machine learning in broken English
