Data Scientist TJO in Tokyo

Data science, statistics or machine learning in broken English

Can multivariate modeling predict taste of wine? Beyond human intuition and univariate reductionism


At a certain meetup on the other day, I talked about a brand-new relationship between taste of wine (i.e. professional tasting) and data science. This talk was inspired by a book "Wine Science: The Application of Science in Winemaking". Below is its Japanese edition.


新しいワインの科学

新しいワインの科学


For readers who can't read Japanese, I summarized the content of the talk in this post. Just for your information, I myself am a super wine lover :) and I'm also much interested in how data science explain taste of wine.


In order to run analytics below, I prepared an R workspace in my GitHub repository. Please download and import it into your R environment.



(Note: All quotes from Goode's book here are reversely translated from the Japanese edition and it may contain not a few difference from the original version)

Read more

Bayesian modeling with R and Stan (5): Time series with seasonality

In the previous post, we successfully estimated a model with a nonlinear trend by using Stan.


But please remember this is a time series dataset. Does it include any other kind of nonlinear components? Yes, we have to be careful for seasonality. Actually when I generate the dataset, I added a seasonal component with a 7 days cycle.


Then we have to change the model as follows.


CV_t = Q_t + \sum{trend_t} + season_{mod(t,7)}

trend_t - trend_{t-1} = trend_{t-1} - trend_{t-2} + \epsilon_t

\displaystyle \sum^{7}_k season_k \sim \cal{N} (0, \sigma_{season})

 Q_t = a x_{1t} + b x_{2t} + c x_{3t} + d + \epsilon_t

Read more

Bayesian modeling with R and Stan (4): Time series with a nonlinear trend

The previous post reviewed how to estimate a simple hierarchical Bayesian models. You can see more complicated cases in a great textbook "The BUGS book".


But personally hierarchical Bayesian modeling is the most useful for time-series analysis. I think {dlm} CRAN package is popular for such a purpose, but in order to run more complicated modeling Stan would be a powerful alternative.


To see a simple practice on a complicated time series analysis with Stan, first download a sample dataset from GitHub and import it as "d" to your RStudio workspace. It contains 3 independent variables (x1, x2 and x3) and 1 dependent variable (y). Here I assume y means a certain the number of conversion on a daily basis and x1-x3 mean daily amounts of budget for distinct ads. You can overview how they are just by plotting.

> par(mfrow=c(4,1),mar=c(1,6,1,1))
> plot(d$y,type='l',lwd=3,col='red')
> plot(d$x1,type='l',lwd=1.5)
> plot(d$x2,type='l',lwd=1.5)
> plot(d$x3,type='l',lwd=1.5)

f:id:TJO:20150817183557p:plain


It appears to include some nonlinear trend. Usual multiple linear regression cannot fit such a time series.

> d.lm<-lm(y~.,d)
> matplot(cbind(d$y,predict(d.lm,d[,-4])),type='l',lty=1,lwd=3,col=c(1,2))

f:id:TJO:20150818122448p:plain


In traditional econometrics, such a trend should be treated as, for example, unit root process or trend process. But here we tackle it with just a Bayesian modeling using Stan.

Read more