Can multivariate modeling predict the taste of wine? Beyond human intuition and univariate reductionism
At a meetup the other day, I gave a talk on a brand-new relationship between the taste of wine (i.e., professional tasting) and data science. The talk was inspired by the book "Wine Science: The Application of Science in Winemaking"; its Japanese edition is listed below.
- Author: Jamie Goode (ジェイミーグッド); Japanese translation by Ayumi Kajiyama (梶山あゆみ)
- Publisher: Kawade Shobo Shinsha (河出書房新社)
- Release date: 2014/11/12
- Format: Book (tankōbon)
For readers who can't read Japanese, this post summarizes the content of the talk. For your information, I am a huge wine lover myself :) and I'm very interested in how data science can explain the taste of wine.
To run the analyses below, I have prepared an R workspace in my GitHub repository. Please download it and load it into your R environment.
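The name of the workspace file in the repository is not given here, so the following is only a sketch of how an .RData workspace is loaded with base R's load(). It saves a tiny hypothetical data frame (wine_demo, not part of the real workspace) to a temporary file and loads it back, the same way you would load the downloaded file.

```r
# Sketch only: wine_demo is a hypothetical object, not the real workspace content.
f <- tempfile(fileext = ".RData")
wine_demo <- data.frame(quality = c(5, 6, 7), alcohol = c(11.2, 12.5, 13.1))
save(wine_demo, file = f)      # write the object to an .RData file
rm(wine_demo)                  # simulate a fresh R session without the object

# load() restores the saved objects into the current environment
# and (invisibly) returns their names.
restored <- load(f)
print(restored)
print(wine_demo)
```

With the real workspace, replace f with the path to the downloaded .RData file; ls() then lists whatever objects it contains.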
(Note: All quotes from Goode's book here are back-translated from the Japanese edition, so they may differ considerably from the original English text.)
Bayesian modeling with R and Stan (5): Time series with seasonality
In the previous post, we successfully estimated a model with a nonlinear trend by using Stan.
But remember that this is a time series dataset. Does it include any other nonlinear components? Yes: we have to be careful about seasonality. In fact, when I generated the dataset, I added a seasonal component with a 7-day cycle.
Then we have to change the model as follows.
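The revised model itself is not reproduced here. As background only, one common way to encode a 7-day seasonal component in a state-space (and hence Stan) model is the "dummy seasonal" constraint: the seven day-of-week effects sum to zero, so each day's effect equals minus the sum of the previous six. In a Stan model this relation is typically given a normal distribution with its own scale parameter so the weekly pattern can drift slowly. The base-R sketch below checks the constraint on a hypothetical, noise-free weekly pattern; it is an assumption-labeled illustration, not necessarily the model used in the full post.

```r
# Sketch (assumption): dummy-seasonal component with a 7-day cycle.
n <- 70                                   # ten weeks of daily data
weekly <- c(3, 1, -1, -2, -2, 0, 1)       # hypothetical day-of-week effects
stopifnot(sum(weekly) == 0)               # effects over one cycle sum to zero
season <- rep(weekly, length.out = n)     # repeat the weekly pattern

# Under the sum-to-zero constraint, s[t] = -(s[t-1] + ... + s[t-6]).
implied <- sapply(7:n, function(t) -sum(season[(t - 6):(t - 1)]))
print(all.equal(implied, season[7:n]))
```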
Bayesian modeling with R and Stan (4): Time series with a nonlinear trend
The previous post reviewed how to estimate simple hierarchical Bayesian models. You can find more complicated cases in the excellent textbook "The BUGS Book".
Personally, though, I find hierarchical Bayesian modeling most useful for time series analysis. The {dlm} CRAN package is popular for this purpose, but for more complicated models Stan can be a powerful alternative.
For a simple exercise in complicated time series analysis with Stan, first download the sample dataset from GitHub and import it as "d" into your RStudio workspace. It contains 3 independent variables (x1, x2 and x3) and 1 dependent variable (y). Here I assume that y is the daily number of conversions and x1-x3 are the daily budgets for three distinct ads. You can get an overview of the data simply by plotting it.
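par(mfrow = c(4, 1), mar = c(1, 6, 1, 1))    # 4 stacked panels with narrow margins
plot(d$y, type = 'l', lwd = 3, col = 'red')   # y: daily conversions
plot(d$x1, type = 'l', lwd = 1.5)             # x1: daily budget for ad 1
plot(d$x2, type = 'l', lwd = 1.5)             # x2: daily budget for ad 2
plot(d$x3, type = 'l', lwd = 1.5)             # x3: daily budget for ad 3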
y appears to include a nonlinear trend. Ordinary multiple linear regression cannot fit such a time series.
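d.lm <- lm(y ~ ., d)    # regress y on x1, x2 and x3
# Overlay observed y (black) and fitted values (red); d[, -4] drops y, the 4th column
matplot(cbind(d$y, predict(d.lm, d[, -4])), type = 'l', lty = 1, lwd = 3, col = c(1, 2))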
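One way to see why the linear fit fails is to check its residuals for autocorrelation. The sketch below uses hypothetical simulated data, not the post's dataset: three random ad budgets plus an assumed smooth sine-shaped trend drive y. Because lm(y ~ ., sim) has no term for the trend, the trend leaks into the residuals, and their lag-1 autocorrelation (from stats::acf) comes out close to 1 instead of near 0 as it would for independent errors.

```r
# Sketch on hypothetical simulated data (not the post's dataset).
set.seed(1)
n <- 100
x1 <- runif(n, 0, 10)                      # hypothetical ad budgets
x2 <- runif(n, 0, 10)
x3 <- runif(n, 0, 10)
trend <- 10 * sin(2 * pi * (1:n) / n)      # assumed smooth nonlinear trend
y <- 2 * x1 + 1 * x2 + 0.5 * x3 + trend + rnorm(n, sd = 1)
sim <- data.frame(x1, x2, x3, y)

# Multiple linear regression on x1-x3 alone has no term for the trend ...
sim.lm <- lm(y ~ ., sim)
# ... so the residuals inherit the trend and are strongly autocorrelated.
lag1 <- acf(residuals(sim.lm), plot = FALSE)$acf[2]   # element 1 is lag 0
print(lag1)
```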
In traditional econometrics, such a trend would be treated as, for example, a unit root process or a trend-stationary process. Here, however, we tackle it simply with Bayesian modeling in Stan.
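As an illustration only, one common choice for a nonlinear trend in Bayesian time series models is a second-order random walk, mu[t] = 2 * mu[t-1] - mu[t-2] + e[t], which in Stan is typically written as mu[t] ~ normal(2 * mu[t-1] - mu[t-2], s_trend). Its second difference is pure noise, so the slope itself can drift and the trend bends smoothly. The base-R sketch below simulates such a trend with a hypothetical noise scale s_trend; it is not necessarily the exact model used in the full post.

```r
# Sketch (assumption): second-order random-walk trend.
set.seed(42)
n <- 100
s_trend <- 0.1                             # hypothetical trend noise scale
e <- rnorm(n, mean = 0, sd = s_trend)      # innovations
mu <- numeric(n)                           # mu[1] = mu[2] = 0 as starting values
for (t in 3:n) {
  mu[t] <- 2 * mu[t - 1] - mu[t - 2] + e[t]
}
# Second differences recover the innovations e[3:n] (up to rounding).
print(all.equal(diff(mu, differences = 2), e[3:n]))
plot(mu, type = "l", lwd = 2)              # a smooth but nonlinear trend
```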