# Can multivariate modeling predict the taste of wine? Beyond human intuition and univariate reductionism

At a meetup the other day, I talked about a brand-new relationship between the taste of wine (i.e. professional tasting) and data science. The talk was inspired by the book "Wine Science: The Application of Science in Winemaking". Below is its Japanese edition.

For readers who can't read Japanese, I have summarized the content of the talk in this post. Just for your information, I am a huge wine lover myself :) and I'm also very interested in how data science can explain the taste of wine.

To run the analyses below, I prepared an R workspace in my GitHub repository. Please download it and load it into your R environment.

(Note: All quotes from Goode's book here are translated back from the Japanese edition, so they may differ in quite a few places from the original wording.)

# Bayesian modeling with R and Stan (5): Time series with seasonality

In the previous post, we successfully estimated a model with a nonlinear trend by using Stan.

But please remember that this is a time series dataset. Does it include any other kind of nonlinear component? Yes, we have to be careful about seasonality. In fact, when I generated the dataset, I added a seasonal component with a 7-day cycle.
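
To make the idea of a 7-day cycle concrete, here is a minimal sketch using simulated data. It is not the dataset from this series: the day-of-week effects, the noise level, and the names `week_effect` and `y_sim` are all assumptions chosen for illustration. It shows that a weekly component leaves a clear peak in the autocorrelation at lag 7.

```r
# Illustrative simulation only; values below are assumed, not taken from the post's dataset
set.seed(1)
n <- 105                                   # 15 full weeks of daily observations
week_effect <- c(3, 1, 0, -1, -2, -2, 1)   # hypothetical day-of-week effects (sum to zero)
season <- rep(week_effect, length.out = n) # repeat the weekly pattern over the series
y_sim <- 10 + season + rnorm(n, sd = 0.5)  # level + seasonality + noise

# Sample autocorrelation up to lag 14; element 1 is lag 0, so element 8 is lag 7
a <- acf(y_sim, lag.max = 14, plot = FALSE)
lag7 <- a$acf[8]
print(round(lag7, 2))
```

If the series has a weekly cycle, `lag7` should be close to 1, while lags that are not multiples of 7 stay noticeably lower. Running `acf()` on the real `y` is a quick way to check for the same pattern before changing the model.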

Then we have to change the model as follows.

# Bayesian modeling with R and Stan (4): Time series with a nonlinear trend

The previous post reviewed how to estimate a simple hierarchical Bayesian model. You can see more complicated cases in the great textbook "The BUGS book".

But personally, I find hierarchical Bayesian modeling most useful for time series analysis. I think the {dlm} CRAN package is popular for this purpose, but for more complicated models Stan is a powerful alternative.

To practice a more complicated time series analysis with Stan, first download a sample dataset from GitHub and import it as "d" into your RStudio workspace. It contains 3 independent variables (x1, x2 and x3) and 1 dependent variable (y). Here I assume y is the daily number of conversions and x1-x3 are the daily budgets for distinct ads. You can get an overview of them just by plotting.

> par(mfrow=c(4,1),mar=c(1,6,1,1))
> plot(d$y,type='l',lwd=3,col='red')
> plot(d$x1,type='l',lwd=1.5)
> plot(d$x2,type='l',lwd=1.5)
> plot(d$x3,type='l',lwd=1.5)


The series appears to include a nonlinear trend. Ordinary multiple linear regression cannot fit such a time series.

> d.lm<-lm(y~.,d)
> matplot(cbind(d$y,predict(d.lm,d[,-4])),type='l',lty=1,lwd=3,col=c(1,2))
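
One way to see why plain linear regression falls short is to look at the residuals. The sketch below uses simulated data standing in for "d": the random-walk trend, the coefficients, and the name `sim` are assumptions for illustration only, not the process that generated the real dataset. When an unmodeled trend is present, the residuals of `lm()` stay strongly autocorrelated.

```r
# Illustrative simulation only; "sim" is a stand-in for the post's dataset "d"
set.seed(42)
n <- 100
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)
trend <- cumsum(rnorm(n, sd = 0.5))        # assumed random-walk trend
y <- 1 + 2 * x1 + 0.5 * x2 - x3 + trend + rnorm(n, sd = 0.2)
sim <- data.frame(x1, x2, x3, y)

# Fit the same kind of model as d.lm above and extract residuals
fit <- lm(y ~ ., data = sim)
res <- residuals(fit)

# Lag-1 autocorrelation of the residuals (element 1 is lag 0, element 2 is lag 1)
r1 <- acf(res, lag.max = 1, plot = FALSE)$acf[2]
print(round(r1, 2))
```

A value of `r1` near 1 means the regression has left the trend in the residuals, which is the symptom the plot above shows for the real data.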


In traditional econometrics, such a trend would be treated as, for example, a unit root process or a trend process. But here we tackle it simply with Bayesian modeling using Stan.