
Data Scientist TJO in Tokyo

Data science, statistics or machine learning in broken English

Can multivariate modeling predict taste of wine? Beyond human intuition and univariate reductionism

At a meetup the other day, I talked about a new way to look at the relationship between the taste of wine (i.e., professional tasting) and data science. The talk was inspired by the book "Wine Science: The Application of Science in Winemaking" by Jamie Goode, which I read in its Japanese edition.



For readers who can't read Japanese, I have summarized the talk in this post. For what it's worth, I am a huge wine lover myself :) and I'm very interested in how data science can explain the taste of wine.

To run the analyses below, download the R workspace I prepared in my GitHub repository and load it into your R environment.

(Note: All quotes from Goode's book here are back-translated from the Japanese edition, so they may differ considerably from the original English text.)


Bayesian modeling with R and Stan (5): Time series with seasonality

In the previous post, we successfully estimated a model with a nonlinear trend by using Stan.

But remember that this is a time series dataset. Does it contain any other kind of nonlinear component? Yes: we have to watch out for seasonality. In fact, when I generated the dataset, I added a seasonal component with a 7-day cycle.

So we have to change the model as follows.

CV_t = Q_t + \sum{trend_t} + season_{mod(t,7)}

trend_t - trend_{t-1} = trend_{t-1} - trend_{t-2} + \epsilon_t

\sum_{k=0}^{6} season_k \sim \mathcal{N}(0, \sigma_{season})

Q_t = a x_{1t} + b x_{2t} + c x_{3t} + d + \epsilon_t
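To make the structure of this model concrete, here is a minimal R sketch that simulates data from it. Every parameter value, the series length N = 140 and all noise scales are hypothetical choices for illustration, not the settings behind the actual dataset. I read \sum{trend_t} as the cumulative sum of trend up to day t (an assumption), and I impose the seasonal constraint exactly by centering the seven day-of-week effects, which is a stricter version of the soft normal constraint in the model.

```r
# Hypothetical simulation of the seasonal model; every value below is an illustrative assumption
set.seed(123)
N <- 140                                   # 20 weeks of daily data (assumed length)

# Regression part: Q_t = a*x1 + b*x2 + c*x3 + d + noise
x1 <- runif(N, 0, 10)
x2 <- runif(N, 0, 10)
x3 <- runif(N, 0, 10)
cf <- list(a = 0.8, b = 0.5, c = 0.3, d = 5)   # hypothetical coefficients
Q <- cf$a * x1 + cf$b * x2 + cf$c * x3 + cf$d + rnorm(N, sd = 0.5)

# Second-order random walk: trend_t = 2*trend_{t-1} - trend_{t-2} + noise
trend <- numeric(N)
for (i in 3:N) {
  trend[i] <- 2 * trend[i - 1] - trend[i - 2] + rnorm(1, sd = 1e-4)
}

# Seven day-of-week effects, centered so that they sum to exactly zero
season <- rnorm(7, sd = 1)
season <- season - mean(season)

# R vectors are 1-based, so season_{mod(t,7)} becomes season[(t %% 7) + 1]
day_idx <- (seq_len(N) %% 7) + 1
cv <- Q + cumsum(trend) + season[day_idx]
```

Plotting the result with plot(cv, type = 'l') should show a weekly ripple on top of a slowly bending trend. In the model itself the seven effects only sum approximately to zero; centering is simply the easiest way to generate data that satisfies the constraint.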


Bayesian modeling with R and Stan (4): Time series with a nonlinear trend

The previous post reviewed how to estimate a simple hierarchical Bayesian model. You can find more complicated cases in the excellent textbook "The BUGS Book".

Personally, though, I find hierarchical Bayesian modeling most useful for time series analysis. I think the {dlm} CRAN package is popular for this purpose, but for more complicated models Stan is a powerful alternative.

To try a simple exercise in complicated time series analysis with Stan, first download the sample dataset from GitHub and import it into your RStudio workspace as "d". It contains 3 independent variables (x1, x2 and x3) and 1 dependent variable (y). Here I assume that y is the daily number of conversions and that x1-x3 are the daily budgets for three different ads. You can get an overview of them just by plotting.
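If the GitHub file is not at hand, the R sketch below builds a hypothetical stand-in for "d" with the same column layout (x1, x2, x3, and y in the fourth column) so that the commands that follow still run. The generating process (random daily budgets, an assumed smooth nonlinear trend, the coefficients and N = 100) is entirely my assumption for illustration and does not reproduce the real dataset.

```r
# Hypothetical stand-in for the sample dataset "d"; it does NOT reproduce the real GitHub data
set.seed(1)
N <- 100
day <- seq_len(N)
x1 <- round(runif(N, 50, 150))   # assumed daily budget for ad 1
x2 <- round(runif(N, 20, 80))    # assumed daily budget for ad 2
x3 <- round(runif(N, 10, 40))    # assumed daily budget for ad 3

# Smooth nonlinear trend of an assumed shape, added on top of the ad effects
trend <- 30 * sin(day / 20) + 0.01 * (day - 50)^2
y <- 0.5 * x1 + 0.8 * x2 + 1.2 * x3 + trend + rnorm(N, sd = 5)

# Same column layout the commands below expect: x1, x2, x3, then y in column 4
d <- data.frame(x1 = x1, x2 = x2, x3 = x3, y = y)
str(d)
```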

> par(mfrow=c(4,1),mar=c(1,6,1,1))
> plot(d$y,type='l',lwd=3,col='red')
> plot(d$x1,type='l',lwd=1.5)
> plot(d$x2,type='l',lwd=1.5)
> plot(d$x3,type='l',lwd=1.5)


The dependent variable y appears to include a nonlinear trend. Ordinary multiple linear regression cannot fit such a time series.

> d.lm<-lm(y~.,d)
> matplot(cbind(d$y,predict(d.lm,d[,-4])),type='l',lty=1,lwd=3,col=c(1,2))
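One way to back up the claim that this regression misses the trend is to inspect its residuals: when a trend is left unmodeled, consecutive residuals stay correlated. The sketch below computes the Durbin-Watson statistic by hand in base R, where values well below 2 indicate positive autocorrelation. So that it runs on its own, it generates a hypothetical series (assumed coefficients and a sine-shaped trend) instead of loading the real d; with the actual data you would apply the same two lines to residuals(d.lm).

```r
# Hypothetical data: three regressors plus a nonlinear trend that lm() does not model
set.seed(2)
N <- 100
day <- seq_len(N)
sim <- data.frame(x1 = runif(N), x2 = runif(N), x3 = runif(N))
sim$y <- with(sim, 2 * x1 + 1 * x2 + 0.5 * x3) + 3 * sin(day / 15) + rnorm(N, sd = 0.3)

fit <- lm(y ~ ., data = sim)
res <- residuals(fit)

# Durbin-Watson statistic: sum of squared successive differences over the sum of squares
dw <- sum(diff(res)^2) / sum(res^2)
dw

# Lag-1 autocorrelation of the residuals, for comparison
acf(res, plot = FALSE)$acf[2]
```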


In traditional econometrics, such a trend would be treated as, for example, a unit root process or a trend-stationary process. Here, instead, we tackle it simply with Bayesian modeling in Stan.
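For comparison with the econometric view, here is a small R sketch of the second-order trend equation trend_t - trend_{t-1} = trend_{t-1} - trend_{t-2} + \epsilon_t, simulated with a hypothetical noise scale and start-up values. A trend generated this way has two unit roots: differencing it twice with diff() recovers exactly the white-noise innovations, which is the classical way to make such a series stationary.

```r
# Simulate a second-order random walk (integrated of order 2); the noise scale is an assumption
set.seed(3)
N <- 200
eps <- rnorm(N, sd = 0.1)
trend <- numeric(N)
trend[1:2] <- eps[1:2]    # hypothetical start-up values
for (i in 3:N) {
  trend[i] <- 2 * trend[i - 1] - trend[i - 2] + eps[i]
}

# Second differences: trend_t - 2*trend_{t-1} + trend_{t-2} equals eps_t
d2 <- diff(trend, differences = 2)
all.equal(d2, eps[3:N])
```

The last line should print TRUE, up to floating-point tolerance. The Bayesian approach keeps the same equation but estimates the trend as a latent state alongside the regression part, instead of differencing the data away.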
