Data Scientist TJO in Tokyo

Data science, statistics or machine learning in broken English

Bayesian modeling with R and Stan (5): Time series with seasonality

In the previous post, we successfully estimated a model with a nonlinear trend by using Stan.


But please remember this is a time series dataset. Does it include any other kind of nonlinear components? Yes, we have to be careful for seasonality. Actually when I generate the dataset, I added a seasonal component with a 7 days cycle.


Then we have to change the model as follows.


CV_t = Q_t + \sum{trend_t} + season_{mod(t,7)}

trend_t - trend_{t-1} = trend_{t-1} - trend_{t-2} + \epsilon_t

\displaystyle \sum^{7}_k season_k \sim \cal{N} (0, \sigma_{season})

 Q_t = a x_{1t} + b x_{2t} + c x_{3t} + d + \epsilon_t

Read more

Bayesian modeling with R and Stan (4): Time series with a nonlinear trend

The previous post reviewed how to estimate a simple hierarchical Bayesian models. You can see more complicated cases in a great textbook "The BUGS book".


But personally hierarchical Bayesian modeling is the most useful for time-series analysis. I think {dlm} CRAN package is popular for such a purpose, but in order to run more complicated modeling Stan would be a powerful alternative.


To see a simple practice on a complicated time series analysis with Stan, first download a sample dataset from GitHub and import it as "d" to your RStudio workspace. It contains 3 independent variables (x1, x2 and x3) and 1 dependent variable (y). Here I assume y means a certain the number of conversion on a daily basis and x1-x3 mean daily amounts of budget for distinct ads. You can overview how they are just by plotting.

> par(mfrow=c(4,1),mar=c(1,6,1,1))
> plot(d$y,type='l',lwd=3,col='red')
> plot(d$x1,type='l',lwd=1.5)
> plot(d$x2,type='l',lwd=1.5)
> plot(d$x3,type='l',lwd=1.5)

f:id:TJO:20150817183557p:plain


It appears to include some nonlinear trend. Usual multiple linear regression cannot fit such a time series.

> d.lm<-lm(y~.,d)
> matplot(cbind(d$y,predict(d.lm,d[,-4])),type='l',lty=1,lwd=3,col=c(1,2))

f:id:TJO:20150818122448p:plain


In traditional econometrics, such a trend should be treated as, for example, unit root process or trend process. But here we tackle it with just a Bayesian modeling using Stan.

Read more

Bayesian modeling with R and Stan (3): Simple hierarchical Bayesian model

In 2 previous posts, you learned what Bayesian modeling and Stan are and how to install them. Now you are ready to try it on some very Bayesian problems - as many people love - such as hierarchical Bayesian model.


Definition of hierarchical Bayesian models


Prior to tackling with a practical example, let's overview what and how hierarchical Bayesian model is. A famous book on Bayesian modeling with MCMC, written by Toshiro Tango and Taeko Becque and published in Japan, describes as below*1.

ベイジアン統計解析の実際 (医学統計学シリーズ)

ベイジアン統計解析の実際 (医学統計学シリーズ)

In a fixed-effects model of frequentist, each result is assumed to have a common average \theta.

y_i \sim N(\theta, s^2_i)

On the other hand, in a random-effects model, each result is assumed to have a distinct average \theta_i and it is distributed around a global average \mu.

y_i \sim N(\theta_i, s^2_i)

\theta_i \sim N(\mu, \sigma^2_i)

Bayesian hierarchical models assume prior probability for parameters \mu, \sigma^2_\theta of a probability distribution of \theta_i in a random-effects model, such as

\mu \sim N(0,100)

1/{\sigma^2_i} \sim Gamma(0.001,0,001)

It is said that such models have a hierarchical structure with two levels, that is,

  • 1st level: a probability distribution is assumed for \theta_i
  • 2nd level: one more probability distribution is assumed for parameters of the 1st level \mu, \sigma^2_i


This is a textbook definition of hierarchical models, but I think it can be understood more intuitively; in hierarchical Bayesian models, often the models have to handle some excessive fluctuations as nonlinear effects more than expected in usual frequentist's models. Priors used in such models can be seen as an "absorber" that can absorb various kinds of fluctuations distributed around true parameters.

*1:Its original text is of course in Japanese, so this is just my own interpretation

Read more