Bayesian modeling with R and Stan (5): Time series with seasonality

In the previous post, we successfully estimated a model with a nonlinear trend by using Stan.

But please remember this is a time series dataset. Does it include any other kind of nonlinear components? Yes, we have to be careful for seasonality. Actually when I generate the dataset, I added a seasonal component with a 7 days cycle.

Then we have to change the model as follows.

$CV_t = Q_t + \sum{trend_t} + season_{mod(t,7)}$

$trend_t - trend_{t-1} = trend_{t-1} - trend_{t-2} + \epsilon_t$

$\displaystyle \sum^{7}_k season_k \sim \cal{N} (0, \sigma_{season})$

$Q_t = a x_{1t} + b x_{2t} + c x_{3t} + d + \epsilon_t$

The model above can be described with a Stan code below. Save it as "hb_ts2.stan".

data {
	int<lower=0> N;
	real<lower=0> x1[N];
	real<lower=0> x2[N];
	real<lower=0> x3[N];
	real<lower=0> y[N];
}

parameters {
	real trend[N];
	real season[N];
	real s_trend;
	real s_q;
	real s_season;
	real<lower=0> a;
	real<lower=0> b;
	real<lower=0> c;
	real d;
}

model {
	real q[N];
	real cum_trend[N];
	for (i in 7:N) {
		season[i]~normal(-season[i-1]-season[i-2]-season[i-3]-season[i-4]-season[i-5]-season[i-6],s_season);
	}
	for (i in 3:N)
		trend[i]~normal(2*trend[i-1]-trend[i-2],s_trend);
	cum_trend[1]<-trend[1];
	for (i in 2:N)
		cum_trend[i]<-cum_trend[i-1]+trend[i];

	for (i in 1:N)
		q[i]<-y[i]-cum_trend[i]-season[i];
	for (i in 1:N)
		q[i]~normal(a*x1[i]+b*x2[i]+c*x3[i]+d,s_q);
}

Are you ready? Now just run as below. It may take much long time because computing seasonality is heavy, so please be patient :)

> fit2<-stan(file='hb_ts2.stan',data=dat,iter=1000,chains=4)
> fit2.smp<-extract(fit2)
> dens2_a<-density(fit2.smp$a)
> dens2_b<-density(fit2.smp$b)
> dens2_c<-density(fit2.smp$c)
> dens2_d<-density(fit2.smp$d)
> a_est2<-dens2_a$x[dens2_a$y==max(dens2_a$y)]
> b_est2<-dens2_b$x[dens2_b$y==max(dens2_b$y)]
> c_est2<-dens2_c$x[dens2_c$y==max(dens2_c$y)]
> d_est2<-dens2_d$x[dens2_d$y==max(dens2_d$y)]
> trend_est2<-rep(0,100)
> for (i in 1:100) {
+     tmp<-density(fit2.smp$trend[,i])
+     trend_est2[i]<-tmp$x[tmp$y==max(tmp$y)]
+ }
> week_est2<-rep(0,100)
> for (i in 1:100) {
+     tmp<-density(fit2.smp$season[,i])
+     week_est2[i]<-tmp$x[tmp$y==max(tmp$y)]
+ }
> pred2<-a_est2*d$x1+b_est2*d$x2+c_est2*d$x3+d_est2+cumsum(trend_est2)+week_est2
> matplot(cbind(d$y,pred2),type='l',lty=1,lwd=c(2,3),col=c(1,2))
> legend('topleft',c('Data','Predicted'),col=c(1,2),lty=1,lwd=c(2,3),cex=1.5,ncol=2)
> cor(d$y,pred)
[1] 0.9452432
> cor(d$y,pred2)
[1] 0.9949925

f:id:TJO:20150818125555p:plain

Perfect! Just for your information, plot its seasonal component.

> plot(week_est2,type='l')

f:id:TJO:20150818125834p:plain

It's a precise 7-days cycle. Parameters are also well estimated, better than without seasonality.

> a_est2
[1] 0.0001505969
> b_est2
[1] 0.0002937902
> c_est2
[1] 5.39804e-05
> d_est2
[1] 1516.145

The only failure is an intercept because it can be included in the other components such as trend or seasonality. I have no idea for solving it but it doesn't matter I think.

Finally, just plot as below.

> matplot(cbind(d$y,pred2,cumsum(trend_est2)+d_est2,week_est2+cumsum(trend_est2)+d_est2),type='l',lty=1,lwd=c(2,3,2,2),col=c('black','red','blue','green'))
> legend('topleft',c('Data','Predicted','Trend','Seasonality + Trend'),col=c('black','red','blue','green'),lty=1,lwd=c(2,3,2,2),cex=1.2,ncol=2)

f:id:TJO:20150818144540p:plain

A trend and seasonality are decomposed and visualized.

Conclusions

The lessons we learn in these 2 posts are:

A time series with a nonlinear trend cannot be precisely modeled by usual linear regression
As well as dynamic linear modeling or state-space modeling, Bayesian modeling with MCMC can model it
In case of usual econometric time series analysis, you have to be careful for seasonality

In some extreme cases a time series may contain further complicated nonlinear components such as weekend effect, ultra-long term trend, etc. But Bayesian modeling can handle them with its great flexibility.

Data Scientist TJO in Tokyo

Data science, statistics or machine learning in broken English

Bayesian modeling with R and Stan (5): Time series with seasonality

Conclusions