Data Scientist TJO in Tokyo

Data science, statistics or machine learning in broken English

{mxnet} R package from MXnet, an intuitive Deep Learning framework including CNN & RNN

I've known about MXnet for weeks as one of the most popular libraries among Kagglers, but only recently did I hear that most of its bugs have been fixed, and some friends say the latest version looks stable, so at last I installed it.



MXnet is a framework distributed by DMLC, the team also known as the distributor of Xgboost. Its documentation now looks complete, and even pre-trained models for ImageNet are distributed. I think this is good news for R users who love machine learning... so let's go.


Can multivariate modeling predict taste of wine? Beyond human intuition and univariate reductionism


At a meetup the other day, I talked about a brand-new relationship between the taste of wine (i.e. professional tasting) and data science. The talk was inspired by the book "Wine Science: The Application of Science in Winemaking". Below is its Japanese edition.


新しいワインの科学



For readers who can't read Japanese, I've summarized the content of the talk in this post. Just for your information, I'm a huge wine lover :) and I'm also very interested in how data science can explain the taste of wine.


To run the analyses below, I prepared an R workspace in my GitHub repository. Please download it and load it into your R environment.



(Note: All quotes from Goode's book here are translated back from the Japanese edition, so they may differ in quite a few places from the original.)


Bayesian modeling with R and Stan (5): Time series with seasonality

In the previous post, we successfully estimated a model with a nonlinear trend by using Stan.


But please remember that this is a time series dataset. Does it include any other kind of nonlinear component? Yes, we have to be careful about seasonality. In fact, when I generated the dataset, I added a seasonal component with a 7-day cycle.


Then we have to change the model as follows.


CV_t = Q_t + \sum trend_t + season_{t \bmod 7}

trend_t - trend_{t-1} = trend_{t-1} - trend_{t-2} + \epsilon_t

\displaystyle \sum_{k=1}^{7} season_k \sim \mathcal{N}(0, \sigma_{season})

Q_t = a x_{1t} + b x_{2t} + c x_{3t} + d + \epsilon_t
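
To get a feel for what this model says, here is a minimal R sketch that simulates one series from it. This is not the code used to generate the dataset in this series; the function name `simulate_cv` and every numeric default are hypothetical. It also rests on a few assumptions: the sum over trend is read as a cumulative sum up to time t, the two noise terms written as \epsilon_t are treated as separate noises with their own standard deviations, and the seasonal constraint is imposed by drawing each new seasonal term so that any 7 consecutive terms sum to a small Gaussian noise.

```r
# Simulate one series from the trend + seasonality + regression model.
# All default values are illustrative assumptions, not estimates from the post.
simulate_cv <- function(n = 100, a = 1.0, b = 0.5, c = -0.3, d = 10,
                        s_trend = 0.01, s_season = 0.1, s_q = 0.5) {
  stopifnot(n >= 7)

  # Explanatory variables (here just standard normal noise)
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  x3 <- rnorm(n)

  # Second-order difference trend:
  # trend_t - trend_{t-1} = trend_{t-1} - trend_{t-2} + noise
  trend <- numeric(n)
  for (t in 3:n) {
    trend[t] <- 2 * trend[t - 1] - trend[t - 2] + rnorm(1, 0, s_trend)
  }
  cum_trend <- cumsum(trend)  # assumed reading of the sum over trend

  # Seasonality with a 7-day cycle: each window of 7 consecutive
  # terms sums to N(0, s_season)
  season <- numeric(n)
  season[1:6] <- rnorm(6, 0, 1)
  for (t in 7:n) {
    season[t] <- -sum(season[(t - 6):(t - 1)]) + rnorm(1, 0, s_season)
  }

  # Regression part Q_t with its own observation noise
  q <- a * x1 + b * x2 + c * x3 + d + rnorm(n, 0, s_q)

  data.frame(t = seq_len(n), x1, x2, x3,
             trend = cum_trend, season = season,
             cv = q + cum_trend + season)
}

set.seed(71)
sim <- simulate_cv()
head(sim)
```

Plotting `sim$cv` against `sim$t` should show a weekly pattern riding on a slowly bending trend, which is the kind of structure the Stan model has to separate from the regression part.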
