Data Scientist TJO in Tokyo

Data science, statistics or machine learning in broken English

statistics

10+2 Data Science Methods that Every Data Scientist Should Know in 2016

Two years ago, I published a book -- written in Japanese, so I'm afraid most readers can't read it :'( Actually, this book was written as a summary of 10 major data science methods. But now that two years have passed, the content of the book …

Can multivariate modeling predict taste of wine? Beyond human intuition and univariate reductionism

"Taste of Wine vs. Data Science" from Takashi J OZAKI. At a meetup the other day, I talked about a brand-new relationship between the taste of wine (i.e. professional tasting) and data science. This talk was inspired by a book "Wine Sc…

Bayesian modeling with R and Stan (5): Time series with seasonality

In the previous post, we successfully estimated a model with a nonlinear trend by using Stan. But please remember that this is a time series dataset. Does it include any other kinds of nonlinear components? Yes, we have to be careful about seasona…

Bayesian modeling with R and Stan (4): Time series with a nonlinear trend

The previous post reviewed how to estimate simple hierarchical Bayesian models. You can see more complicated cases in a great textbook, "The BUGS book". But personally, I find hierarchical Bayesian modeling the most useful for time-series anal…

Bayesian modeling with R and Stan (3): Simple hierarchical Bayesian model

In the two previous posts, you learned what Bayesian modeling and Stan are and how to install them. Now you are ready to try them on some very Bayesian problems that many people love, such as the hierarchical Bayesian model. Definition of hierarchica…

Bayesian modeling with R and Stan (2): Installation and an easy example

The previous post, "Bayesian modeling with R and Stan (1): Overview", gave an overview of what Stan is and how it works on R. Are you ready now? OK, this post reviews how to install Stan. Let's start here! :) In principle this p…

Bayesian modeling with R and Stan (1): Overview

Although I've written a series of posts titled "Machine learning for package users with R", I don't usually run machine learning in my daily analytic work, because my current coverage is so-called ad-hoc analysis. Instead of machine learning,…

Univariate stats sometimes fail, while multivariate modelings work well

In many cases of digital marketing, especially online, marketers and analysts love to apply A/B tests in order to find the metric with the most influence on KGIs/KPIs from a huge set of explanatory metrics, such as creative component…

Machine learning for package users with R (2): Logistic Regression

I think a lot of people love logistic regression because it's pretty light and fast. But we know it's just a linear classifier -- I mean it only works for linearly separable patterns, not for linearly non-separable ones. Its primary ide…

Simple analytics work fast, but cannot avoid third-party effects

(The original Japanese versions of these posts are here and here.) In Japan, in my own experience, there may be a dichotomy between "analytics" and "data science". It has been said that real business matters require rapid analyses and rapid act…

Pitfall of "regression to the mean" in growth hacking

(The original Japanese version of this post is here.) In several marketing teams that I've worked on, and from quite a few marketing people at other companies, I've heard complaints like the following: "We're working hard to improve and optimiz…