This post is reproduced from a post on my Japanese blog. A friend of mine, an academic researcher in the machine learning field, tweeted the following: I was studying how to deal with imbalanced data, and [Wallace et al. ICDM'11] https://t.co/ltQ942lKP…
This post is reproduced from a post on my Japanese blog. For years, a lot of beginners in machine learning have asked me questions such as "Do I have to learn mathematics? What kind? To what extent?" and sometimes I've found it very hard to…
Almost two years ago, I wrote a post about the situation of "Data Scientist" and "Artificial Intelligence" at that time. Two years on, what's happening now and what do we see? Below is a summary of the current situation of data s…
Two years ago, I published a book -- written in Japanese, so I'm afraid most readers here can't read it :'( The book was written as a summary of 10 major data science methods. But now that two years have gone by, the content of the book …
Actually, I've known about MXnet for weeks as one of the most popular libraries / packages among Kagglers, but just recently I heard that the bug fixes are almost done, and some friends say the latest version looks stable, so at last I installed it. MXn…
Taste of Wine vs. Data Science from Takashi J OZAKI At a meetup the other day, I talked about a brand-new relationship between the taste of wine (i.e. professional tasting) and data science. This talk was inspired by the book "Wine Sc…
In the previous post, we successfully estimated a model with a nonlinear trend by using Stan. But please remember this is a time series dataset. Does it include any other kind of nonlinear component? Yes, we have to be careful about seasona…
The previous post reviewed how to estimate a simple hierarchical Bayesian model. You can see more complicated cases in the great textbook "The BUGS Book". But personally, I find hierarchical Bayesian modeling the most useful for time-series anal…
In the two previous posts, you learned what Bayesian modeling and Stan are and how to install them. Now you are ready to try them on some very Bayesian problems that many people love, such as hierarchical Bayesian models. Definition of hierarchica…
The previous post gave an overview of what Stan is and how it works on R. Bayesian modeling with R and Stan (1): Overview - Data Scientist in Ginza, Tokyo Are you ready now? OK, this post reviews how to install Stan. Let's start here! :) In principle, this p…
Although I've written a series of posts titled "Machine Learning for package uses in R", I usually don't run machine learning in my daily analytic work because my current coverage is so-called ad-hoc analysis. Instead of machine learning,…
In many cases in digital marketing, especially online, marketers and analysts love to apply A/B tests to find the metric with the most influence on KGI/KPIs from a huge set of explanatory metrics, such as creative component…
As far as I know, Xgboost is the most successful machine learning classifier in several machine learning competitions, e.g. Kaggle or the KDD Cup. Indeed, the team that won the Higgs-Boson competition used Xgboost, and below is their code rel…
Random Forest is still one of the strongest supervised learning methods, although these days many people love to use Deep Learning or Convolutional NN. Of course, because of its simple architecture and many implementations in various enviro…
These days almost everybody appears to love a variant of Neural Network (NN) -- Deep Learning. I already discussed how Deep Learning works and what kinds of parameters characterize it in the previous post. What kind of decision bounda…
Actually, the support vector machine (SVM) is the one I love the most among the various machine learning classifiers... because of its strong generalization and beautiful decision boundary (in high-dimensional space). Although there are other …
I think a lot of people love logistic regression because it's pretty light and fast. But we know it's just a linear classifier -- I mean it's only for linearly separable patterns, not linearly non-separable ones. Its primary ide…
Notice: The {mvpart} package has currently been removed from CRAN because its support has expired. To install it, 1) please download the latest (but expired) package archive from the old archive site and 2) install it following the procedur…
Below is the most popular post on this blog, which recorded an enormous number of page views and received a lot of comments both here and outside this blog. Comparing machine learning classifiers based on their hyperplanes or decision boundaries - Da…
In the previous post we saw how Deep Learning with {h2o} works and how Deep Belief Nets implemented by h2o.deeplearning draw decision boundaries for XOR patterns. What kind of decision boundaries does Deep Learning (Deep Belief Net) draw? P…
More than a year ago, I pointed out that "Data Scientist" was attracting less attention than ever. Puzzling situation of "Data Scientist" in Japanese market - Data Scientist in Ginza, Tokyo So, what's going on in 2015?... yes, I think not a f…
For a while (at least several months, since many people began to implement it with Python and/or Theano, PyLearn2 or something like that), I had nearly given up practicing Deep Learning with R, and I've felt left alone, much further away…
On Apr 17, I joined Global TokyoR #1 and gave the talk below. Visualization of Supervised Learning with {arules} + {arulesViz} from Takashi J Ozaki (Note: please install the {igraph} package before installing {arulesViz}) By the way, th…
I read a set of very interesting questions by Dr. Vincent Granville, as below: 10 questions about big data and data science - Data Science Central Should companies embrace big data? Which ones (start-ups, big-companies, tech companies, retai…
(The original posts in Japanese are here and here) In Japan, in my own experience, there may be a dichotomy between "analytics" and "data science". It has been said that real business matters require rapid analyses and rapid act…
(The original post in Japanese is here) In several marketing teams that I've worked on, and from quite a few marketing people at other companies, I've heard complaints like the following: "We're working hard to improve and optimiz…
I'm just a Data Scientist at a company -- but at the same time I'm working as an evangelist for data science and data scientists themselves. I've been watching how people think of data scientists and how deeply they are accepted in Jap…
In the Japanese version of this blog, I've written a series of posts about how each kind of machine learning classifier draws various classification hyperplanes or decision boundaries. So in this post I want to show you a summary of the serie…
Hello everybody in the data science community -- I'm TJO from Ginza, Tokyo. Ginza is one of the most bustling downtown areas not only in Tokyo but in all of Japan. After a 6-year academic career in experimental neuroscience, I moved to data sci…