2014-01-06

In the Japanese version of this blog, I've written a series of posts about how each kind of machine learning classifier draws its classification hyperplanes or decision boundaries.

So in this post I want to show you a summary of that series and how the hyperplanes or decision boundaries vary from classifier to classifier (translated from the Japanese version). It should be interesting and help you understand the nature of each classifier. Here I chose some representative classifiers: decision tree (DT), logistic regression (LR: only for linearly separable cases), support vector machine (SVM), neural network (NN: back-propagation multi-layer perceptron) and random forest (RF). They are all supervised learning methods and easy to use in R*1.

I'm still new to this field and just a "package user", not a serious expert in machine learning and its scientific basis*2. For people like me, explanations of the algorithms or theorems are not much help for understanding how the classifiers work. Instead, I believe that visualizing what they produce (= hyperplanes or decision boundaries) will help us.
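To make "visualizing a decision boundary" concrete, here is a minimal sketch of the general recipe: fit a classifier, predict the class at every point of a grid, and look at where the predicted label flips. This is not the code behind the posts in this series, which rely on the R packages listed in the footnote. It is a standard-library Python illustration, and the toy data, the hand-written logistic regression and the names `fit_logistic`, `predict` and `decision_map` are all assumptions made up for this example.

```python
import math

# Hypothetical, linearly separable toy data: (x, y) points and class labels.
points = [(-2.0, -1.5), (-1.5, -2.0), (-1.0, -1.0), (-2.5, -0.5),
          (1.0, 1.5), (2.0, 1.0), (1.5, 2.5), (2.5, 2.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(points, labels, lr=0.1, epochs=2000):
    """Fit weights (w1, w2, b) by plain batch gradient descent."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x, y), t in zip(points, labels):
            err = sigmoid(w1 * x + w2 * y + b) - t
            g1 += err * x
            g2 += err * y
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def predict(model, x, y):
    """Return class 1 on the positive side of the fitted line, else class 0."""
    w1, w2, b = model
    return 1 if w1 * x + w2 * y + b > 0 else 0


def decision_map(model, lo=-3.0, hi=3.0, steps=13):
    """Predict the class at every grid point; the boundary is where labels flip."""
    step = (hi - lo) / (steps - 1)
    rows = []
    for i in range(steps):
        y = hi - i * step  # first row corresponds to the largest y
        rows.append("".join(
            "#" if predict(model, lo + j * step, y) == 1 else "."
            for j in range(steps)))
    return rows


model = fit_logistic(points, labels)
grid = decision_map(model)
print("\n".join(grid))
```

The printed map shows class 1 as `#` and class 0 as `.`; the edge between the two regions is the decision boundary. Swapping in a different classifier changes only the `predict` step, which is why the same grid trick works for comparing all the methods above.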

*1: The functions and packages used here were: rpart(){mvpart} for decision tree, glm(){stats} or vglm(){VGAM} for logistic regression or multinomial logit, svm(){e1071} for SVM, nnet(){nnet} for neural networks and randomForest(){randomForest} for random forest.

*2: Even though I have a Ph.D. in a certain experimental research field.

Data Scientist TJO in Tokyo

Data science, statistics or machine learning in broken English

Comparing machine learning classifiers based on their hyperplanes or decision boundaries