Data Scientist TJO in Tokyo

Data science, statistics or machine learning in broken English

Comparing machine learning classifiers based on their hyperplanes or decision boundaries

In the Japanese version of this blog, I've written a series of posts about how each kind of machine learning classifier draws its classification hyperplane or decision boundary.


So in this post I want to show you a summary of the series and how their hyperplanes or decision boundaries vary (translated from the Japanese version). I hope it is interesting and helps you understand the nature of each classifier. Here I chose some representative classifiers: decision tree (DT), logistic regression (LR: only for linearly separable cases), support vector machine (SVM), neural network (NN: back-propagation multi-layer perceptron) and random forest (RF). They are all supervised learning methods and easy to use in R*1.


I'm still new to this field and just a "package user", not a serious expert in machine learning and its scientific basis*2. For people like me, explaining the meaning of algorithms or theorems is not very helpful for understanding how they work. Instead, a visualized feature (= hyperplanes or decision boundaries) will help us, I believe.
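To make the idea concrete, here is a minimal sketch of how a decision boundary can be drawn: train a classifier, predict the class at every point of a grid, and look at where the predicted class changes. It is only an illustration under my own assumptions, not the R code used in the series: it is written in plain Python, the toy data (two Gaussian blobs), the hand-written logistic regression and all function names are hypothetical, and the "plot" is just text.

```python
import math
import random


def make_data(n_per_class=50, seed=0):
    """Toy data (an assumption): two well-separated Gaussian blobs."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in enumerate([(-2.0, -2.0), (2.0, 2.0)]):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            labels.append(label)
    return points, labels


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(points, labels, lr=0.1, epochs=200):
    """Logistic regression fitted by plain batch gradient descent."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x, y), t in zip(points, labels):
            err = sigmoid(w1 * x + w2 * y + b) - t
            g1 += err * x
            g2 += err * y
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def predict(model, x, y):
    """Class 1 on the positive side of the hyperplane, class 0 otherwise."""
    w1, w2, b = model
    return 1 if w1 * x + w2 * y + b > 0 else 0


def boundary_map(model, lo=-4.0, hi=4.0, steps=21):
    """Predict on a steps x steps grid; '#' marks class 1, '.' marks class 0."""
    rows = []
    for i in range(steps):
        y = hi - (hi - lo) * i / (steps - 1)  # first row is the largest y
        row = ""
        for j in range(steps):
            x = lo + (hi - lo) * j / (steps - 1)
            row += "#" if predict(model, x, y) == 1 else "."
        rows.append(row)
    return rows


points, labels = make_data()
model = fit_logistic(points, labels)
grid = boundary_map(model)
print("\n".join(grid))
```

The line where "." turns into "#" is the decision boundary. For logistic regression it is straight; swapping in another classifier for `fit_logistic` and `predict` would change the shape of that line, and that difference is what this series compares.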

*1:Functions and packages used here were: rpart(){mvpart} for decision tree, glm(){stats} or vglm(){VGAM} for logistic regression or multinomial logit, svm(){e1071} for SVM, nnet(){nnet} for neural networks and randomForest(){randomForest} for random forest

*2:Even though I have a Ph.D. in a certain experimental research field
