Data Scientist TJO in Tokyo

Data science, statistics or machine learning in broken English

Machine learning for package users with R (6): Xgboost (eXtreme Gradient Boosting)

As far as I know, Xgboost is the most successful machine learning classifier in several machine learning competitions, e.g. Kaggle or the KDD Cup. Indeed, the team that won the Higgs Boson competition used Xgboost, and their code release is linked below.



Very fortunately, there is a good implementation for both R and Python, distributed via GitHub. You can get it from the link below.



The {xgboost} package is currently not available on CRAN, so you have to install it with the {devtools} package. The README.md of the GitHub repository above shows how to install it, and you can do it easily.
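For reference, here is a minimal sketch of that installation plus a quick sanity check, assuming the R package lives in the R-package subdirectory of the dmlc/xgboost repository (check the README above if the path has changed):

# install from GitHub with {devtools}
install.packages("devtools")
devtools::install_github("dmlc/xgboost", subdir = "R-package")

# quick check with the agaricus (mushroom) data bundled in {xgboost}
library(xgboost)
data(agaricus.train, package = "xgboost")
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               nrounds = 10, objective = "binary:logistic", verbose = 0)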

Read more

Machine learning for package users with R (5): Random Forest

Random Forest is still one of the strongest supervised learning methods, although these days many people love to use Deep Learning or convolutional neural networks. That is of course because of its simple architecture and the many implementations available in various environments and languages, e.g. Python and R.


The point I want to emphasize here is that Random Forest is a strong method not only for classification but also for estimating variable importance, a capability inherited from its origin, the decision / regression tree.
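To make that concrete, here is a minimal sketch of variable importance estimation with the {randomForest} package, using the built-in iris data as a stand-in for the datasets used in the post:

library(randomForest)
set.seed(71)
# importance = TRUE makes the fit keep permutation-based importance scores
fit <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)
importance(fit)   # mean decrease in accuracy / Gini for each variable
varImpPlot(fit)   # quick visual ranking of the variables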

Read more

Machine learning for package users with R (4): Neural Network

These days almost everybody seems to love a variant of the Neural Network (NN): Deep Learning. I already discussed how Deep Learning works and what kinds of parameters characterize it in the previous post.


Read more