XGBoost: A Scalable Tree Boosting System
Diego Marinho de Oliveira
Gen-AI Search, RecSys | ex-SEEK, AI Lead, Data Scientist Manager and ML Engineer Specialist
"Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems."
Authors: Tianqi Chen, Carlos Guestrin
Read the full paper at https://arxiv.org/abs/1603.02754
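To make the "sparsity-aware algorithm" from the abstract a bit more concrete, here is a minimal sketch of the idea as I understand it: when a feature value is missing, the tree learns a default direction by trying both options at every candidate threshold and keeping whichever gives the larger gain. This is not the XGBoost implementation. The function name `best_split_with_default`, the toy data, and the single-pass formulation are my own assumptions for illustration, and the complexity penalty on adding a leaf is left out.

```python
import math

def best_split_with_default(x, g, h, lam=1.0):
    """Find the best threshold and default direction for one feature.

    x: feature values, with float("nan") marking missing entries
    g, h: per-row first- and second-order gradient statistics
    lam: L2 regularization on leaf weights
    Returns (gain, threshold, default_direction); threshold is None
    if no split improves on the unsplit node.
    """
    def score(g_sum, h_sum):
        return g_sum * g_sum / (h_sum + lam)

    total_g, total_h = sum(g), sum(h)
    # Only rows where the value is present are sorted and scanned.
    present = sorted((i for i, v in enumerate(x) if not math.isnan(v)),
                     key=lambda i: x[i])
    present_g = sum(g[i] for i in present)
    present_h = sum(h[i] for i in present)

    best = (0.0, None, None)
    left_g = left_h = 0.0
    for a, b in zip(present, present[1:]):
        left_g += g[a]
        left_h += h[a]
        if x[a] == x[b]:
            continue  # no threshold fits between two equal values
        threshold = (x[a] + x[b]) / 2
        right_g, right_h = present_g - left_g, present_h - left_h
        # "right": missing rows go right, so the left child holds only present rows.
        # "left": missing rows go left, so the right child holds only present rows.
        options = {
            "right": (left_g, left_h),
            "left": (total_g - right_g, total_h - right_h),
        }
        for direction, (lg, lh) in options.items():
            gain = 0.5 * (score(lg, lh)
                          + score(total_g - lg, total_h - lh)
                          - score(total_g, total_h))
            if gain > best[0]:
                best = (gain, threshold, direction)
    return best

# Toy data (hypothetical): rows with a missing value behave like the large-x rows.
nan = float("nan")
x = [1.0, 2.0, nan, nan, 3.0, 4.0]
y = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
g = [-t for t in y]      # squared-error gradient at an initial prediction of 0
h = [1.0] * len(y)       # squared-error hessian
gain, threshold, direction = best_split_with_default(x, g, h)
```

On this toy data the sketch picks the threshold 2.5 and sends missing rows to the right, since that groups them with the rows they resemble. Only the non-missing entries are sorted and scanned, so the work grows with the number of present values rather than the total number of rows.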
Data Engineering Practice Manager
9 years ago: Informative
Chief Technology Officer
9 years ago: Thanks for sharing!
Data Professionals
9 years ago: https://github.com/dmlc/xgboost