English Wikipedia: CatBoost


CatBoost[1] is an open-source software library developed by Yandex. It provides a gradient boosting framework which, among other features, handles categorical features using a permutation-driven alternative to the classical algorithm.[2] It runs on Linux, Windows, and macOS, and is available in Python[3] and R.[4] Models built with CatBoost can be used for prediction in C++, Java,[5] C#, Rust, Core ML, ONNX, and PMML. The source code is licensed under the Apache License and is available on GitHub.[1]
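The permutation-driven treatment of categorical features mentioned above can be illustrated with a small sketch of an ordered target statistic: each row's category is encoded using only the targets of rows that precede it in a random permutation, so the row's own label does not leak into its encoding. This is a simplified illustration, not CatBoost's actual implementation; the function name `ordered_target_statistics` and the parameters `prior`, `prior_weight`, and `seed` are hypothetical names chosen for this example.

```python
import random
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior=0.5, prior_weight=1.0, seed=0):
    """Encode a categorical column from the 'history' of a random permutation.

    Hypothetical helper for illustration only; not part of the CatBoost API.
    """
    n = len(categories)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random permutation of the row indices

    sums = defaultdict(float)  # running sum of targets seen so far, per category
    counts = defaultdict(int)  # running number of rows seen so far, per category
    encoded = [0.0] * n

    for idx in order:
        cat = categories[idx]
        # Only rows earlier in the permutation contribute, smoothed by the prior.
        encoded[idx] = (sums[cat] + prior_weight * prior) / (counts[cat] + prior_weight)
        sums[cat] += targets[idx]
        counts[cat] += 1
    return encoded


colors = ["red", "blue", "red", "red", "blue"]
clicked = [1, 0, 1, 0, 1]
print(ordered_target_statistics(colors, clicked))
```

The first row of each category in the permutation receives the prior alone, since no earlier rows of that category exist yet.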

In 2017, InfoWorld magazine named the library one of "The best machine learning tools",[6] along with TensorFlow, PyTorch, XGBoost, and 8 other libraries.

Kaggle listed CatBoost as one of the most frequently used machine learning (ML) frameworks in the world. It ranked among the eight most frequently used ML frameworks in the 2020 survey[7] and among the seven most frequently used in the 2021 survey.[8]

As of April 2022, CatBoost was installed about 100,000 times per day from the PyPI repository.[9]

Features

CatBoost has gained popularity compared to other gradient boosting algorithms primarily due to the following features:[10]

  • Native handling of categorical features[11]
  • Fast GPU training[12]
  • Visualizations and tools for model and feature analysis
  • Oblivious (symmetric) trees for faster execution
  • Ordered boosting to reduce overfitting[2]
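To illustrate why oblivious (symmetric) trees allow fast execution, the following minimal sketch evaluates one such tree. Every node at a given depth shares the same split, so a prediction reduces to computing one bit per level and using the resulting number as an index into a table of leaf values. The function name `oblivious_tree_predict`, the `>` comparison, and the bit ordering are assumptions made for this example; CatBoost's internal layout may differ.

```python
def oblivious_tree_predict(row, splits, leaf_values):
    """Evaluate one oblivious tree on a single row of numeric features.

    splits      -- list of (feature_index, threshold), one entry per tree level
    leaf_values -- list of 2 ** len(splits) leaf predictions
    Hypothetical helper for illustration only; not part of the CatBoost API.
    """
    leaf = 0
    for feature_index, threshold in splits:
        # The same test is applied to every node of this level.
        bit = 1 if row[feature_index] > threshold else 0
        leaf = (leaf << 1) | bit  # append the bit to the leaf index
    return leaf_values[leaf]


# A depth-2 tree: level 0 tests feature 0, level 1 tests feature 1.
splits = [(0, 2.5), (1, 10.0)]
leaf_values = [-1.0, -0.5, 0.5, 1.0]

print(oblivious_tree_predict([3.0, 4.0], splits, leaf_values))  # prints 0.5
```

Because the loop contains no data-dependent branching between nodes, the same computation can be applied to many rows at once, which is one reason this tree shape suits vectorized and GPU execution.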

History

In 2009, Andrey Gulin developed MatrixNet, a proprietary gradient boosting library that was used at Yandex to rank search results. Since 2009, MatrixNet has been used in various Yandex projects, including recommendation systems and weather prediction.

In 2014-2015, Andrey Gulin and a team of researchers started a new project called Tensornet, aimed at solving the problem of "how to work with categorical data". It resulted in several proprietary gradient boosting libraries with different approaches to handling categorical data.

In 2016, the Machine Learning Infrastructure team led by Anna Dorogush started working on gradient boosting at Yandex, including MatrixNet and Tensornet. They implemented and open-sourced the next version of the gradient boosting library, called CatBoost, which supports categorical and text data, GPU training, model analysis, and visualization tools.

CatBoost was open-sourced in July 2017 and is under active development at Yandex and in the open-source community.

Application

See also


References


External links