The artificial intelligence community is getting a new machine learning library to boost its research efforts. Yandex announced this week that it is open-sourcing CatBoost. Despite its name, CatBoost has nothing to do with cats. Instead, it has to do with gradient boosting.
“Gradient boosting is a machine learning algorithm that is widely applied to the kinds of problems businesses encounter every day like detecting fraud, predicting customer engagement and ranking recommended items like top web pages or most relevant ads. It delivers highly accurate results even in situations where there is relatively little data, unlike deep learning frameworks that need to learn from a massive amount of data,” Misha Bilenko, head of machine intelligence and research for Yandex, wrote in a post.
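To make the idea concrete, here is a minimal sketch of gradient boosting for regression in plain Python. It is a toy illustration under stated assumptions, not CatBoost's algorithm or API: the weak learner is a one-split decision stump, the loss is squared error, and the function names (`fit_stump`, `fit_boosted`, `predict_boosted`) and the eight-point data set are made up for this example. Each round fits a stump to the current residuals and adds a scaled-down copy of it to the ensemble.

```python
# Toy gradient boosting with squared-error loss and one-split stumps.
# Illustrative only: names and data are hypothetical, not CatBoost's API.

def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit stumps to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict_boosted(model, x)) ** 2
               for x, y in zip(xs, ys)) / len(ys)


# A deliberately tiny data set: eight points, one numeric feature.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0, 5.2, 4.8]

model = fit_boosted(xs, ys)
print("training MSE:", mse(model, xs, ys))
print("prediction at x=5:", predict_boosted(model, 5))
```

The learning rate shrinks each stump's contribution, so no single round dominates the ensemble; production libraries add many refinements on top of this basic loop.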
Features include the ability to reduce overfitting, support for categorical features, a user-friendly API for Python or R, and tools for formula analysis and training visualization. According to Bilenko, one of the key things about CatBoost is that it can deliver accurate results with relatively little training data, unlike deep learning frameworks, which need to learn from massive amounts of data.
The team hopes the library will be used for a wide variety of industrial machine learning tasks. The most common use cases will range from finance to scientific research. In addition, it can be integrated with deep learning tools such as Google’s TensorFlow.
“By making CatBoost available as an open-source library, we hope to enable data scientists and engineers to obtain top-accuracy models with no effort, and ultimately define a new standard of excellence in machine learning,” Bilenko wrote.
Top 5 projects trending on GitHub this week:
#1. Bash Snippets: A collection of small bash scripts.
#2. Awesome Guidelines: Check out this list for high-quality coding style conventions and standards.
#3. Pell: A tiny WYSIWYG text editor for the web. Up from last week’s number 5 slot!
#4. Deep Learning Project: An in-depth, end-to-end tutorial of a machine learning pipeline from scratch.
#5. Practical Node: A first edition of a book about building real-world scalable web apps.