FastTreeSHAP is a Python package that enables the efficient interpretation of tree-based machine learning models by computing sample-level feature importance values.
The project was recently open-sourced by LinkedIn and was used at the company to improve member experience in products such as People You May Know (PYMK), newsfeed ranking, search, and job recommendations, as well as customer-facing products within sales and marketing.
The FastTreeSHAP open-source package implements two algorithms, FastTreeSHAP v1 and FastTreeSHAP v2, which run 1.5x and 2.5x faster than TreeSHAP, respectively.
The package fully supports parallel multi-core computing and exposes the same API as the TreeSHAP implementation in the SHAP package, with the exception of three additional arguments that are easy to tune in practice.
“SHAP calculates the average impact of adding a feature to the model by accounting for all possible subsets of the other features. In contrast to other approaches, SHAP has been justified as the only consistent feature attribution approach with several unique properties (local accuracy, missingness, and consistency), which agree with human intuition,” Jilei Yang, Humberto Gonzalez, and Parvez Ahammad of LinkedIn wrote in a blog post. “Due to its solid theoretical guarantees, SHAP has become a top model interpretation approach in industry.”
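To make the "all possible subsets" idea concrete, here is a minimal sketch of the textbook brute-force Shapley computation. It is not the TreeSHAP or FastTreeSHAP algorithm, and the names `shapley_values` and `toy_value` are hypothetical, introduced only for illustration. The toy value function stands in for a model's output given a subset of known features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every subset of the other features.

    value_fn takes a frozenset of feature indices and returns a number.
    The cost is exponential in n_features, so this is for illustration only.
    """
    phi = [0.0] * n_features
    n_fact = factorial(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for a subset S of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n_features - size - 1) / n_fact
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal impact of adding feature i to subset S
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_value(subset):
    """Hypothetical model output: two main effects plus one interaction.

    Feature 2 never changes the output, so it should receive zero credit.
    """
    total = 0.0
    if 0 in subset:
        total += 1.0
    if 1 in subset:
        total += 2.0
    if 0 in subset and 1 in subset:
        total += 4.0  # interaction between features 0 and 1
    return total


phi = shapley_values(toy_value, 3)
full = toy_value(frozenset({0, 1, 2}))
empty = toy_value(frozenset())
print(phi, sum(phi), full - empty)
```

In this toy case the interaction term is split evenly between features 0 and 1, the attributions sum to the difference between the full and empty outputs (local accuracy), and the unused feature gets zero (missingness). The exponential number of subsets is why tree-specific algorithms such as TreeSHAP exist in the first place.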
After looking into many TreeSHAP use cases, LinkedIn found that, despite TreeSHAP's improved algorithmic complexity, computing SHAP values for a large sample size or a large model size remained a computational concern in practice. This motivated the key improvements in the FastTreeSHAP versions.