IBM recently announced the open-source library Text Extensions for Pandas, which features extensions that turn Pandas DataFrames into a universal data structure that can be used in natural language processing (NLP). 

According to the company, the goal of this project is to make NLP simple. In creating the library, it wanted to avoid creating algorithms that navigate data structures based on the outputs of NLP models, and instead use Pandas DataFrames to represent NLP data.

The library includes Pandas extension types for representing natural language data and library integrations that convert the outputs of NLP libraries into DataFrames. 

Text Extensions for Pandas provides three key benefits: transparency, simplicity, and compatibility, according to IBM. 

“This project aligns with IBM’s goal to continually develop and deliver new natural language processing innovations, both in the open source community and through products like Watson Discovery and Watson Natural Language Understanding,” Frederick Reiss, chief architect at IBM Center for Open-Source Data and AI technologies, and Willie M. Tejada, chief developer advocate at IBM, wrote in a post

In addition, Text Extensions for Pandas integrates with IBM Watson Natural Language Understanding and IBM Watson Discover.