For humans, writing posts on social media comes naturally: we understand each word that’s said or typed. For machines, it’s not that easy. Understanding the meaning of words is one of the biggest challenges that artificial intelligence researchers face today, and this week’s GitHub project, fastText, aims to tackle it.
Automatic text processing is part of everyday interactions with computers, and those interactions generate plenty of online data. Understanding the content of such large datasets requires specialized tools. To address this challenge, the Facebook AI Research (FAIR) lab is open-sourcing its fastText library to help developers build scalable solutions for text representation and classification.
According to FAIR, “FastText combines some of the most successful concepts introduced by the natural language processing and machine learning communities in the last few decades. These include representing sentences with bags of words and bags of n-grams, as well as using subword information, and sharing information across classes through a hidden representation.”
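To make “bags of words and bags of n-grams” concrete, here is a minimal Python sketch of that kind of feature extraction. It is an illustration under stated assumptions, not fastText’s own code: the whitespace tokenizer, the helper names bag_of_ngrams and hashed_features, and the bucket count are all hypothetical. The n-grams are hashed into a fixed number of buckets (the hashing trick), a common way to keep memory bounded as the number of distinct n-grams grows.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 2_000_000  # hypothetical number of hash buckets for n-gram features


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bag_of_ngrams(text: str, max_n: int = 2) -> Counter:
    """Count word n-grams of length 1..max_n (a bag of words plus a bag of n-grams)."""
    tokens = tokenize(text)
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def hashed_features(feats: Counter, buckets: int = NUM_BUCKETS) -> Counter:
    """Map each n-gram to a bucket id so the feature table has a fixed size."""
    hashed = Counter()
    for gram, count in feats.items():
        hashed[zlib.crc32(gram.encode("utf-8")) % buckets] += count
    return hashed


if __name__ == "__main__":
    feats = bag_of_ngrams("the movie was not good")
    print(sorted(feats))
    print(hashed_features(feats))
```

For the sentence “the movie was not good”, the features include the bigram “not good”. A pure bag of words would only see “not” and “good” separately and lose the negation, which is why word order is partially recovered through n-grams.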
FastText also uses a hierarchical softmax that takes advantage of the unbalanced distribution of classes to speed up computation. Together, these concepts serve two different tasks: efficient text classification and learning word-vector representations.
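One way a hierarchical softmax can exploit unbalanced classes is to arrange the classes in a Huffman tree built from their frequencies, so that common classes sit close to the root and need fewer binary decisions. The sketch below shows only that tree-building step under these assumptions; the function name huffman_codes and the label counts are hypothetical, and this is not fastText’s implementation.

```python
import heapq
import itertools


def huffman_codes(class_counts: dict[str, int]) -> dict[str, str]:
    """Assign each class a binary path ('0' = left, '1' = right) in a Huffman tree.

    Frequent classes end up close to the root, so they need fewer binary decisions.
    """
    tie = itertools.count()  # tie-breaker so heapq never has to compare dicts
    heap = [(count, next(tie), {label: ""}) for label, count in class_counts.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        # With a single class there is nothing to decide; give it a one-step path.
        return {label: "0" for label in class_counts}
    while len(heap) > 1:
        # Merge the two least frequent subtrees under a new internal node.
        count_a, _, left = heapq.heappop(heap)
        count_b, _, right = heapq.heappop(heap)
        merged = {label: "0" + code for label, code in left.items()}
        merged.update({label: "1" + code for label, code in right.items()})
        heapq.heappush(heap, (count_a + count_b, next(tie), merged))
    return heap[0][2]


if __name__ == "__main__":
    # Hypothetical, deliberately unbalanced label counts.
    counts = {"sports": 500, "politics": 300, "tech": 120, "science": 50, "art": 30}
    codes = huffman_codes(counts)
    for label in counts:
        print(f"{label:>8}: path length {len(codes[label])}")
    avg = sum(counts[label] * len(codes[label]) for label in counts) / sum(counts.values())
    print(f"average decisions per example: {avg:.2f}")
```

With these counts, “sports” is reached in one decision while the rare “art” and “science” need four, giving an average of 1.78 decisions per example. A balanced tree over five classes would need two or three decisions for every class.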
According to the research team, fastText addresses a drawback of deep neural networks: they can be slow to train and test. Instead of a flat classifier, fastText uses a hierarchical one in which categories are organized in a tree, which reduces the time needed to train and test text classifiers.
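To see why a tree is cheaper than a flat structure, here is a minimal sketch of scoring one class with a binary tree of logistic decisions. A flat softmax must compute a score for every one of K classes, while the tree only evaluates the nodes on one root-to-leaf path, roughly log2(K) of them. Assumptions: the tree is balanced by splitting class ids in half (fastText builds its tree differently), the weights are random rather than trained, and the names TreeClassifier and class_prob are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class TreeClassifier:
    """Binary tree over classes 0..K-1; each internal node holds one weight vector."""

    def __init__(self, num_classes: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_classes = num_classes
        self.weights: dict[tuple[int, int], list[float]] = {}
        self._build(0, num_classes, dim, rng)

    def _build(self, lo: int, hi: int, dim: int, rng: random.Random) -> None:
        if hi - lo <= 1:
            return  # leaf: a single class, no parameters
        # Internal node covering classes [lo, hi); untrained random weights.
        self.weights[(lo, hi)] = [rng.gauss(0.0, 0.1) for _ in range(dim)]
        mid = (lo + hi) // 2
        self._build(lo, mid, dim, rng)
        self._build(mid, hi, dim, rng)

    def class_prob(self, hidden: list[float], label: int) -> tuple[float, int]:
        """Return P(label | hidden) and the number of binary decisions evaluated."""
        lo, hi, prob, steps = 0, self.num_classes, 1.0, 0
        while hi - lo > 1:
            mid = (lo + hi) // 2
            p_left = sigmoid(dot(self.weights[(lo, hi)], hidden))
            steps += 1
            if label < mid:
                prob *= p_left  # go left
                hi = mid
            else:
                prob *= 1.0 - p_left  # go right
                lo = mid
        return prob, steps


if __name__ == "__main__":
    clf = TreeClassifier(num_classes=1000, dim=16)
    rng = random.Random(1)
    h = [rng.uniform(-1.0, 1.0) for _ in range(16)]
    total = sum(clf.class_prob(h, k)[0] for k in range(1000))
    _, steps = clf.class_prob(h, 42)
    print(f"sum over all classes: {total:.6f}")
    print(f"binary decisions for one class: {steps}")
```

Because each internal node splits its probability mass between its two children, the leaf probabilities still sum to 1, so the tree defines a proper distribution over classes. Scoring one class out of 1,000 takes at most ceil(log2 1000) = 10 node evaluations instead of 1,000.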
With fastText, the research team was able to cut training times from days to seconds and achieve “state-of-the-art performance on many standard problems, such as sentiment analysis or tag prediction,” said FAIR.
The tool has been designed to work with languages such as English, German, French, Spanish and Czech. FastText takes advantage of these languages’ morphological structure through a simple method for including subword information.
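The subword idea can be sketched in a few lines. Each word is wrapped in the boundary markers “<” and “>” and broken into character n-grams, and the word’s vector is built from the vectors of those n-grams, so related word forms share parameters. Assumptions in this sketch: n runs from 3 to 6, the n-gram vectors are deterministic pseudo-random stand-ins for learned embeddings, they are combined by averaging, and the names char_ngrams, ngram_vector and word_vector are hypothetical.

```python
import random
import zlib

DIM = 8            # embedding size (assumed, tiny for illustration)
BUCKETS = 100_000  # size of the hashed n-gram table (assumed)


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Character n-grams of '<word>' for n in [minn, maxn], plus the whole wrapped word."""
    # '<' and '>' mark word boundaries, so prefixes and suffixes differ from inner n-grams.
    wrapped = f"<{word}>"
    grams = [
        wrapped[i:i + n]
        for n in range(minn, maxn + 1)
        for i in range(len(wrapped) - n + 1)
    ]
    if wrapped not in grams:
        grams.append(wrapped)  # the full word is kept as its own feature
    return grams


def ngram_vector(gram: str) -> list[float]:
    # Stand-in for a learned embedding row: a deterministic pseudo-random vector per hash bucket.
    rng = random.Random(zlib.crc32(gram.encode("utf-8")) % BUCKETS)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def word_vector(word: str) -> list[float]:
    """Represent a word as the average of its subword vectors."""
    grams = char_ngrams(word)
    total = [0.0] * DIM
    for gram in grams:
        for i, value in enumerate(ngram_vector(gram)):
            total[i] += value
    return [value / len(grams) for value in total]


if __name__ == "__main__":
    print(char_ngrams("where", 3, 3))
    shared = set(char_ngrams("where")) & set(char_ngrams("wherever"))
    print(sorted(shared))
    print(word_vector("wherever"))
```

With only 3-grams, char_ngrams("where", 3, 3) returns ['<wh', 'whe', 'her', 'ere', 're>', '<where>']. Because “where” and “wherever” share ten n-grams under the 3 to 6 setting, a vector for a rare or unseen inflected form can be assembled from pieces learned on more common forms, which is what makes subword information useful for morphologically rich languages.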
The team hopes that other developers will use this open-source tool to build better, more scalable solutions for text representation and classification. The Facebook AI Research lab has also made its fastText research available through the Cornell University Library, and fastText is available on GitHub today.
Top 5 projects trending on GitHub this week
#1. webpack-dashboard: A CLI dashboard for webpack dev server.
#2. Weight-loss: Using machine learning to lose weight. Your actual weight, that is.
#3. Recharts: A redefined chart library built with React and D3.
#4. Top Deep Learning: A list of GitHub projects related to deep learning.
#5. React-server: A React framework that loads pages really fast.