Artificial intelligence researchers aim to advance natural language processing with the release of SuperGLUE. SuperGLUE builds on the earlier General Language Understanding Evaluation (GLUE) benchmark, but provides more difficult language understanding tasks and a new public leaderboard.

SuperGLUE was developed by AI researchers from Facebook AI, Google DeepMind, New York University and the University of Washington.

“In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced one year ago, offered a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently come close to the level of non-expert humans, suggesting limited headroom for further research,” the researchers wrote on the SuperGLUE website.
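As a rough illustration of what a "single-number metric" over several tasks can look like, here is a minimal Python sketch. It assumes the overall score is an unweighted average of per-task scores, which the quote above does not spell out; the function name, task names and score values are all hypothetical.

```python
from statistics import mean
from typing import Mapping


def benchmark_score(task_scores: Mapping[str, float]) -> float:
    """Collapse per-task scores into one number.

    Assumption: an unweighted mean across tasks. The article does not
    state the exact aggregation the benchmark uses.
    """
    return mean(task_scores.values())


# Hypothetical task names and scores, for illustration only.
scores = {"task_a": 80.0, "task_b": 70.0, "task_c": 90.0}
overall = benchmark_score(scores)
print(f"Overall score: {overall:.1f}")
```

A single number makes systems easy to rank on a leaderboard, at the cost of hiding which individual tasks a system handles well or poorly.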

According to Facebook AI, after RoBERTa, its method for pretraining self-supervised NLP systems, surpassed human baselines using simple multitask and transfer learning techniques, there was a need to keep advancing the state of the art. “Across the field, NLU systems have advanced at such a rapid pace that they’ve hit a ceiling on many existing benchmarks,” the researchers wrote in a post.

SuperGLUE comprises new ways to test creative approaches on a range of difficult NLP tasks, including sample-efficient, transfer, multitask and self-supervised learning. To challenge researchers, the team selected tasks that have varied formats and more “nuanced” questions, yet are easily solvable by people.

“By releasing new standards for measuring progress, introducing new methods for semi-supervised and self-supervised learning, and training over ever-larger scales of data, we hope to inspire the next generation of innovation. By challenging one another to go further, the NLP research community will continue to build stronger language processing systems,” the researchers wrote. 

The new benchmark also includes a new challenge that requires machines to provide complex answers to open-ended questions such as “How do jellyfish function without a brain?” The researchers explain that this will require AI to synthesize information from various sources.

Another task in the benchmark is Choice of Plausible Alternatives (COPA), a causal reasoning task in which a system is given a premise sentence and must determine either the cause or the effect of the premise from two possible choices.
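To make the COPA format concrete, the following Python sketch shows one way such an item could be represented and scored. The field names, the example sentences and the trivial baseline are assumptions made up for illustration; they are not taken from the benchmark's data or from the researchers' code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CopaExample:
    premise: str    # the sentence the system is given
    choice1: str    # first candidate cause or effect
    choice2: str    # second candidate cause or effect
    question: str   # "cause" or "effect": which relation to identify
    label: int      # index of the correct choice (0 or 1)


def accuracy(examples: List[CopaExample],
             predict: Callable[[CopaExample], int]) -> float:
    """Fraction of examples where the predicted choice matches the label."""
    correct = sum(1 for ex in examples if predict(ex) == ex.label)
    return correct / len(examples)


def always_first(ex: CopaExample) -> int:
    """Trivial baseline (hypothetical): always pick the first choice."""
    return 0


# Invented examples, for illustration only.
examples = [
    CopaExample("The ground was wet.", "It had rained.",
                "The sun was shining.", "cause", 0),
    CopaExample("She dropped the glass.", "The glass floated.",
                "The glass shattered.", "effect", 1),
]

print(accuracy(examples, always_first))
```

Because each item has exactly two choices, a system that guesses at random would be expected to score about 50 percent, so only accuracy well above that level suggests real causal reasoning.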

“These new tools will help us create stronger content understanding systems that can translate hundreds of languages and understand intricacies such as ambiguities, co-references and commonsense reasoning — with less reliance on the large amounts of labeled training data that’s required of most systems today,” Facebook wrote.