Artificial intelligence researchers aim to advance natural language processing with the release of SuperGLUE. SuperGLUE builds on the previous General Language Understanding Evaluation (GLUE) benchmark, but aims to provide more difficult language understanding tasks and a new public leaderboard.

SuperGLUE was developed by AI researchers from Facebook AI, Google DeepMind, New York University and the University of Washington.

β€œIn the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced one year ago, offered a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently come close to the level of non-expert humans, suggesting limited headroom for further research,” the researchers wrote on the SuperGLUE website.

According to Facebook AI’s research, after RoBERTa, its method for pretraining self-supervised NLP systems, surpassed human baselines using simple multitask and transfer learning techniques, there was a need to continue advancing the field. β€œAcross the field, NLU systems have advanced at such a rapid pace that they’ve hit a ceiling on many existing benchmarks,” the researchers wrote in a post.

SuperGLUE comprises new ways to test creative approaches on a range of difficult NLP tasks, including sample-efficient, transfer, multitask and self-supervised learning. To challenge researchers, the team selected tasks with varied formats and more β€œnuanced” questions that are nonetheless easily solvable by people.

β€œBy releasing new standards for measuring progress, introducing new methods for semi-supervised and self-supervised learning, and training over ever-larger scales of data, we hope to inspire the next generation of innovation. By challenging one another to go further, the NLP research community will continue to build stronger language processing systems,” the researchers wrote.

The new benchmark also includes a new challenge that requires machines to provide complex answers to open-ended questions such as β€œHow do jellyfish function without a brain?” The researchers explain this will require AI to synthesize information from various sources.

Another task is Choice of Plausible Alternatives (COPA), a causal reasoning task in which a system is given a premise sentence and must determine either the cause or the effect of the premise from two possible choices.
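The COPA setup can be sketched as a small data structure plus a choice between two scored candidates. The item below uses a well-known COPA example sentence; the `predict` helper and the length-based scorer are hypothetical illustrations, not part of the actual benchmark code:

```python
# A minimal sketch of the COPA task format (illustrative, not the real dataset API).
# Each item pairs a premise with two candidate sentences; the "question" field
# says whether the system must pick the more plausible cause or effect.
copa_item = {
    "premise": "The man broke his toe.",
    "question": "cause",  # pick the more plausible cause of the premise
    "choice1": "He got a hole in his sock.",
    "choice2": "He dropped a hammer on his foot.",
    "label": 1,           # index of the correct choice (0 or 1)
}

def predict(item, plausibility):
    """Return the index of the choice the scorer rates as more plausible.

    `plausibility` is any function mapping (premise, question, choice) to a
    number; a real system would use a trained language model here.
    """
    scores = [
        plausibility(item["premise"], item["question"], item["choice1"]),
        plausibility(item["premise"], item["question"], item["choice2"]),
    ]
    return scores.index(max(scores))

# Toy scorer for demonstration only: prefer the longer candidate sentence.
print(predict(copa_item, lambda p, q, c: len(c)))  # -> 1
```

A real submission would replace the toy scorer with model-derived plausibility scores and report accuracy over all items.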

β€œThese new tools will help us create stronger content understanding systems that can translate hundreds of languages and understand intricacies such as ambiguities, co-references and commonsense reasoning β€” with less reliance on the large amounts of labeled training data that’s required of most systems today,” Facebook wrote.