Google is open-sourcing a “lite” version of its BERT natural language processing (NLP) pre-training technique. ALBERT is an updated version of BERT that improves performance on 12 NLP tasks, including the Stanford Question Answering Dataset (SQuAD v2.0) and the SAT-style reading comprehension RACE benchmark.
BERT was first open-sourced by Google at the end of 2018, and since then, natural language research has embraced a new paradigm of leveraging large volumes of existing text to pretrain model parameters, the company explained.
ALBERT was first introduced in a paper called “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations,” which was accepted at ICLR 2020. Google is releasing the project as an open-source implementation on top of TensorFlow.
The open-source version of ALBERT contains several ready-to-use pre-trained language representation models.
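A minimal sketch of loading one of these pre-trained models is shown below. It uses the Hugging Face Transformers port of ALBERT rather than Google’s official TensorFlow release, and the “albert-base-v2” checkpoint name refers to one of the publicly published base-size variants; the exact checkpoint names and loading API may differ depending on which release you use.

```python
# Minimal sketch: loading a ready-to-use pre-trained ALBERT checkpoint via the
# Hugging Face Transformers library (an unofficial port of the released models).
# Requires: pip install transformers torch sentencepiece
from transformers import AlbertTokenizer, AlbertModel

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertModel.from_pretrained("albert-base-v2")

# Tokenize a sentence and run it through the encoder to obtain contextual
# representations (one vector per token from the last hidden layer).
inputs = tokenizer("ALBERT is a lite BERT for self-supervised learning.",
                   return_tensors="pt")
outputs = model(**inputs)
sequence_output = outputs[0]  # shape: (batch_size, sequence_length, hidden_size)
print(sequence_output.shape)
```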
“The success of ALBERT demonstrates the importance of identifying the aspects of a model that give rise to powerful contextual representations. By focusing improvement efforts on these aspects of the model architecture, it is possible to greatly improve both the model efficiency and performance on a wide range of NLP tasks. To facilitate further advances in the field of NLP, we are open-sourcing ALBERT to the research community,” Radu Soricut and Zhenzhong Lan, research scientists at Google Research and authors of the paper on ALBERT, wrote in a blog post.