Apache OpenNLP, a machine learning toolkit for processing natural language text, has reached version 1.9.0. The toolkit provides support for common NLP tasks including tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, and parsing.
Changes in this version include support for name type filters in the Brat Document Parser, a fix for Brat format support failing on multi-fragment annotations, and the removal of MD5 hashes from the release process. A complete list of changes is available in the release notes.
The team behind Bloomsbury AI joins Facebook
Facebook has announced that the team behind Bloomsbury AI is joining Facebook’s London office. According to the company, Bloomsbury has expertise in machine reading and understanding unstructured documents in natural language in order to answer questions.
Facebook believes that Bloomsbury’s expertise will strengthen its ability to advance natural language processing, allowing the company to better understand natural language and its applications.
Google details new text features in Android P Beta
Google has revealed that Android P will include new text features. A new API called PrecomputedText addresses issues related to displaying text. It enables apps to perform the most time-consuming parts of text layout beforehand, caching the layout result and returning valuable measurement data.
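The general idea of doing layout work ahead of time can be sketched in plain Java. This is only an illustration of the pattern, not the Android API: the `PrecomputedLayout` and `PrecomputeDemo` classes, the `precomputeAsync` method, and the character-based "measurement" are all hypothetical stand-ins, and a real app would measure pixels with the platform's own text classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a precomputed layout result.
// It is NOT the Android PrecomputedText class.
final class PrecomputedLayout {
    final String text;
    final List<String> lines; // cached line breaks
    final int widestLine;     // cached measurement, in characters (assumption: no real font metrics)

    private PrecomputedLayout(String text, List<String> lines, int widestLine) {
        this.text = text;
        this.lines = lines;
        this.widestLine = widestLine;
    }

    // The "expensive" step: greedy word wrapping at maxChars characters per line.
    static PrecomputedLayout create(String text, int maxChars) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int widest = 0;
        for (String word : text.trim().split("\\s+")) {
            // Start a new line when the next word would not fit.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                lines.add(current.toString());
                widest = Math.max(widest, current.length());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
            widest = Math.max(widest, current.length());
        }
        return new PrecomputedLayout(text, lines, widest);
    }
}

final class PrecomputeDemo {
    // Runs the layout work on a background executor and hands back the finished result,
    // so the thread that displays the text only has to read cached values.
    static CompletableFuture<PrecomputedLayout> precomputeAsync(
            String text, int maxChars, ExecutorService background) {
        return CompletableFuture.supplyAsync(
                () -> PrecomputedLayout.create(text, maxChars), background);
    }

    public static void main(String[] args) {
        ExecutorService background = Executors.newSingleThreadExecutor();
        try {
            PrecomputedLayout layout =
                    precomputeAsync("the quick brown fox jumps", 10, background).join();
            System.out.println(layout.lines + " widest=" + layout.widestLine);
        } finally {
            background.shutdown();
        }
    }
}
```

In this sketch the caller waits with `join()` for brevity. In a UI setting the result would more likely be delivered through a callback so the display thread never blocks.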
Android P will also introduce a text Magnifier, which helps users precisely position the text selection handles by showing magnified text in a pane that can be dragged over the text. In addition, the company has added Smart Linkify, which uses machine learning algorithms to recognize entities in text, improving the reliability of the entities recognized.
IBM releases a large annotation dataset for facial analysis
IBM is releasing a large annotation dataset to be used for studying bias in facial analysis. The dataset contains more than one million images that are annotated with attributes, utilizing geo-tags from Flickr images to balance data from multiple countries. According to IBM, the largest facial attribute dataset currently available has 200,000 images.
IBM will also release an annotation dataset of 36,000 images that are equally distributed across skin tones, genders, and ages to provide a more diverse dataset.