The Python natural language processing library spaCy was recently updated to version 2.0. spaCy is an open-source project built on recent language processing research. Its author intends it to be used in real products and solutions.
Version 2.0 adds several new features, including new neural network models, support for more languages, and improved documentation. “The new version gets spaCy up to date with the latest deep learning technologies and makes it much easier to run spaCy in scalable cloud computing workflows,” wrote Matthew Honnibal, author of spaCy, in the release notes.
Thirteen new neural network models are included in this release, covering English, German, Spanish, Portuguese, French, Italian, and Dutch, plus a multi-language NER model. The release also adds alpha tokenization support for eight new languages. The models use a bloom embedding strategy to support large vocabularies in small tables. The core models predict part-of-speech tags, dependency labels, and named entities; small models include only context-specific token vectors, while medium and large models also include word vectors.
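To give a rough sense of how a bloom embedding can fit a large vocabulary into a small table, here is a minimal sketch. It is not spaCy's implementation: the class name, table size, vector width, and hash function are all assumptions chosen for illustration, and the rows are random rather than learned. The idea it shows is that each word is hashed several times, each hash selects one row of a fixed-size table, and the selected rows are summed, so any string gets a vector without needing its own row.

```python
import hashlib
import random


class BloomEmbedding:
    """Toy hashed embedding table (hypothetical, for illustration only)."""

    def __init__(self, n_rows=1000, width=8, n_hashes=4, seed=0):
        rng = random.Random(seed)
        self.n_rows = n_rows
        self.width = width
        self.n_hashes = n_hashes
        # Fixed-size table: its size does not grow with the vocabulary.
        # In a real model these rows would be learned during training.
        self.table = [
            [rng.uniform(-0.1, 0.1) for _ in range(width)]
            for _ in range(n_rows)
        ]

    def _rows(self, key):
        """Hash the key n_hashes times, each with a different salt."""
        rows = []
        for i in range(self.n_hashes):
            digest = hashlib.md5(f"{i}:{key}".encode("utf-8")).digest()
            rows.append(int.from_bytes(digest[:8], "big") % self.n_rows)
        return rows

    def vector(self, key):
        """Sum the selected rows to build the key's vector."""
        out = [0.0] * self.width
        for r in self._rows(key):
            row = self.table[r]
            for j in range(self.width):
                out[j] += row[j]
        return out


emb = BloomEmbedding()
vec = emb.vector("tokenization")  # works for any string, seen before or not
```

Two words may collide on one row, but they are unlikely to collide on all of their rows at once, which is why a small table can still tell many words apart.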
For this release, most of the usage guides, API docs, and code examples were rewritten. The documentation now includes information on custom processing pipelines, visualizers, training tutorials, word vectors, and rule-based matching. There is now a spaCy 101 guide which contains explanations and illustrations of important concepts as well as a summary of the library’s features.
Since the update to 2.0 a little over a week ago, version 2.0.3 has been released to fix some bugs and further update the documentation, adding a videos section and revising the training tips and advice section.
Top 5 trending projects on GitHub this week:
#1. Git flight rules: A guide for programmers using Git. It goes over what developers can do if things go wrong. Still trending from last week!
#2. State of the art result for machine learning problems: As the name states, SoTA results for all machine learning problems.
#3. Node best practices: A list of Node.js best practices.
#4. JS code to SVG flowchart: A visualization library for converting JS code into an SVG flowchart.
#5. TensorFlow: An open-source software library for machine intelligence.