After more than three years of development, the Apache PDFBox team has announced the release of Apache PDFBox 2.0.0.

The Apache PDFBox library is an open-source Java tool for working with PDF documents. It lets users create and manipulate PDF documents and extract content from them. Apache PDFBox also includes several command-line utilities, and it is published under the Apache License v2.0.

In February 2015, the project became the first Open Source Partner Organization in the PDF Association.

“PDF is a very popular and easy-to-use format for document exchange,” said Andreas Lehmkühler, vice president of Apache PDFBox. “It is used by millions of people every day; however the format itself is quite complicated and a real challenge to write a piece of software to work with it. This new major release of PDFBox includes a lot of improvements, fixes, and new features which should make the life easier for our users.”

With this latest release, the Apache PDFBox library enables users to create new PDF documents, manipulate existing documents, and extract content from them. It also enables users to digitally sign and print files, and to validate them against the PDF/A-1b standard. Its command-line utilities include encrypt, decrypt, overlay, debugger, merger, PDFToImage and TextToPDF.

On the maintenance side, PDFBox 2.0 resolves 1,167 issues, 418 of which were back-ported to v1.8, and brings dozens of improvements and enhancements, according to the release announcement. Highlights include:

  • improved rendering and text extraction
  • Unicode support for PDF creation
  • overhauled interactive forms support
  • extended signing and encryption support
  • overhauled parser, including a self-healing mechanism for malformed or corrupted PDFs
  • reduced memory/resources footprint, including fine-grained control of memory usage
  • enhanced preflight module for PDF/A-1b conformance checking
  • rearranged package structure to allow smaller runtime environments