We are living in a digital age, and businesses no longer want to keep their information and data in hard copy. Businesses across a broad range of industries are turning to the Portable Document Format (PDF) to bring their printed documents online.
“PDFs are really becoming the de facto electronic document standard,” said Gerald Holmann, founder and president of Qoppa Software. “They are being used to replace printed documents and move businesses to a fully electronic office, but also to enable the ability to interchange documents between offices, companies and users in a format that everyone can rely on.”
Adobe designed PDF as a proprietary format in the early 1990s to support any type of content that could be printed, and as a result a wide range of industries now want to use it. Because of this, the format has to be extensive enough to support just about any use case, according to Holmann. This poses a problem for developers, who have to consider all the individual elements a PDF can be composed of, such as images, colors, objects, fonts, encryption and digital signatures, when trying to implement a PDF document solution in their applications.
“In order to work with PDFs, you need such a broad understanding of the PDF specification,” said Matt Kuznicki, CTO of Datalogics. “Creating is easy, but understanding and taking in PDFs is very hard.”
Looking forward to PDF 2.0
Adobe officially released the PDF format to the broader community as an open standard in 2008, when it was published by the International Organization for Standardization (ISO) as ISO 32000-1. The specification consists of more than 1,000 pages that specify everything that can go into a PDF document.
“There is a whole lot of information in there. It is very dense, rather technical, and in some ways represents somewhat of an obstacle because it is very hard to find a toolkit that really understands and conforms to the entirety of the standard,” said Datalogics’ Kuznicki.
The standards body is currently working on the PDF 2.0 standard, aiming to acknowledge that not all processors, toolkits and viewers are necessarily going to understand or handle all parts of PDFs. The upcoming standard is also meant to clear up some of the misinformation that dogged it in the past.
“It is not a dramatic change; a lot of our effort has gone into clarifying the standard and helping implementers understand how to implement the various features in it, and tighten up where it wasn’t really clear,” said Kuznicki. “It may not sound like a lot, but when you are dealing with such a comprehensive standard, it turns out to be a very daunting undertaking.”
Because PDF has evolved over more than two decades while trying to maintain backward compatibility, some features have been in the specification from the beginning but don’t necessarily need to be included anymore, according to Richard Little, president of LEAD Technologies.
“Currently certain PDF specifications such as [XML Forms Architecture] are still proprietary to Adobe, but referenced by the ISO standard,” he said. “With PDF 2.0, the ISO committee hopes to cut the bloat and standardize the proprietary extensions. The result should be more complete developer products as the work of maintaining deprecated features can be reallocated to the new features in the ISO standard.”
In addition, the upcoming standard will have some new features that were added to the 1.7 standard. Notable features include an increased level of encryption to produce more secure documents, and a new redaction framework that will define redaction annotations and apply them to a document to remove redacted content, according to Qoppa’s Holmann.
The hope is that the PDF 2.0 standard will get rid of some of the PDF fragmentation in the future. “The standard isn’t going to make the PDFs that currently exist into anything different than what they are, but I think going forward it will serve as more of a useful guidance in terms of building and showing the consensus around how to implement PDF,” said Datalogics’ Kuznicki. “So, in that respect I think that it will bring some greater clarity in kind of tightening up some of the fragmentation in things going forward.”
The standards body is heading into the final stages of the upcoming standard, and it should be available late next year or in early 2017, according to Kuznicki.
“With a clearer standard, it helps people who are creating PDFs to make sure they are adhering to the standard, and then also on the PDF consumer side where you are reading PDFs, modifying them and processing them,” said Holmann. “It’s about having a clear, informative specification that helps everybody.”
Building solutions across all devices
When building a PDF solution, developers have to deal not only with the specification, but also with the devices the solution is going to be used on. With today’s workers having multiple devices they can work from, PDF solutions come in all shapes: mobile, desktop and browser. The problem is that each device poses different limitations and use cases. With the introduction of HTML5 with SVG, it is possible for developers to convert most PDFs to HTML5/SVG on the server and deliver them to all devices, but this comes with limited functionality and other caveats.
If developers want to ensure their solutions have the right functionality and user experience for each device, it is not possible to build one solution that can span across all devices, according to Holmann. “The technologies in the different devices are too far apart to be able to use the same product on all platforms,” he said.
Developers have to consider different languages, footprints and capabilities for each device. “Things like different screen sizes and different user expectations based on the device and the way they are interacting with the PDF can be quite a challenge,” said Kuznicki.
PDF components running inside a browser are typically used more for viewing, according to Holmann. “If you do modifications to the document, there might be more along the lines of adding digital signatures, which are usually handled on the server so the browser is really acting as a window into the true PDF functionality,” he said.
On the desktop, users are usually doing more editing, Holmann explained. “You are really taking the documents, editing them, manipulating them, and saving them locally,” he said. “You really have no restrictions, and the PDF functions are really implemented on the machine instead of server.”
With mobile, developers face more technical challenges such as form factor and memory issues, making it hard to read a full-page PDF document on a phone with a small screen.
In order to really develop for each device, developers and businesses have to decide upfront what they are looking for the PDF to do in terms of functions, whether they just want a user to add a signature or have a full-featured PDF editor, according to Holmann.
“Define the functional requirements first, be aware that the limitations running inside a browser is going to be a lot more limited than running on a desktop app or even mobile app, then decide based on the functions, the limitations for each context, which way you want to go with the application,” he said.
In addition, developers should look for vendors that provide companion products for the different platforms.
Evolving the format with the technology
As technology gets smarter, PDFs are turning into something that not only humans interpret and understand, but that machines read as well.
“With Big Data becoming more and more important, how do we unlock the information that is in PDFs for a world where it is not just humans reading documents anymore?” said Datalogics’ Kuznicki. “It is also computers and programs that are reading documents and using these to gather information to perform actions and opinions based on them.”
Another area that the ISO is looking to clarify in PDF 2.0 is the ability to put information in PDFs about what the various things on a page actually mean. “For example, if you have a table on a page of a PDF, then you and I as humans see and understand it is a table, and it has headings and information,” said Kuznicki. “But it is very hard for a machine to see and recognize that.”
A way around this is putting information into the PDF that ties these things together and explains that there is a table with headings and rows, as well as the information it contains.
“If a machine can’t understand a table as a collective set of data that has various meanings that are defined by, say, the headers in the table for example, then that machine and that process really has no way to understand that information or do anything useful with it,” said Kuznicki.
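The idea of tying a table's cells to its headings can be illustrated with a small sketch. It does not read a real PDF file: the `StructElem` class and the `table_to_records` and `make_row` helpers are hypothetical names invented for this illustration, and the tree is a heavily simplified stand-in for a document's structure information. Only the tag names (Table, TR, TH, TD) are borrowed from the standard structure types used in tagged PDF. Under those assumptions, it shows how a program could turn a tagged table into records keyed by heading:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StructElem:
    """One node of a simplified, hypothetical structure tree."""
    tag: str                                  # e.g. "Table", "TR", "TH", "TD"
    text: str = ""                            # text content of a cell, if any
    kids: List["StructElem"] = field(default_factory=list)


def table_to_records(table: StructElem) -> List[Dict[str, str]]:
    """Map each data row to a dict keyed by the headings in the first row."""
    if table.tag != "Table":
        raise ValueError("expected a Table element")
    rows = [kid for kid in table.kids if kid.tag == "TR"]
    if not rows:
        return []
    # Assumption: the first row holds the heading cells (TH).
    headers = [cell.text for cell in rows[0].kids if cell.tag == "TH"]
    records = []
    for data_row in rows[1:]:
        cells = [cell.text for cell in data_row.kids if cell.tag == "TD"]
        records.append(dict(zip(headers, cells)))
    return records


def make_row(cell_tag: str, *texts: str) -> StructElem:
    """Build a TR element whose cells all share one tag (TH or TD)."""
    return StructElem("TR", kids=[StructElem(cell_tag, t) for t in texts])


# A made-up two-column table, as a tagging tool might describe it.
invoice = StructElem("Table", kids=[
    make_row("TH", "Item", "Quantity"),
    make_row("TD", "Paper", "500"),
    make_row("TD", "Toner", "2"),
])

records = table_to_records(invoice)
for record in records:
    print(record)
```

Without the tags, the same page would reach a program as loose fragments of text with positions and nothing to say that "500" is the quantity for "Paper". A real toolkit would have to read this structure out of the document itself, which is considerably more involved than this sketch.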
If a document doesn’t have this information added from the beginning when it is created, it can be a daunting task to add afterwards. In order to add this type of functionality to existing documents, developers need to turn to tools that automate the process and turn the human element in understanding what’s in a document into a programmatic form, according to Kuznicki.
Technology is also advancing PDFs with new and updated features such as AcroJS and Flash integration. “This provides content writers the possibility to create rich, dynamic, application-like interfaces inside a PDF document,” said LEAD Technologies’ Little. Moving forward, developers are going to have to be aware of more innovative PDFs.
Another problem developers have to deal with as technology evolves is having to create a solution that is still going to be useful 50 years from now. “Fifty years from now, are you going to be able to pull the document up and is it going to look the same?” said Holmann.
“With printed paper, you kind of have the guarantee of a physical object. But when it comes to displaying something electronic, you have to put some constraints in place to guarantee that.”
To address this, developers need to support past and current standards, but also prepare for future standards. “Developers need to be able to follow and possibly implement new extensions even before ISO adopts them in the standard,” according to Little. “The PDF format is a required feature of almost all applications that deal with documents. There are thousands of products and projects to add PDF to applications. Finding the component for current and future needs can require an extensive amount of due diligence.”
Developers need to use trusted sources such as AIIM.org and the PDF Association to keep up with the ever-changing standard, and ensure the toolkit they choose is keeping up with the ISO committee and updating as the specification changes, according to Little.
What should developers consider when it comes to picking out a PDF component solution?
Unlike in other areas of development, where tools are not the answer, picking the right toolkit matters in the PDF landscape because the format is so extensive. With all the different features and use cases a PDF document can involve, choosing the right PDF tool to fit into their application can be the biggest challenge for developers. Tools help to provide the PDF expertise, but the competing toolkits on the market differ in their functions and in the quality of their implementation and support.
SD Times talked to some experts in the field to get an idea of what developers should consider when picking a PDF component solution:
Matt Kuznicki, chief technical officer at Datalogics
It is really about picking the right tool for the job, and understanding that the job can often times be larger than it appears at the outset, especially when your application needs to work with PDFs. Being able to take in PDFs from all sorts of sources really gives your application a competitive advantage over others, but it is something that can be a very challenging thing to do if you are not using the right toolkit.
Developers should look at the longevity of a toolkit: how long has it been developed, how actively is it developed, how proven is it out in the market, and what kind of support are they going to get with it.
When something goes wrong, they need to understand what went wrong. Often times it takes experts and a lot of detective work, and that’s not something that comes quickly or naturally. We find that toolkits benefit when there are a number of experts in the field who have put decades into understanding and working with PDFs.
I would look for comprehensiveness and how much of the standard a toolkit handles, what is the feedback on it, what is the community saying, and who has used the toolkit to drive actual solutions and solve actual problems.
Richard Little, president of LEAD Technologies
Look for a component and company that has been around for more than just a few years, actively develops the product, has a large user base, and offers great technical support. Having all of that, you can be sure that the component includes a rich feature set and is robust enough to pass some of the toughest QA departments out there. Young companies and free solutions have not had the time, experience or resources it takes to meet those qualifications.
Gerald Holmann, founder and president of Qoppa Software
There are a few things that are key: You have to start with how much of the PDF format is supported by the product. The PDF format is extensive and it uses thousands of features, so you want to be able to make sure that the component you look at will support a wide range of these features. I don’t think there is any company that can support all PDF features, but you want to look for wide feature cover of a PDF document.
You want to make sure the component supports all the different kinds of fonts, all the different kinds of images, all of the different types of encryption and whatnot.
After that, you want to look for stability, make sure that there is no issue with memory or crashing, and then after that you want to look at performance and make sure that it performs well with processing the documents. Processing a PDF document can be very CPU-intensive depending on the types of documents, so depending on the use case, this can be pretty critical when you deploy.
The PDF format is complex, so responsive and expert technical support is an important factor when choosing products that comes into play both at initial implementation of PDF features and also when resolving issues once an application is in production.
Datalogics: Datalogics provides best-of-breed PDF and e-book technologies for developers. The Adobe PDF Library SDK is a general purpose multi-platform API for manipulating PDF content. Datalogics’ PDF Java Toolkit is a pure Java API with robust support for PDF forms and digital signatures. PDF Alchemist SDK delivers intelligent extraction of text flows from PDFs into reflowable HTML and EPUB. Active Textbook is an end-to-end PDF and EPUB e-reading technology for the education market.
LEADTOOLS: LEADTOOLS Document Imaging toolkits include a full suite of PDF SDK technology for viewing, editing, creating and converting PDF and Office formats. The Document Viewer framework includes an advanced set of tools such as text searching, annotations, memory-efficient paging, inertial scrolling, and vector display. Developers can implement comprehensive PDF reading, writing and editing with support for the extraction of text, hyperlinks, bookmarks, digital signatures, PDF forms and metadata, as well as updating, splitting and merging pages from existing PDF documents.
ORPALIS: GdPicture.NET by ORPALIS offers extended support of the PDF format for .NET (C# and VB.NET) and non-managed applications written in VB6, Delphi, MS Access and more. Its numerous features include full Unicode support, PDF/A generation, digital signature support, PDF merging and splitting, PDF modification, PDF rasterization, and PDF creation with interactive form fields. With GdPicture.NET, you can also repair corrupted PDFs, add or extract fonts, and draw barcodes and annotations on documents.
Qoppa: Qoppa Software offers an extensive suite of PDF libraries and visual components that cover all PDF processing needs. PDF functions include creation and modification, assembly, conversion to images and HTML, automated printing, encryption and digital signatures, form fields, viewing and markup, optimization, and a lot more. Qoppa products provide the highest level of performance and reliability and are 100% Java, so they run on all servers and desktop operating systems.
Accusoft: Accusoft’s PDF Xpress empowers developers to boost application functionality with easy PDF creation, editing, and the highest level of PDF compression available on the market—as much as 90%. Quickly compress one PDF (or an entire library) with just one parameter change to boost display and transmission speed while dramatically reducing archival footprint. Leverage PDF Xpress to build a PDF portfolio of multiple documents and document types.
ActivePDF: Over 14 years, ActivePDF has developed and refined a comprehensive collection of PDF automation tools that make development easy. ActivePDF helps avoid delays, downtime and headaches. More than 23,000 satisfied customers have chosen ActivePDF, from startups to Fortune 100 companies.
Adobe: A company defined by its market-leading PDF technology, Adobe offers Adobe Document Cloud for document management across mobile devices and PCs. The Document Cloud features the Adobe Acrobat DC PDF solution, which provides a touch interface for document management through native mobile apps. In addition, the company offers Adobe ExportPDF and Adobe PDF Pack for converting and combining PDF files.
Amyuni: Amyuni provides developers and system administrators with high-performance PDF conversion and processing tools. Certified for Windows desktops and servers, Amyuni PDF Converter enables developers to easily integrate powerful PDF and PDF/A functionality into their applications with just a few lines of code. Amyuni PDF Creator produces optimized PDF documents and seamlessly integrates with COM, .NET, WinRT and Windows Phone applications.
Aspose: Aspose creates file format APIs that help .NET and Java developers work with documents. Aspose.Pdf for .NET and Aspose.Pdf for Java are APIs for creating, editing and converting PDF files. They support a wide range of features, from simple PDF file creation, through layout and formatting changes, to more complex operations like managing PDF forms, security and signatures. Aspose’s PDF APIs help speed up development with comprehensive functionality, free trials and responsive support.
CeTe: CeTe Software’s DynamicPDF product line, including Merger, Generator, Viewer, Rasterizer, PrintManager and Converter, provides developers access to a complete integrated PDF solution. Functionality includes PDF creation and manipulation, PDF conversion (to and from PDF), PDF printing, as well as an embeddable PDF Viewer. The DynamicPDF libraries and components have functionality for .NET (C# and VB.NET), Java and COM/ActiveX.
ComponentPro: Ultimate PDF for .NET is a 100%-managed PDF document component that helps you add PDF capabilities in .NET applications. With a few lines of code, developers can create a complex PDF document from scratch, or load an existing PDF file without using any third-party libraries or ActiveX controls. The Ultimate PDF component also offers many features, including drawing text, images, tables and other shapes; compression; hyperlinks; security; and custom fonts. PDF files created using the Ultimate PDF component are compatible with all versions of Adobe Acrobat, as well as the free version of Acrobat Viewer from Adobe.
Glyph & Cog: Glyph & Cog offers a full line of software components designed to help developers add PDF capabilities into their applications. Functionality includes PDF viewing (Qt and ActiveX), printing, text extraction, and more with cross-platform support for Windows, Mac and Linux. Glyph & Cog’s newest product is PDFdeconstruct, a tool that decomposes PDF content into an XML file.
GrapeCity: Within the ComponentOne Studio product, GrapeCity provides UI controls for application development. Its offering includes PDF controls for creating and viewing PDF documents in Windows, Web and Windows Store applications without requiring users to install Adobe Acrobat. With the ComponentOne PDF control for WinForms, Silverlight, WPF, ASP.NET, MVC and WinRT, users may generate and view full-featured reports with encryption, compression, outlining, hyperlinking, attachments, and everything else PDF users need.
Persits Software: Persits Software’s AspPDF and AspPDF.NET are feature-packed server components for managing Adobe PDF documents for ASP and .NET environments, respectively. Their simple and intuitive programming interface enables a Web application to perform many useful PDF-related functions, such as form fill-in, HTML-to-PDF and PDF-to-image conversion, text extraction, stamping, digital signing, automatic printing, barcode generation, and many others, in just a few lines of script. Free fully functional 30-day evaluation versions are available.
TallComponents: TallComponents offers reliable and proven .NET class libraries to create, modify, convert, read, print and render PDF documents. The libraries are written entirely in C#, have no external dependencies such as Adobe Reader, and are characterized by an intuitive API combined with knowledgeable and fast support.
WebSupergoo: WebSupergoo’s ABCpdf .NET is designed for a combination of maximum power and ease of use, going directly to PDF for blazing speed. ABCpdf supports a wide range of input and output formats, from PDF, HTML and XPS to SVG, EPS and DOCX. The product is fully color space-aware for importing, construction and rendering, and provides sophisticated operations for the analysis, deconstruction and reconstruction of documents.