ABBYY is introducing a new optical character recognition (OCR) API to enable developers to extract data from unstructured documents.

“As a vanguard of OCR, ABBYY has long had a vibrant community of cutting-edge developers creating transformational solutions with our advanced document AI,” said Nick Hyatt, vice president of Engineering R&D at ABBYY. “ABBYY Document AI API is a major step forward for developing automated document workflows.”

The ABBYY Document AI API—currently in technical preview—will allow developers to transform unstructured data into structured JSON in just a few lines of code. It includes SDKs for Python, C#, JavaScript, and Java. 

Documents it can extract data from include invoices, receipts, and tax forms.
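To give a sense of what working with structured JSON output might look like, here is a minimal Python sketch that parses an invoice result into a typed object. The response shape and field names (`documentType`, `fields`, `invoiceNumber`, `vendorName`, `total`, `currency`) are assumptions made up for illustration, not ABBYY's documented schema, and the sketch does not call the API or its SDKs.

```python
import json
from dataclasses import dataclass

# Hypothetical JSON for an invoice. The field names are illustrative only
# and are NOT taken from ABBYY's documentation.
SAMPLE_RESPONSE = """
{
  "documentType": "invoice",
  "fields": {
    "invoiceNumber": "INV-1001",
    "vendorName": "Example Supplies Ltd",
    "total": "1250.00",
    "currency": "EUR"
  }
}
"""


@dataclass
class Invoice:
    number: str
    vendor: str
    total: float
    currency: str


def parse_invoice(raw: str) -> Invoice:
    """Turn the (assumed) JSON payload into a typed Invoice."""
    payload = json.loads(raw)
    if payload.get("documentType") != "invoice":
        raise ValueError("expected an invoice document")
    fields = payload["fields"]
    return Invoice(
        number=fields["invoiceNumber"],
        vendor=fields["vendorName"],
        total=float(fields["total"]),  # amounts arrive as strings in this sketch
        currency=fields["currency"],
    )


invoice = parse_invoice(SAMPLE_RESPONSE)
print(f"{invoice.vendor}: {invoice.total:.2f} {invoice.currency}")
```

Mapping the payload onto a small dataclass keeps downstream code independent of the raw JSON layout, which may well change while the API is in preview.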

During this technical preview, the OCR models are available only as pre-trained models, with no options yet for custom training or fine-tuning. The API will be free to use during the preview, subject to a processing limit of 1,000 pages.

It currently supports OCR in English, German, French, Spanish, Dutch, Japanese, and both Traditional and Simplified Chinese. For handwriting recognition, also known as intelligent character recognition (ICR), it supports English, German, French, Spanish, and Japanese.
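Because the OCR and ICR language lists differ, a client may want to check which recognition modes apply before submitting a document. The sketch below encodes the two lists from this article as plain Python sets. The function name `supported_modes` and the mode labels `"ocr"` and `"icr"` are hypothetical and not part of any ABBYY SDK.

```python
# Language lists as reported for the technical preview; they may change.
OCR_LANGUAGES = {
    "English", "German", "French", "Spanish", "Dutch", "Japanese",
    "Traditional Chinese", "Simplified Chinese",
}
ICR_LANGUAGES = {"English", "German", "French", "Spanish", "Japanese"}


def supported_modes(language: str) -> list[str]:
    """Return the recognition modes available for a language (hypothetical helper)."""
    modes = []
    if language in OCR_LANGUAGES:
        modes.append("ocr")
    if language in ICR_LANGUAGES:
        modes.append("icr")
    return modes


for lang in ("Japanese", "Dutch", "Korean"):
    print(lang, supported_modes(lang))
```

In this sketch Dutch and both Chinese variants resolve to OCR only, while a language on neither list returns an empty result.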

Developers can join the technical preview here.