
Configure OCR

Optical character recognition (OCR) identifies characters in images, scanned documents, and documents that contain images with text. It adds a text layer containing the recognized characters without visually changing the original document. OCR can make all text in a PDF document extractable, regardless of how the text is included. Enable OCR in a Conversion Service workflow to enhance processed documents with the information detected by an OCR engine.

The Conversion Service OCR can process files that already include an OCR layer.

info

The Conversion Service supports ABBYY FineReader Engine v11 and v12.

Install ABBYY

Prerequisites

To use OCR in the Conversion Service, you need:

  • A Windows Server machine to install the ABBYY FineReader OCR Engine.
  • An ABBYY OCR license issued by Pdftools. Use the Contact form if you require a license.
  • The ABBYY FineReader Engine installer from the Pdftools Customer portal.

Install the OCR Engine on the Windows Server machine using the Pdftools ABBYY FRE installer.

After the installation process, activate the ABBYY license you received from Pdftools:

  1. Open the ABBYY License Manager.
  2. Click on Activate License.
  3. Insert the ABBYY license key and run the activation process.

Enable and configure OCR

info

You can only activate OCR processing for the archive and conversion workflows.

The following steps explain how to activate and configure OCR in a profile:

  1. In the Conversion Service Configurator, go to Workflows & Profiles.

  2. Choose a profile where you want to activate the OCR, and then edit or duplicate this profile.

  3. Enable the OCR Settings processing toggle.

    [Figure: The OCR Settings processing toggle enabled.]

  4. Navigate to the now visible configuration section OCR Settings.

  5. In the OCR Settings section, click the Add Item button.

  6. Select the version of the ABBYY FineReader Engine you installed, and then click on Next.

  7. Confirm the values with Apply, or configure the OCR engine in more detail as described in Configure ABBYY FineReader Engine.

Built-in documentation

For more information about the resulting page rendering configuration, such as the resolution, image processing, and text processing details, refer to the built-in documentation in the Conversion Service Configurator. To access the built-in documentation, click the icon next to the OCR Settings configuration section header.

[Figure: Conversion Service Configurator built-in documentation.]

Configure ABBYY FineReader Engine

Change the predefined profile for ABBYY FineReader Engine, add a custom profile, or configure languages the OCR scans for:

  1. In the Conversion Service Configurator, go to Workflows & Profiles.
  2. Choose the profile where you activated the OCR.
  3. Navigate to the configuration section OCR Settings.
  4. In the OCR Settings section, click the Add Item button.
  5. Select the version of the ABBYY FineReader Engine you installed, and then click on Next.
  6. In the Step 2: Configure Selected Engine window, change the predefined profile, the custom profile, or the languages that the OCR scans for. See the sections below for more information about each of these settings.

[Figure: Configure the ABBYY FineReader OCR Engine with the Conversion Service Configurator.]

info

To change a language that the OCR must identify, use the language's internal name as listed under Predefined Languages in ABBYY FineReader Engine in the ABBYY FineReader documentation.

tip

See the following sections for more information about the configuration of the ABBYY FineReader Engine.

Predefined profile

The ABBYY predefined OCR engine profiles are designed for specific use cases.

Profile Name                      Description
Default                           The default ABBYY profile.
Document Conversion (Accuracy)    Optimized for accuracy. Convert documents into editable formats.
Document Conversion (Speed)       Optimized for speed. Convert documents into editable formats.
Text Extraction (Accuracy)        Optimized for accuracy. Extract text from documents.
Text Extraction (Speed)           Optimized for speed. Extract text from documents.

For more information, see the Predefined profiles in the ABBYY documentation.
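The naming scheme in the table above can be captured in a small helper, which may be handy when documenting or scripting which profile a team should pick. This is only an illustrative sketch: the `Goal` and `Priority` enums and the `predefined_profile_name` function are hypothetical names, not part of the Conversion Service or ABBYY APIs. The returned strings are the display names from the table, which you still select manually in the Configurator.

```python
from enum import Enum


class Goal(Enum):
    """What the OCR result is used for (hypothetical helper type)."""
    DOCUMENT_CONVERSION = "Document Conversion"
    TEXT_EXTRACTION = "Text Extraction"


class Priority(Enum):
    """Whether accuracy or speed matters more (hypothetical helper type)."""
    ACCURACY = "Accuracy"
    SPEED = "Speed"


def predefined_profile_name(goal=None, priority=None):
    """Return the display name of a predefined profile from the table.

    Falls back to "Default" when the goal or the priority is not specified.
    """
    if goal is None or priority is None:
        return "Default"
    return f"{goal.value} ({priority.value})"


if __name__ == "__main__":
    print(predefined_profile_name(Goal.TEXT_EXTRACTION, Priority.SPEED))
    print(predefined_profile_name())
```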

tip

Select the Default profile or the profile most suitable for your use case. To adjust individual configuration values to your needs, use a custom profile.

Custom profile

With a customized profile, you can adjust individual configuration options of the OCR Engine according to your requirements. Create your customized profile as an INI file and enter the path to this file in the custom profile field. The configuration values of the customized profile take precedence over the preset values of the selected predefined profile.

info

The Pdftools integration of the ABBYY OCR Engine is custom, so some options (for example, PDFExportParams) do not affect the OCR result.

For more information about custom profiles, see User profiles in the ABBYY documentation.
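As a rough sketch of what creating such an INI file could look like, the following Python snippet writes a minimal custom profile with the standard `configparser` module. The section name `RecognizerParams`, the key `TextLanguage`, and the value `English,German` are assumptions based on the general shape of ABBYY user profiles. Verify the exact section names, keys, and file encoding against the User profiles page in the ABBYY documentation before using the file. The `write_custom_profile` function and the output file name are hypothetical.

```python
import configparser
from pathlib import Path


def write_custom_profile(path, settings):
    """Write a nested dict of {section: {key: value}} as an INI file."""
    parser = configparser.ConfigParser()
    # Preserve the case of keys; configparser lowercases them by default.
    parser.optionxform = str
    for section, options in settings.items():
        parser[section] = options
    with open(path, "w", encoding="utf-8") as handle:
        parser.write(handle)
    return Path(path)


# Assumed example content: section and key names must be checked
# against the ABBYY documentation.
EXAMPLE_SETTINGS = {
    "RecognizerParams": {"TextLanguage": "English,German"},
}

if __name__ == "__main__":
    profile = write_custom_profile("custom_profile.ini", EXAMPLE_SETTINGS)
    print(profile.read_text(encoding="utf-8"))
```

The path of the generated file is what you would enter in the custom profile field.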

Languages

Configure all languages that appear in your documents to improve recognition accuracy.

[Figure: Configure all languages included in the documents.]

The ABBYY FineReader Engine only recognizes characters that are used in the configured languages. For example, if you only choose English, special characters such as German umlauts (äöü) are not identified correctly and can be recognized as different letters (aou). Choose all languages used in the documents to avoid such mistakes.

See the Predefined Languages in ABBYY FineReader Engine in the ABBYY documentation.
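To illustrate why the language selection matters, the following sketch checks a sample text for letters that fall outside the alphabets of a chosen set of languages. It does not call the OCR engine. The `ALPHABET_EXTRAS` table is a simplified, hypothetical stand-in for the character sets the engine associates with each language, and the language names are assumed to match the ABBYY internal names.

```python
import string

# Simplified, hypothetical alphabets: letters beyond plain ASCII a-z / A-Z.
ALPHABET_EXTRAS = {
    "English": "",
    "German": "äöüÄÖÜß",
    "French": "àâæçéèêëîïôœùûüÿÀÂÆÇÉÈÊËÎÏÔŒÙÛÜŸ",
}

BASE_LETTERS = set(string.ascii_letters)


def uncovered_characters(text, languages):
    """Return the sorted letters in `text` not covered by `languages`."""
    allowed = set(BASE_LETTERS)
    for language in languages:
        allowed.update(ALPHABET_EXTRAS[language])
    return sorted({ch for ch in text if ch.isalpha() and ch not in allowed})


if __name__ == "__main__":
    sample = "Grüße aus Zürich"
    print(uncovered_characters(sample, ["English"]))            # ['ß', 'ü']
    print(uncovered_characters(sample, ["English", "German"]))  # []
```

A non-empty result for a representative sample of your documents suggests that an additional language should be configured.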

Using the 3-Heights® OCR Service

Configuring the ABBYY FineReader Engine (v11 or v12) directly is recommended. You can also configure OCR through the 3-Heights® OCR Service, but use it only if your Conversion Service and a Document Converter installation use an ABBYY license simultaneously. For more information about configuring the 3-Heights® OCR Service, see the Migrating from 3-Heights® Document Converter documentation.