Version 1.0.0

Set up the Pdftools OCR Service with the Conversion Service

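Learn how to enable and configure the Pdftools OCR Service in the Conversion Service. This page provides two main sections:

  • Enable the Pdftools OCR Service in the Conversion Service
  • Configure OCR in the Conversion Service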

Enable the Pdftools OCR Service in the Conversion Service

Before you start

The following steps explain how to enable and configure OCR in a Conversion Service workflow profile:

  1. In the Conversion Service Configurator, go to Workflows & Profiles.
  2. Click the pen icon next to the workflow profile you want to edit. OCR is available in Archive and Conversion workflows.
  3. Enable the OCR Settings toggle.
  4. Navigate to the OCR Settings section, which is now displayed.
  5. In the OCR Settings section, click the Add Item button.
  6. Select Pdftools OCR Service (3H Legacy Compatible) as the OCR engine, and then click Next.
  7. Optional: If the OCR service is on a different server, update the Service URL.
  8. Click Apply.

The Pdftools OCR Service must be configured and accessible through HTTP. To distribute OCR processing evenly, you can configure multiple Pdftools OCR Services. For more details, review Scale the Pdftools OCR Service.
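Before you enter a Service URL, it can help to confirm that the address is a well-formed HTTP URL and that a server answers on it. The following Python sketch relies only on the HTTP requirement stated above. The function names and the address http://ocr-server.example.com/ are hypothetical and not part of any Pdftools API. The reachability check only confirms that some HTTP server responds, not that it is a Pdftools OCR Service.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import urlopen


def is_valid_service_url(url: str) -> bool:
    """Hypothetical helper: True if url is an absolute http:// or https:// URL with a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical helper: True if any HTTP response comes back from url."""
    if not is_valid_service_url(url):
        return False
    try:
        with urlopen(url, timeout=timeout):
            return True
    except HTTPError:
        # The server answered with an error status, so it is reachable over HTTP.
        return True
    except (URLError, OSError):
        # DNS failure, refused connection, timeout, and similar network errors.
        return False
```

For example, is_valid_service_url("ocr-server.example.com") returns False because the address has no http:// or https:// prefix. Run is_reachable from the machine that hosts the Conversion Service, because that is the machine that must reach the OCR service.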

Configure OCR in the Conversion Service

You can configure parameters such as the languages that the OCR recognizes in documents, predefined profiles, the accuracy of text extraction, and more. To edit the configuration:

  1. In the Conversion Service Configurator, go to Workflows & Profiles.
  2. Click the pen icon next to the workflow profile you want to edit.
  3. Navigate to the OCR Settings section.
  4. Next to Engine, click the pen icon in the Pdftools OCR Service (3H Legacy Compatible) section.
  5. After editing your configuration, click Apply.

You can edit parameters in the Parameters and Languages input fields:

  • In the Parameters input field, enter key-value pairs separated by semicolons (;), with each key joined to its value by an equal sign (=). For more information about available parameters, review Engine parameters.
    • The default parameter set in the Conversion Service Configurator:
      PredefinedProfile=DocumentConversion_Accuracy
    • An example with additional parameters set in the Conversion Service Configurator:
      PredefinedProfile=DocumentArchiving_Accuracy;PreprocessingOnly=false;RemoveGarbage=0;RecognizeBlankPages=false;BlankPageMargin=0.02;DisableMaskEmbedding=false
  • In the Languages input field, set the languages that the OCR recognizes as one comma-separated string. For more information about available language recognition options, review Supported languages. The following example lists natural and technical languages in one comma-separated string:
    English,German,French,Tagalog,Japanese,ChinesePRC,ChineseTaiwan,Corsican,Spanish,Chemistry,Java
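Because the Parameters and Languages fields follow simple delimiter rules, you can generate or check these strings outside the Configurator, for example when you keep profile settings in scripts. The following Python sketch uses hypothetical helpers that are not part of the Conversion Service. It splits a Parameters string into key-value pairs, joins them back, and builds a Languages string. It keeps all values as text and does not check names against Engine parameters or Supported languages. Trimming spaces and skipping empty entries are choices of this sketch, not documented Configurator behavior.

```python
from __future__ import annotations


def parse_parameters(text: str) -> dict[str, str]:
    """Hypothetical helper: split a 'Key=Value;Key=Value' string into an ordered dict."""
    params: dict[str, str] = {}
    for pair in text.split(";"):
        pair = pair.strip()
        if not pair:
            # This sketch skips empty entries, such as one left by a trailing semicolon.
            continue
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected Key=Value, got {pair!r}")
        params[key.strip()] = value.strip()
    return params


def format_parameters(params: dict[str, str]) -> str:
    """Join each key and value with '=' and separate the pairs with ';'."""
    return ";".join(f"{key}={value}" for key, value in params.items())


def format_languages(languages: list[str]) -> str:
    """Join language names into one comma-separated Languages string."""
    return ",".join(name.strip() for name in languages if name.strip())


# Usage with the parameter example from this page.
archiving = (
    "PredefinedProfile=DocumentArchiving_Accuracy;PreprocessingOnly=false;"
    "RemoveGarbage=0;RecognizeBlankPages=false;BlankPageMargin=0.02;"
    "DisableMaskEmbedding=false"
)
params = parse_parameters(archiving)
print(params["PredefinedProfile"])  # DocumentArchiving_Accuracy
print(format_parameters(params) == archiving)  # True
print(format_languages(["English", "German", "Chemistry", "Java"]))  # English,German,Chemistry,Java
```

The round-trip comparison gives a quick way to confirm that a hand-edited string still splits into the pairs you expect before you paste it into the Parameters field.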

References

For more information about specific configuration options, review Pdftools OCR Service references.