Set up the Pdftools OCR Service with the Conversion Service
Learn how to enable and configure the Pdftools OCR Service in the Conversion Service. This page covers two tasks: enabling the OCR service in a workflow profile, and configuring its parameters and languages.
Enable the Pdftools OCR Service in the Conversion Service
Before you begin, complete the following prerequisites:
- Install and configure the Conversion Service. For detailed installation instructions, review Conversion Service Getting started.
- Install the Pdftools OCR Service. For more information, review Pdftools OCR getting started.
The following steps explain how to enable and configure OCR in the Conversion Service profile:
- In the Conversion Service Configurator, go to Workflows & Profiles.
- Click the pen icon next to the workflow profile you want to edit. OCR is available in Archive and Conversion workflows.
- Enable the OCR Settings toggle.
- Go to the OCR Settings configuration section, which is now displayed.
- In the OCR Settings section, click the Add Item button.
- Select Pdftools OCR Service (3H Legacy Compatible) as the OCR engine, and then click Next.
- Optional: If the OCR service is on a different server, update the Service URL.
- Click Apply.
The Pdftools OCR Service must be configured and accessible through HTTP. You can also configure several Pdftools OCR Services so that OCR processing is distributed equally among them. For more details, review Scale the Pdftools OCR Service.
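Before you save the profile, it can help to confirm that the Service URL answers over HTTP from the machine that runs the Conversion Service. The following Python sketch is not part of Pdftools: the function name `is_http_reachable` is hypothetical, and it only checks that some HTTP response comes back, not that the responder is actually the OCR service.

```python
import http.client
import urllib.error
import urllib.request


def is_http_reachable(service_url: str, timeout: float = 5.0) -> bool:
    """Return True if any HTTP response comes back from service_url."""
    # Assumption: ignore proxy settings from the environment so the check
    # talks to the service directly. Remove this if your network needs a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(service_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (for example 404 or 501),
        # which still proves that it is reachable through HTTP.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, or a non-HTTP reply.
        return False
```

Call it with the same value you enter in the Service URL field, for example `is_http_reachable("http://ocr-host.example:8080/")`, where the host and port are placeholders. A result of `False` usually points to a firewall, a wrong host or port, or a service that is not running.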
Configure OCR in the Conversion Service
You can configure parameters such as the languages that OCR recognizes in documents, predefined profiles, the accuracy of text extraction, and more. To edit the configuration:
- In the Conversion Service Configurator, go to Workflows & Profiles.
- Click the pen icon next to the workflow profile you want to edit.
- Navigate to the OCR Settings section.
- Next to Engine, click the pen icon in the Pdftools OCR Service (3H Legacy Compatible) section.
- After editing your configuration, click Apply.
You can edit parameters in the Parameters and Languages input fields:
- In the Parameters input field, the key-value pairs are joined by an equal sign (=) and separated by semicolons (;). For more information about available parameters, review Engine parameters.
  - The default parameter set in the Conversion Service Configurator:
    PredefinedProfile=DocumentConversion_Accuracy
  - An example of more parameters set in the Conversion Service Configurator:
    PredefinedProfile=DocumentArchiving_Accuracy;PreprocessingOnly=false;RemoveGarbage=0;RecognizeBlankPages=false;BlankPageMargin=0.02;DisableMaskEmbedding=false
- In the Languages input field, you can set the languages that OCR recognizes as one comma-separated string. For more information about available language recognition options, review Supported languages. The following example shows natural and technical languages in one comma-separated string:
English,German,French,Tagalog,Japanese,ChinesePRC,ChineseTaiwan,Corsican,Spanish,Chemistry,Java
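If you generate these two strings from a script instead of typing them, a small helper can keep the separators consistent. The following Python sketch only illustrates the formats described above (`key=value` pairs separated by semicolons, and language names separated by commas). The function names are hypothetical and not part of any Pdftools API, and the sketch does not check that keys, values, or language names are accepted by the OCR engine.

```python
from typing import Dict, List


def build_parameters(pairs: Dict[str, str]) -> str:
    """Join key-value pairs as key=value, separated by semicolons."""
    for key, value in pairs.items():
        # The separators cannot be part of a key or value in this simple format.
        if not key or "=" in key or ";" in key or ";" in value:
            raise ValueError(f"invalid parameter: {key!r}={value!r}")
    return ";".join(f"{key}={value}" for key, value in pairs.items())


def parse_parameters(text: str) -> Dict[str, str]:
    """Split a Parameters string back into a dictionary of strings."""
    result: Dict[str, str] = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        result[key.strip()] = value.strip()
    return result


def build_languages(languages: List[str]) -> str:
    """Join language names into one comma-separated string."""
    names = [name.strip() for name in languages]
    if any(not name or "," in name for name in names):
        raise ValueError("language names must be non-empty and contain no commas")
    return ",".join(names)
```

Values stay as strings (for example `"0.02"` or `"false"`) so that the text pasted into the Configurator matches exactly what you wrote.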
For more information about specific configuration options, review Pdftools OCR Service references.