Version: 1.1.0

Parameters

Pdftools OCR Service lets you fine-tune the OCR engine through parameters that influence performance, layout detection, output quality, and preprocessing steps. Set these engine parameters as a sequence of key-value pairs, separated by semicolons (;), in the OCR settings of the Conversion Service Configurator.
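
As an illustration, the following Python sketch assembles such a parameter string. The semicolon separator comes from the description above. The `=` between key and value, the helper name `build_engine_parameters`, and the key `SomeOtherKey` are assumptions made for this example only, so check the Configurator for the exact syntax it expects.

```python
from typing import Mapping


def build_engine_parameters(params: Mapping[str, str]) -> str:
    """Join key/value pairs into one semicolon-separated string.

    Assumption: each pair is written as key=value. Only the semicolon
    separator between pairs is stated in the documentation.
    """
    parts = []
    for key, value in params.items():
        # Reject characters that would make the resulting string ambiguous.
        if ";" in key or "=" in key or ";" in value:
            raise ValueError(f"unsupported character in pair {key!r}: {value!r}")
        parts.append(f"{key}={value}")
    return ";".join(parts)


engine_parameters = build_engine_parameters(
    {
        "PredefinedProfile": "DocumentArchiving_Accuracy",
        "SomeOtherKey": "SomeValue",  # hypothetical key, for illustration only
    }
)
print(engine_parameters)
```

The resulting string is what you would paste into the engine parameters field. Building it programmatically mainly helps when the same settings must be applied consistently across several configurations.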

General settings

`PredefinedProfile`

Key	Type	Default
`PredefinedProfile`	Name	`Default`

Selects a predefined recognition profile. Each profile provides optimized OCR settings for a specific use case.

Profile name	Purpose
`DataExtraction`	Captures all content from a document in a structured format, including tables, images, checkmarks, handwriting, and stamps.
`DocumentConversion_Accuracy`	Converts documents into editable formats with the highest quality. Detects font styles and fully reconstructs the logical document structure.
`DocumentConversion_Normal`	Converts documents into editable formats with faster processing. Detects font styles and reconstructs document structure, but skips orientation correction.
`DocumentArchiving_Accuracy`	Creates searchable PDF or PDF/A archives with maximum text coverage. Focuses on accuracy without reconstructing the full document structure.
`DocumentArchiving_Speed`	Creates searchable PDF or PDF/A archives with maximum throughput. Trades accuracy for faster processing.
`TextExtraction_Accuracy`	Extracts text from documents with high accuracy, including small and low-quality text areas. Doesn’t detect images or tables.
`TextExtraction_Speed`	Extracts text from documents with maximum throughput. Trades accuracy for faster processing.
`FieldLevelRecognition`	Recognizes short, isolated text fragments such as form fields or single lines.
`BarcodeRecognition_Accuracy`	Detects and reads barcodes with high accuracy. All other content types (text, images, tables) are ignored.
`BarcodeRecognition_Speed`	Detects and reads barcodes with maximum throughput. All other content types (text, images, tables) are ignored.
`HighCompressedImageOnlyPdf`	Produces highly compressed PDF files where each page is stored as an image. No text recognition is performed.
`BusinessCardsProcessing`	Recognizes and extracts structured data from business cards.
`MachineReadableZone`	Reads machine-readable zone (MRZ) data from identity documents. Extracts all text on the image and performs automatic resolution and geometry correction.
`EngineeringDrawingsProcessing`	Recognizes text in technical drawings, engineering diagrams, and schematics. Handles large images and multiple text orientations, producing searchable PDF output.
`Default`	General-purpose profile that uses default values for all processing parameters.

`DataExtraction`

Captures all content from a document in a structured format:

Identifies all object types: tables, images, checkmarks, handwriting, and stamps.
Uses accurate recognition mode for maximum quality.

This profile corresponds to the following parameters:¹

[PageAnalysisParams]
AnalysisMode = PAM_TextExtraction
SpeedQualityMode = SQM_Accurate
DetectBarcodes = true
DetectPictures = true
DetectTables = true
DetectHandwritten = true
DetectCheckmarks = true
DetectStamps = true
DetectTextOnPictures = true
DetectVerticalEuropeanText = true

[RecognizerParams]
Mode = RM_Accurate
TextTypes = TT_Normal | TT_Handwritten

[SynthesisParamsForDocument]
DetectDocumentStructure = true
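
The listing above has an INI-like shape (bracketed sections with `Key = Value` lines), so it can be inspected with Python's standard `configparser` module, for example to compare two profiles. This is only a sketch for reading such a listing. It uses an abbreviated copy of the listing, the helper names `parse_profile_listing` and `split_flags` are made up for this example, and nothing here implies that the service accepts this format as input.

```python
import configparser

# Abbreviated copy of the profile listing shown above.
PROFILE_LISTING = """
[PageAnalysisParams]
SpeedQualityMode = SQM_Accurate
DetectTables = true
DetectHandwritten = true

[RecognizerParams]
Mode = RM_Accurate
TextTypes = TT_Normal | TT_Handwritten
"""


def parse_profile_listing(text: str) -> dict:
    """Return {section: {key: value}} for an INI-like profile listing."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original key casing
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def split_flags(value: str) -> list:
    """Split a combined value such as 'A | B' into its individual flags."""
    return [flag.strip() for flag in value.split("|")]


profile = parse_profile_listing(PROFILE_LISTING)
detect_tables = profile["PageAnalysisParams"]["DetectTables"] == "true"
text_types = split_flags(profile["RecognizerParams"]["TextTypes"])
print(detect_tables, text_types)
```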