Version: 1.0.0

Parameters

The Pdftools OCR Service lets you fine-tune the OCR engine through parameters that influence performance, layout detection, output quality, and preprocessing steps. You set these engine parameters as a sequence of key-value pairs separated by semicolons (;) in the Conversion Service Configurator OCR settings.
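As a rough illustration of the parameter string format, the following Python sketch joins key-value pairs with semicolons. It assumes the `Key=Value` form used by the `Profile` example on this page and lowercase Boolean values. The helper name `build_parameters` is hypothetical and not part of the product.

```python
def build_parameters(params):
    """Join key/value pairs into one 'Key=Value;Key=Value' string."""
    parts = []
    for key, value in params.items():
        if isinstance(value, bool):
            # Assumption: Booleans are written in lowercase (true/false).
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ";".join(parts)


parameters = build_parameters({
    "PredefinedProfile": "DocumentArchiving_Accuracy",
    "RecognizeBlankPages": True,
    "BlankPageMargin": 0.05,
})
print(parameters)
# PredefinedProfile=DocumentArchiving_Accuracy;RecognizeBlankPages=true;BlankPageMargin=0.05
```

The resulting string is what you would paste into the Parameters input field of the OCR settings.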

General settings

PredefinedProfile

| Key | Type | Default |
| --- | --- | --- |
| PredefinedProfile | Name | Default |

The name of the predefined recognition profile. Predefined profiles offer optimized OCR settings for different goals. You can pass them using the PredefinedProfile parameter.

| Profile name | Purpose |
| --- | --- |
| DocumentConversion_Accuracy | Suitable for converting documents into an editable format. Maximizes accuracy in text recognition. |
| DocumentConversion_Speed | Suitable for converting documents into an editable format. Optimizes for processing speed. |
| DocumentArchiving_Accuracy | Suitable for creating an electronic archive (converting to PDF and PDF/A). Preserves document structure and layout. |
| DocumentArchiving_Speed | Suitable for creating an electronic archive (converting to PDF and PDF/A). Fast archiving with basic layout. |
| BookArchiving_Accuracy | Suitable for high-quality capture processing of books, magazines, and newspapers to create an electronic library (converting to PDF and PDF/A). For example, use it to digitize paper book collections. |
| BookArchiving_Speed | Suitable for fast capture processing of books, magazines, and newspapers to create an electronic library (converting to PDF and PDF/A). For example, use it to digitize paper book collections. |
| TextExtraction_Accuracy | Suitable for extracting text from a document. Optimized for high accuracy. |
| TextExtraction_Speed | Suitable for extracting text from a document. Optimized for speed. |
| FieldLevelRecognition | Suitable for recognizing short text fragments. |
| BarcodeRecognition_Accuracy | Suitable for barcode extraction. Extracts only barcodes (text, pictures, and tables are not detected). Optimized for high accuracy. |
| BarcodeRecognition_Speed | Suitable for barcode extraction. Extracts only barcodes (text, pictures, and tables are not detected). Optimized for speed. |
| EngineeringDrawingsProcessing | Suitable for recognizing technical drawings (such as linework, complex engineering diagrams, and schematics). The profile is intended for converting such images into searchable PDF format. |
| Default | General-purpose balanced settings. Sets all the processing parameters to the default values. |

DocumentConversion_Accuracy

Convert documents into editable formats, optimized for accuracy. Enables font style detection.

This profile corresponds to the following parameter [1]:

[BarcodeParams]
EnableAdvancedExtractionMode = true

DocumentConversion_Speed

For converting documents into editable formats, optimized for speed.

This profile corresponds to the following parameters:

[ObjectsExtractionParams]
ProhibitColorImage = true

[PrepareImageMode]
UseFastBinarization = true

[RecognizerParams]
FastMode = true
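To see how the accuracy and speed variants of a profile differ, the two listings can be compared key by key. The following sketch parses the DocumentConversion_Accuracy and DocumentConversion_Speed listings from this page with Python's standard `configparser` module. The helper names `load_profile` and `diff_profiles` are hypothetical. A value of `None` only means that the key does not appear in that listing.

```python
import configparser

ACCURACY = """
[BarcodeParams]
EnableAdvancedExtractionMode = true
"""

SPEED = """
[ObjectsExtractionParams]
ProhibitColorImage = true

[PrepareImageMode]
UseFastBinarization = true

[RecognizerParams]
FastMode = true
"""


def load_profile(text):
    """Parse INI text into a {(section, key): value} dictionary."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the mixed-case key names as written
    parser.read_string(text)
    return {(s, k): v for s in parser.sections() for k, v in parser[s].items()}


def diff_profiles(first, second):
    """Return the keys whose values differ, as {key: (first, second)}."""
    left, right = load_profile(first), load_profile(second)
    return {
        key: (left.get(key), right.get(key))
        for key in sorted(left.keys() | right.keys())
        if left.get(key) != right.get(key)
    }


for (section, key), values in diff_profiles(ACCURACY, SPEED).items():
    print(f"[{section}] {key}: {values[0]} -> {values[1]}")
```

The same comparison works for any other pair of listings on this page, or for a custom profile checked against a predefined one.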

DocumentArchiving_Accuracy

For creating an electronic archive, optimized for accuracy:

  • Enables detection of maximum text on an image, including text embedded into the image.
  • Skew correction is not performed.
  • Fonts and styles are not detected.

This profile corresponds to the following parameters:

[BarcodeParams]
EnableAdvancedExtractionMode = true

[ObjectsExtractionParams]
DetectTextOnPictures = true
EnableAggressiveTextExtraction = true

[PageAnalysisParams]
EnableTextExtractionMode = true

[SynthesisParamsForDocument]
DetectDocumentStructure = false
DetectFontFormatting = false

[SynthesisParamsForPage]
AllowGrayTextColor = TSPV_Yes
AllowGrayBackgroundColor = TSPV_Yes
DetectFontFormattingAtPageLevel = true
DetectTextColor = TSPV_Yes

DocumentArchiving_Speed

For creating an electronic archive, optimized for speed.

This profile corresponds to the following parameters:

[ObjectsExtractionParams]
DetectMatrixPrinter = false
DetectPorousText = false
DetectTextOnPictures = true
EnableAggressiveTextExtraction = true
FastObjectsExtraction = true
ProhibitColorImage = true
RemoveGarbage = true
RemoveTexture = false

[PageAnalysisParams]
EnableTextExtractionMode = true
ProhibitModelAnalysis = true

[PrepareImageMode]
CorrectSkew = false
UseFastBinarization = true

[RecognizerParams]
FastMode = true

[SynthesisParamsForDocument]
DetectDocumentStructure = false
DetectFontFormatting = false

[SynthesisParamsForPage]
AllowGrayBackgroundColor = TSPV_Yes
AllowGrayTextColor = TSPV_Yes
DetectFontFormattingAtPageLevel = true
DetectTextColor = TSPV_Yes

BookArchiving_Accuracy

For creating an electronic library, optimized for accuracy:

  • High quality. Enables font style detection.

This profile corresponds to the following parameters:

[BarcodeParams]
EnableAdvancedExtractionMode = true

BookArchiving_Speed

For creating an electronic library, optimized for speed.

This profile corresponds to the following parameters:

[ObjectsExtractionParams]
ProhibitColorImage = true

[PrepareImageMode]
UseFastBinarization = true

[RecognizerParams]
FastMode = true

TextExtraction_Accuracy

For extracting text from documents, optimized for accuracy:

  • Enables detection of all text on an image, including small text areas of low quality (pictures and tables are not detected).
  • Fonts and styles are not detected.

This profile corresponds to the following parameters:

[BarcodeParams]
EnableAdvancedExtractionMode = true

[ObjectsExtractionParams]
DetectTextOnPictures = true
EnableAggressiveTextExtraction = true

[PageAnalysisParams]
DetectPictures = false
EnableTextExtractionMode = true

[SynthesisParamsForDocument]
DetectDocumentStructure = false
DetectFontFormatting = false

[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = true

TextExtraction_Speed

For extracting text from documents, optimized for speed.

This profile corresponds to the following parameters:

[ObjectsExtractionParams]
DetectMatrixPrinter = false
DetectPorousText = false
DetectTextOnPictures = true
EnableAggressiveTextExtraction = true
FastObjectsExtraction = true
ProhibitColorImage = true
RemoveGarbage = true
RemoveTexture = false

[PageAnalysisParams]
DetectPictures = false
EnableTextExtractionMode = true
ProhibitModelAnalysis = true

[PrepareImageMode]
CorrectSkew = true
DiscardColorImage = true
UseFastBinarization = true

[RecognizerParams]
FastMode = true

[SynthesisParamsForDocument]
DetectDocumentStructure = false
DetectFontFormatting = false

[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = true
DetectTextColor = TSPV_Yes

FieldLevelRecognition

For recognizing short text fragments.

This profile corresponds to the following parameters:

[DocumentProcessingParams]
PerformSynthesis = false

[PageProcessingParams]
PerformAnalysis = false

[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = false

BarcodeRecognition_Accuracy

For barcode extraction, optimized for accuracy:

  • Extracts only barcodes (text, pictures, or tables are not detected).

This profile corresponds to the following parameters:

[BarcodeParams]
MinRatioToTextHeight = 0.9

[ObjectsExtractionParams]
DetectMatrixPrinter = false
DetectPorousText = false

[PageAnalysisParams]
DetectBarcodes = true
DetectPictures = false
DetectTables = false
DetectText = false
DetectSeparators = false
DetectVectorGraphics = false

[PrepareImageMode]
CorrectSkew = false

BarcodeRecognition_Speed

For barcode extraction, optimized for speed.

This profile corresponds to the following parameters:

[BarcodeParams]
MinRatioToTextHeight = 0.9

[ObjectsExtractionParams]
DetectMatrixPrinter = false
DetectPorousText = false
FastObjectsExtraction = true

[PageAnalysisParams]
DetectBarcodes = true
DetectPictures = false
DetectTables = false
DetectText = false
DetectSeparators = false
DetectVectorGraphics = false

[PageProcessingParams]
PerformPreprocessing = false

[PrepareImageMode]
CorrectSkew = false
DiscardColorImage = true

EngineeringDrawingsProcessing

For recognizing technical drawings:

  • Takes into account the large size and complexity of engineering diagrams, as well as the possibility of different text orientations within the image.
  • Enables detection of all text on an image, including text blocks of vertical orientation.

This profile corresponds to the following parameters:

[PageAnalysisParams]
DetectPictures = false
DetectVectorGraphics = false
DetectVerticalEuropeanText = true
EnableTextExtractionMode = true

[SynthesisParamsForDocument]
DetectDocumentStructure = false
DetectFontFormatting = false

[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = true

Default

For default values:

  • Sets all the processing parameters to the default values.

Profile

| Key | Type |
| --- | --- |
| Profile | Path |

Path to a custom recognition profile INI file.

To configure a custom profile for specific OCR processing behavior, follow these steps:

  1. Create a custom profile INI file.
  2. In the Conversion Service Configurator, go to Workflows & Profiles.
  3. Click the pen icon next to the workflow profile you want to edit.
  4. Navigate to the OCR Settings section.
  5. Next to Engine, click the pen icon in the Pdftools OCR Service (3H Legacy Compatible) section.
    (Screenshot: Integration tab of the Conversion Service Configurator)
  6. In the Parameters input field, include a path to your INI file, for example:
    Profile="C:\ocr\profiles\document_conversion_high_accuracy.ini"
  7. After editing your configuration, click Apply.

The INI file must be on the same machine as the Pdftools OCR Service Manager node. If both Profile and PredefinedProfile are set, the custom Profile overrides the predefined profile.
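Step 1 and step 6 above can be sketched in Python. The example writes an INI file with the same settings listed for DocumentConversion_Speed on this page and then formats the matching `Profile` entry. The file name, the temporary directory, and the helper names `write_profile` and `profile_parameter` are hypothetical. In practice, the file must be in a location readable by the Pdftools OCR Service Manager node.

```python
import configparser
import tempfile
from pathlib import Path


def write_profile(path, sections):
    """Write a {section: {key: value}} mapping to an INI file."""
    config = configparser.ConfigParser()
    config.optionxform = str  # preserve the mixed-case key names
    for section, options in sections.items():
        config[section] = options
    with open(path, "w", encoding="utf-8") as handle:
        config.write(handle)


def profile_parameter(path):
    """Format the Profile entry for the Parameters input field."""
    return f'Profile="{path}"'


# Same settings as the DocumentConversion_Speed listing on this page.
sections = {
    "ObjectsExtractionParams": {"ProhibitColorImage": "true"},
    "PrepareImageMode": {"UseFastBinarization": "true"},
    "RecognizerParams": {"FastMode": "true"},
}

# Hypothetical location; use a path on the OCR Service Manager machine.
profile_path = Path(tempfile.mkdtemp()) / "custom_speed_profile.ini"
write_profile(profile_path, sections)

print(profile_path.read_text(encoding="utf-8"))
print(profile_parameter(profile_path))
```

Because a custom `Profile` overrides `PredefinedProfile`, starting from a copy of a predefined listing and changing individual keys is one way to keep most of a predefined profile's behavior.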

Custom Profiles

For more details about creating your own profiles, review the Custom Profiles page.


PreprocessingOnly

| Key | Type | Default |
| --- | --- | --- |
| PreprocessingOnly | Boolean | false |

If you enable PreprocessingOnly, only image transformations (for example, deskewing, resolution correction, and binarization) are applied, and no recognition is performed. This is useful for workflows that require cleaned-up images without OCR. The PreprocessingOnly parameter takes the Boolean values true or false.


RemoveGarbage

| Key | Type |
| --- | --- |
| RemoveGarbage | Integer |

Removes small, isolated dark regions in bitonal images that are likely scanning noise before OCR is performed. The value defines the maximum area of such noise in pixels. A value of -1 enables automatic determination.
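To make the idea of an area threshold concrete, the following Python sketch clears connected dark regions whose pixel count does not exceed a given maximum. It is an illustration only and not the engine's implementation. The 4-connectivity rule, the list-of-rows image representation, and the function name `remove_small_specks` are assumptions, and the automatic mode (-1) is not modeled.

```python
from collections import deque


def remove_small_specks(image, max_area):
    """Clear dark regions of at most max_area pixels.

    image is a list of rows where 1 is a dark pixel and 0 a light pixel.
    Regions are grouped with 4-connectivity (an assumption).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Collect one connected dark region with a breadth-first search.
            region = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Regions at or below the threshold are treated as noise.
            if len(region) <= max_area:
                for ry, rx in region:
                    result[ry][rx] = 0
    return result


page = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
cleaned = remove_small_specks(page, max_area=2)
for row in cleaned:
    print(row)
```

With `max_area=2`, the two single-pixel specks are cleared while the six-pixel block is kept. A threshold that is too high would start to remove punctuation or diacritics.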


Blank page detection

RecognizeBlankPages

| Key | Type | Default |
| --- | --- | --- |
| RecognizeBlankPages | Boolean | false |

Enable automatic skipping of pages that are considered blank. A blank page is a page with uniform coloring and only slight noise. Colored, grayscale, and bitonal pages can be subject to blank page recognition. If a page is skipped as blank, no OCR is performed. The RecognizeBlankPages parameter takes Boolean values true or false.


BlankPageMargin

| Key | Type | Default |
| --- | --- | --- |
| BlankPageMargin | Double | 0.02 |

Sets the size of the margin as a ratio of the corresponding page length. The margin is excluded from the analysis that determines whether a page is blank, which prevents border artifacts from affecting blank page detection. Allowed values range from 0.0 to 0.5. This parameter is only active if RecognizeBlankPages is set to true.
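As a worked example of the ratio, the following Python sketch computes the pixel region that would remain for blank page analysis. It assumes that the margin is applied on all four sides, with horizontal margins relative to the page width and vertical margins relative to the page height. The function name and the rounding are illustrative choices, not documented engine behavior.

```python
def blank_page_analysis_region(width, height, margin=0.02):
    """Return (left, top, right, bottom) of the area left after the margin."""
    if not 0.0 <= margin <= 0.5:
        raise ValueError("BlankPageMargin must be between 0.0 and 0.5")
    dx = round(width * margin)   # margin on the left and right edges
    dy = round(height * margin)  # margin on the top and bottom edges
    return dx, dy, width - dx, height - dy


# An A4 page scanned at 300 dpi is 2480 x 3508 pixels.
region = blank_page_analysis_region(2480, 3508, margin=0.02)
print(region)  # (50, 70, 2430, 3438)
```

Under these assumptions, the default of 0.02 ignores a border of about 50 pixels at the sides and 70 pixels at the top and bottom of the page, and a value of 0.5 leaves no area to analyze.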


Output format control

DisableMaskEmbedding

| Key | Type | Default |
| --- | --- | --- |
| DisableMaskEmbedding | Boolean | false |

If this option is set to true, the mask layer is omitted from the output and only the recognized text is preserved. When set to false (default), bitonal masks are embedded in the output TIFF or PDF as an image layer. Setting DisableMaskEmbedding to true is useful for output without background images. The DisableMaskEmbedding parameter takes the Boolean values true or false.
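Before pasting a parameter string into the Configurator, it can help to check it against the types and ranges documented on this page. The following Python sketch is one possible pre-check. It assumes the `Key=Value;Key=Value` form and lowercase Boolean values, and it does not handle semicolons inside quoted paths. All function names are hypothetical, and the service itself may accept or reject input differently.

```python
def parse_bool(text):
    if text not in ("true", "false"):
        raise ValueError(f"expected true or false, got {text!r}")
    return text == "true"


def parse_margin(text):
    value = float(text)
    if not 0.0 <= value <= 0.5:
        raise ValueError("BlankPageMargin must be between 0.0 and 0.5")
    return value


# Types as listed in the parameter tables on this page.
CONVERTERS = {
    "PredefinedProfile": str,
    "Profile": lambda text: text.strip('"'),
    "PreprocessingOnly": parse_bool,
    "RemoveGarbage": int,
    "RecognizeBlankPages": parse_bool,
    "BlankPageMargin": parse_margin,
    "DisableMaskEmbedding": parse_bool,
}


def parse_parameters(text):
    """Split a parameter string and convert each value to its documented type."""
    result = {}
    for pair in filter(None, (part.strip() for part in text.split(";"))):
        key, separator, value = pair.partition("=")
        key = key.strip()
        if not separator or key not in CONVERTERS:
            raise ValueError(f"unknown or malformed parameter: {pair!r}")
        result[key] = CONVERTERS[key](value.strip())
    return result


settings = parse_parameters(
    "RecognizeBlankPages=true;BlankPageMargin=0.05;"
    "RemoveGarbage=-1;DisableMaskEmbedding=false"
)
print(settings)
```

A misspelled key or an out-of-range margin raises a `ValueError` here instead of silently reaching the OCR settings.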

Footnotes

  1. All occurrences of the sentence “This profile corresponds to the following parameters” refer to a custom profile INI file whose configuration is equivalent to the predefined profile. For more information, review Profile and Custom Profiles.