Version: 1.0.0

Engine parameters

The Pdftools OCR Service lets you fine-tune the OCR engine through parameters that influence performance, layout detection, output quality, and preprocessing. You set these engine parameters as a sequence of key/value pairs (Key=Value) separated by semicolons (;).
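
A minimal sketch of assembling such a parameter string in Python. It assumes each pair is written as Key=Value (as in the Profile example on this page), writes Boolean values as lowercase true or false, and wraps values containing spaces, backslashes, or semicolons in double quotes. The quoting rule and the helper names format_value and build_engine_parameters are hypothetical, not part of the Pdftools API.

```python
def format_value(value):
    """Render a Python value as an engine parameter value (assumed conventions)."""
    if isinstance(value, bool):
        # Boolean parameters take the values true or false.
        return "true" if value else "false"
    text = str(value)
    # Assumption: paths and values with special characters are quoted.
    if any(ch in text for ch in " \\;"):
        return f'"{text}"'
    return text


def build_engine_parameters(params):
    """Join key/value pairs into one semicolon-separated string."""
    return ";".join(f"{key}={format_value(value)}" for key, value in params.items())


if __name__ == "__main__":
    print(build_engine_parameters({
        "PredefinedProfile": "TextExtraction_Accuracy",
        "RecognizeBlankPages": True,
        "BlankPageMargin": 0.05,
    }))
```

Checking bool before anything else matters because True and False are also integers in Python.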

General settings

PredefinedProfile

Key | Type | Default
PredefinedProfile | Name | Default

The name of the predefined recognition profile. Predefined profiles offer optimized OCR settings for different goals. You can pass them using the PredefinedProfile parameter.

Profile name | Purpose
DocumentConversion_Accuracy | Maximizes accuracy in text recognition.
DocumentConversion_Speed | Optimizes for processing speed.
DocumentArchiving_Accuracy | Preserves document structure and layout.
DocumentArchiving_Speed | Fast archiving with basic layout.
BookArchiving_Accuracy | High-quality capture for books.
BookArchiving_Speed | Speeds up book digitization.
TextExtraction_Accuracy | For extracting clean text content.
TextExtraction_Speed | Fast plain-text extraction.
FieldLevelRecognition | Fine-tuned for form field recognition.
BarcodeRecognition_Accuracy | High accuracy for barcode content.
BarcodeRecognition_Speed | Fast barcode detection.
EngineeringDrawingsProcessing | Optimized for linework and schematics.
Default | General-purpose balanced settings.
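
A client can reject misspelled profile names before sending a request. The following is a minimal sketch, assuming profile names are matched exactly and case-sensitively against the names in the table above; predefined_profile_parameter is a hypothetical helper, not a Pdftools function.

```python
# Profile names as listed in the table above.
PREDEFINED_PROFILES = frozenset({
    "DocumentConversion_Accuracy", "DocumentConversion_Speed",
    "DocumentArchiving_Accuracy", "DocumentArchiving_Speed",
    "BookArchiving_Accuracy", "BookArchiving_Speed",
    "TextExtraction_Accuracy", "TextExtraction_Speed",
    "FieldLevelRecognition",
    "BarcodeRecognition_Accuracy", "BarcodeRecognition_Speed",
    "EngineeringDrawingsProcessing",
    "Default",
})


def predefined_profile_parameter(name):
    """Return a PredefinedProfile key/value pair, rejecting unknown names early."""
    if name not in PREDEFINED_PROFILES:
        raise ValueError(f"unknown predefined profile: {name!r}")
    return f"PredefinedProfile={name}"
```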

DocumentConversion_Accuracy

For converting documents into editable formats, optimized for accuracy. Enables font style detection.

This profile corresponds to the following parameter (see footnote 1):

[BarcodeParams]
EnableAdvancedExtractionMode = TRUE

DocumentConversion_Speed

For converting documents into editable formats, optimized for speed.

This profile corresponds to the following parameters:

[ObjectsExtractionParams]
ProhibitColorImage = TRUE

[PrepareImageMode]
UseFastBinarization = TRUE

[RecognizerParams]
FastMode = TRUE
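
As the footnote explains, each listing of this kind corresponds to a custom profile .ini file with an equivalent configuration. This minimal Python sketch generates such a file from a dictionary, assuming the engine accepts standard INI syntax with Key = Value lines as shown. The helper name render_profile is hypothetical, and the file encoding the engine expects is not documented here.

```python
import configparser
import io

# Section and key names copied from the DocumentConversion_Speed listing above.
DOCUMENT_CONVERSION_SPEED = {
    "ObjectsExtractionParams": {"ProhibitColorImage": "TRUE"},
    "PrepareImageMode": {"UseFastBinarization": "TRUE"},
    "RecognizerParams": {"FastMode": "TRUE"},
}


def render_profile(sections):
    """Render profile sections as .ini text, keeping key case intact."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # configparser lowercases keys by default
    parser.read_dict(sections)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Assumption: UTF-8 is acceptable to the engine for profile files.
    with open("document_conversion_speed.ini", "w", encoding="utf-8") as f:
        f.write(render_profile(DOCUMENT_CONVERSION_SPEED))
```

Overriding optionxform is the important design choice: without it, FastMode would be written as fastmode.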

DocumentArchiving_Accuracy

For creating an electronic archive, optimized for accuracy:

  • Enables detection of as much text as possible on an image, including text embedded in pictures on the page.
  • Skew correction is not performed.
  • Fonts and styles are not detected.

This profile corresponds to the following parameters:

[BarcodeParams]
EnableAdvancedExtractionMode = TRUE

[ObjectsExtractionParams]
DetectTextOnPictures = TRUE
EnableAggressiveTextExtraction = TRUE

[PageAnalysisParams]
EnableTextExtractionMode = TRUE

[SynthesisParamsForDocument]
DetectDocumentStructure = FALSE
DetectFontFormatting = FALSE

[SynthesisParamsForPage]
AllowGrayTextColor = TSPV_Yes
AllowGrayBackgroundColor = TSPV_Yes
DetectFontFormattingAtPageLevel = TRUE
DetectTextColor = TSPV_Yes

DocumentArchiving_Speed

For creating an electronic archive, optimized for speed.

This profile corresponds to the following parameters:

[ObjectsExtractionParams]
DetectMatrixPrinter = FALSE
DetectPorousText = FALSE
DetectTextOnPictures = TRUE
EnableAggressiveTextExtraction = TRUE
FastObjectsExtraction = TRUE
ProhibitColorImage = TRUE
RemoveGarbage = TRUE
RemoveTexture = FALSE

[PageAnalysisParams]
EnableTextExtractionMode = TRUE
ProhibitModelAnalysis = TRUE

[PrepareImageMode]
CorrectSkew = FALSE
UseFastBinarization = TRUE

[RecognizerParams]
FastMode = TRUE

[SynthesisParamsForDocument]
DetectDocumentStructure = FALSE
DetectFontFormatting = FALSE

[SynthesisParamsForPage]
AllowGrayBackgroundColor = TSPV_Yes
AllowGrayTextColor = TSPV_Yes
DetectFontFormattingAtPageLevel = TRUE
DetectTextColor = TSPV_Yes

BookArchiving_Accuracy

For creating an electronic library, optimized for accuracy:

  • High quality. Enables font style detection.

This profile corresponds to the following parameters:

[BarcodeParams]
EnableAdvancedExtractionMode = TRUE

BookArchiving_Speed

For creating an electronic library, optimized for speed.

This profile corresponds to the following parameters:

[ObjectsExtractionParams]
ProhibitColorImage = TRUE

[PrepareImageMode]
UseFastBinarization = TRUE

[RecognizerParams]
FastMode = TRUE

TextExtraction_Accuracy

For extracting text from documents, optimized for accuracy:

  • Enables detection of all text on an image, including small text areas of low quality (pictures and tables are not detected).
  • Fonts and styles are not detected.

This profile corresponds to the following parameters:

[BarcodeParams]
EnableAdvancedExtractionMode = TRUE

[ObjectsExtractionParams]
DetectTextOnPictures = TRUE
EnableAggressiveTextExtraction = TRUE

[PageAnalysisParams]
DetectPictures = FALSE
EnableTextExtractionMode = TRUE

[SynthesisParamsForDocument]
DetectDocumentStructure = FALSE
DetectFontFormatting = FALSE

[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = TRUE

TextExtraction_Speed

For extracting text from documents, optimized for speed.

This profile corresponds to the following parameters:

[ObjectsExtractionParams]
DetectMatrixPrinter = FALSE
DetectPorousText = FALSE
DetectTextOnPictures = TRUE
EnableAggressiveTextExtraction = TRUE
FastObjectsExtraction = TRUE
ProhibitColorImage = TRUE
RemoveGarbage = TRUE
RemoveTexture = FALSE

[PageAnalysisParams]
DetectPictures = FALSE
EnableTextExtractionMode = TRUE
ProhibitModelAnalysis = TRUE

[PrepareImageMode]
CorrectSkew = TRUE
DiscardColorImage = TRUE
UseFastBinarization = TRUE

[RecognizerParams]
FastMode = TRUE

[SynthesisParamsForDocument]
DetectDocumentStructure = FALSE
DetectFontFormatting = FALSE

[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = TRUE
DetectTextColor = TSPV_Yes

FieldLevelRecognition

For recognizing short text fragments.

This profile corresponds to the following parameters:

[DocumentProcessingParams]
PerformSynthesis = FALSE

[PageProcessingParams]
PerformAnalysis = FALSE

[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = FALSE

BarcodeRecognition_Accuracy

For barcode extraction, optimized for accuracy:

  • Extracts only barcodes (text, pictures, or tables are not detected).

This profile corresponds to the following parameters:

[BarcodeParams]
MinRatioToTextHeight = 0.9

[ObjectsExtractionParams]
DetectMatrixPrinter = FALSE
DetectPorousText = FALSE

[PageAnalysisParams]
DetectBarcodes = TRUE
DetectPictures = FALSE
DetectTables = FALSE
DetectText = FALSE
DetectSeparators = FALSE
DetectVectorGraphics = FALSE

[PrepareImageMode]
CorrectSkew = FALSE

BarcodeRecognition_Speed

For barcode extraction, optimized for speed.

This profile corresponds to the following parameters:

[BarcodeParams]
MinRatioToTextHeight = 0.9

[ObjectsExtractionParams]
DetectMatrixPrinter = FALSE
DetectPorousText = FALSE
FastObjectsExtraction = TRUE

[PageAnalysisParams]
DetectBarcodes = TRUE
DetectPictures = FALSE
DetectTables = FALSE
DetectText = FALSE
DetectSeparators = FALSE
DetectVectorGraphics = FALSE

[PageProcessingParams]
PerformPreprocessing = FALSE

[PrepareImageMode]
CorrectSkew = FALSE
DiscardColorImage = TRUE

EngineeringDrawingsProcessing

For recognizing technical drawings:

  • Accounts for the large size and complexity of engineering diagrams, as well as varying text orientation within the image.
  • Enables detection of all text on an image, including vertically oriented text blocks.

This profile corresponds to the following parameters:

[PageAnalysisParams]
DetectPictures = FALSE
DetectVectorGraphics = FALSE
DetectVerticalEuropeanText = TRUE
EnableTextExtractionMode = TRUE

[SynthesisParamsForDocument]
DetectDocumentStructure = FALSE
DetectFontFormatting = FALSE

[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = TRUE

Default

For general-purpose use:

  • Sets all processing parameters to their default values.

Profile

Key | Type
Profile | Path

Path to a custom recognition profile .ini file.

Use the Profile parameter to reference a custom OCR profile that defines specific processing behavior. The file must be located on the same machine as the Pdftools OCR Service Manager node. If both Profile and PredefinedProfile are set, the custom Profile overrides the predefined profile. The following example sets the Profile parameter to the path of a custom profile .ini file:

Profile="C:\ocr\profiles\document_conversion_high_accuracy.ini"
For more details about creating your own profiles, see the Custom Profiles page.
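
The precedence rule above can be expressed as a small Python function. This is a sketch, not the service's implementation: it only mirrors the documented behavior that a custom Profile overrides PredefinedProfile, and that PredefinedProfile defaults to Default. Both helper names are hypothetical, and the path check is only meaningful when run on the Pdftools OCR Service Manager node itself.

```python
from pathlib import Path


def effective_profile(params):
    """Describe which profile applies to a parameter set.

    A custom Profile overrides PredefinedProfile when both are set;
    PredefinedProfile falls back to its documented default, Default.
    """
    if "Profile" in params:
        return ("custom", params["Profile"])
    return ("predefined", params.get("PredefinedProfile", "Default"))


def check_profile_path(path):
    """Return True if the custom profile file exists on this machine."""
    return Path(path).is_file()


if __name__ == "__main__":
    print(effective_profile({
        "Profile": r"C:\ocr\profiles\document_conversion_high_accuracy.ini",
        "PredefinedProfile": "TextExtraction_Speed",
    }))
```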


PreprocessingOnly

Key | Type | Default
PreprocessingOnly | Boolean | false

If you enable PreprocessingOnly, only image transformations (for example, deskewing, resolution correction, and binarization) are applied, and no recognition is performed. This is useful for workflows that require cleaned-up images without OCR. The PreprocessingOnly parameter takes Boolean values true or false.


RemoveGarbage

Key | Type
RemoveGarbage | Integer

Remove small, isolated dark regions that are likely scanning noise from bitonal images before OCR is performed. The value defines the maximum area of such noise in pixels. A value of -1 enables automatic determination.
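
To illustrate what a maximum noise area means, the following Python sketch clears every dark connected region whose pixel count does not exceed max_area. It is an illustration of the general technique (connected-component filtering), not the OCR engine's algorithm. It assumes a bitonal image stored as rows of 0 (white) and 1 (dark), 4-connectivity, and does not model the automatic mode (-1).

```python
from collections import deque


def remove_garbage(image, max_area):
    """Clear dark connected regions of at most max_area pixels.

    image is a list of rows of 0 (white) and 1 (dark); a new image is returned.
    Regions are 4-connected; this illustrates the idea, not the engine's method.
    """
    if max_area < 0:
        raise ValueError("automatic mode (-1) is not modeled in this sketch")
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected dark region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Regions no larger than max_area are treated as noise.
            if len(region) <= max_area:
                for ry, rx in region:
                    result[ry][rx] = 0
    return result
```

With max_area set to 1, an isolated dark pixel is removed while a 2×2 block of text-like pixels survives.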


Blank page detection

RecognizeBlankPages

Key | Type | Default
RecognizeBlankPages | Boolean | false

Enable automatic skipping of pages that are considered blank. A blank page is a page with uniform coloring and only slight noise. Colored, grayscale, and bitonal pages can be subject to blank page recognition. If a page is skipped as blank, no OCR is performed. The RecognizeBlankPages parameter takes Boolean values true or false.
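
The definition above (uniform coloring with only slight noise) can be approximated by a simple heuristic. This Python sketch is illustrative only and does not reproduce the engine's detection: it assumes a flat list of grayscale values from 0 to 255, and both thresholds (tolerance and max_noise_ratio) are hypothetical values chosen for the example.

```python
from statistics import median


def looks_blank(pixels, tolerance=30, max_noise_ratio=0.005):
    """Heuristic blank-page test on a flat list of grayscale values (0-255).

    A page counts as blank when almost every pixel stays close to the
    dominant (median) level. Both thresholds are illustrative guesses.
    """
    if not pixels:
        return True
    background = median(pixels)
    noisy = sum(1 for p in pixels if abs(p - background) > tolerance)
    return noisy / len(pixels) <= max_noise_ratio
```

Using the median as the background level means a uniformly gray or colored page is treated as blank, not only a white one.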


BlankPageMargin

Key | Type | Default
BlankPageMargin | Double | 0.02

Set the width of the margin as a ratio of the corresponding page length. The margin is excluded from the analysis when determining whether a page is blank, which prevents border artifacts from affecting blank page detection. Allowed values range from 0.0 to 0.5. This parameter is only active if RecognizeBlankPages is set to true.
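
The following Python sketch shows how a margin ratio translates into the region that is analyzed, and flags a BlankPageMargin that would be ignored. It assumes the margin is trimmed from every side, with horizontal margins measured against the page width and vertical margins against the page height; the service may round differently. Both helper names are hypothetical.

```python
def analysis_region(width, height, margin=0.02):
    """Return (left, top, right, bottom) of the area inspected for blankness.

    The margin is a ratio of the page length in each direction, so a
    0.02 margin on a 2480 px wide page trims about 50 px from each side.
    """
    if not 0.0 <= margin <= 0.5:
        raise ValueError("BlankPageMargin must be between 0.0 and 0.5")
    dx = round(width * margin)
    dy = round(height * margin)
    return (dx, dy, width - dx, height - dy)


def check_blank_page_settings(params):
    """Return warnings for blank page settings that would have no effect."""
    warnings = []
    if "BlankPageMargin" in params and not params.get("RecognizeBlankPages", False):
        warnings.append("BlankPageMargin has no effect unless RecognizeBlankPages is true")
    return warnings
```

The upper bound of 0.5 is consistent with this reading: at 0.5, the margins from opposite sides meet and no area is left to analyze.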


Output format control

DisableMaskEmbedding

Key | Type | Default
DisableMaskEmbedding | Boolean | false

If this option is set to true, no mask is embedded in the output TIFF: the mask layer is omitted and only the recognized text is preserved. When set to false (default), bitonal masks are embedded in the output TIFF or PDF as an image layer. DisableMaskEmbedding is useful when you need output without background images. The DisableMaskEmbedding parameter takes Boolean values true or false.

Footnotes

  1. All occurrences of the sentence “profile corresponds to the following parameters” refer to a custom profile .ini file whose configuration is equivalent to the predefined profile. For more information, see Profile and Custom Profiles.