Engine parameters
The Pdftools OCR Service lets you fine-tune the OCR engine through parameters that influence performance, layout detection, output quality, and preprocessing. You can set these engine parameters as a sequence of key-value pairs separated by semicolons (;).
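The following Python sketch shows one way to assemble and split such a parameter string. It assumes that each pair is written as Key=Value, as in the Profile example further down this page, and that Boolean values are written as true or false. The helper names build_engine_params and parse_engine_params are hypothetical, and how the resulting string is passed to the Pdftools OCR Service is not covered here.

```python
def build_engine_params(params: dict) -> str:
    """Join key/value pairs into a semicolon-separated engine parameter string.

    Assumption: each pair is written as Key=Value.
    """
    parts = []
    for key, value in params.items():
        if isinstance(value, bool):
            # Boolean parameters take the values true or false.
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ";".join(parts)


def parse_engine_params(text: str) -> dict:
    """Split a semicolon-separated string back into a dict of string values."""
    result = {}
    for pair in text.split(";"):
        if not pair.strip():
            continue  # tolerate empty entries, e.g. a trailing semicolon
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result


params = build_engine_params({
    "PredefinedProfile": "TextExtraction_Accuracy",
    "RecognizeBlankPages": True,
    "BlankPageMargin": 0.05,
})
print(params)
# PredefinedProfile=TextExtraction_Accuracy;RecognizeBlankPages=true;BlankPageMargin=0.05
```

The simple split on semicolons assumes that no value itself contains a semicolon.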
General settings
PredefinedProfile
Key | Type | Default |
---|---|---|
PredefinedProfile | Name | Default |
The name of the predefined recognition profile. Predefined profiles offer optimized OCR settings for different goals. You can pass them using the PredefinedProfile parameter.
Profile name | Purpose |
---|---|
DocumentConversion_Accuracy | Maximizes accuracy in text recognition. |
DocumentConversion_Speed | Optimizes for processing speed. |
DocumentArchiving_Accuracy | Preserves document structure and layout. |
DocumentArchiving_Speed | Fast archiving with basic layout. |
BookArchiving_Accuracy | High-quality capture for books. |
BookArchiving_Speed | Speeds up book digitization. |
TextExtraction_Accuracy | For extracting clean text content. |
TextExtraction_Speed | Fast plain-text extraction. |
FieldLevelRecognition | Fine-tuned for form field recognition. |
BarcodeRecognition_Accuracy | High accuracy for barcode content. |
BarcodeRecognition_Speed | Fast barcode detection. |
EngineeringDrawingsProcessing | Optimized for linework and schematics. |
Default | General-purpose balanced settings. |
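As a sketch, a client could check a PredefinedProfile value against the names in the table above before submitting it. The set mirrors the table; the helper predefined_profile_param is hypothetical, and exact, case-sensitive matching of profile names is an assumption.

```python
# Names copied from the profile table above.
PREDEFINED_PROFILES = frozenset({
    "DocumentConversion_Accuracy", "DocumentConversion_Speed",
    "DocumentArchiving_Accuracy", "DocumentArchiving_Speed",
    "BookArchiving_Accuracy", "BookArchiving_Speed",
    "TextExtraction_Accuracy", "TextExtraction_Speed",
    "FieldLevelRecognition",
    "BarcodeRecognition_Accuracy", "BarcodeRecognition_Speed",
    "EngineeringDrawingsProcessing",
    "Default",
})


def predefined_profile_param(name: str) -> str:
    """Return a PredefinedProfile key/value pair, rejecting unknown names.

    Hypothetical helper; assumes names must match the table exactly.
    """
    if name not in PREDEFINED_PROFILES:
        raise ValueError(f"unknown predefined profile: {name!r}")
    return f"PredefinedProfile={name}"


print(predefined_profile_param("BarcodeRecognition_Speed"))
# PredefinedProfile=BarcodeRecognition_Speed
```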
DocumentConversion_Accuracy
For converting documents into editable formats, optimized for accuracy:
- Enables font style detection.
This profile corresponds to the following parameter[1]:
[BarcodeParams]
EnableAdvancedExtractionMode = TRUE
DocumentConversion_Speed
For converting documents into editable formats, optimized for speed:
- Similar to DocumentConversion_Accuracy, but document analysis and recognition are faster.
This profile corresponds to the following parameters:
[ObjectsExtractionParams]
ProhibitColorImage = TRUE
[PrepareImageMode]
UseFastBinarization = TRUE
[RecognizerParams]
FastMode = TRUE
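The parameter listings in this section use INI syntax. As a sketch, assuming the custom profile format accepts standard INI files (see the Custom Profiles page for the authoritative format), Python's configparser can write a file equivalent to DocumentConversion_Speed. The output filename in the comment is hypothetical.

```python
import configparser
import io

profile = configparser.ConfigParser()
# ConfigParser lowercases keys by default; keep the engine's key casing.
profile.optionxform = str

# Sections and keys copied from the DocumentConversion_Speed listing above.
profile["ObjectsExtractionParams"] = {"ProhibitColorImage": "TRUE"}
profile["PrepareImageMode"] = {"UseFastBinarization": "TRUE"}
profile["RecognizerParams"] = {"FastMode": "TRUE"}

buffer = io.StringIO()
profile.write(buffer)  # writes "Key = Value" lines under [Section] headers
text = buffer.getvalue()
print(text)

# To save it for use with the Profile parameter (hypothetical path):
# with open("document_conversion_speed.ini", "w") as f:
#     profile.write(f)
```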
DocumentArchiving_Accuracy
For creating an electronic archive, optimized for accuracy:
- Detects as much text as possible on an image, including text embedded in pictures.
- Skew correction is not performed.
- Fonts and styles are not detected.
This profile corresponds to the following parameters:
[BarcodeParams]
EnableAdvancedExtractionMode = TRUE
[ObjectsExtractionParams]
DetectTextOnPictures = TRUE
EnableAggressiveTextExtraction = TRUE
[PageAnalysisParams]
EnableTextExtractionMode = TRUE
[SynthesisParamsForDocument]
DetectDocumentStructure = FALSE
DetectFontFormatting = FALSE
[SynthesisParamsForPage]
AllowGrayTextColor = TSPV_Yes
AllowGrayBackgroundColor = TSPV_Yes
DetectFontFormattingAtPageLevel = TRUE
DetectTextColor = TSPV_Yes
DocumentArchiving_Speed
For creating an electronic archive, optimized for speed:
- Like DocumentArchiving_Accuracy, but document analysis and recognition are sped up.
This profile corresponds to the following parameters:
[ObjectsExtractionParams]
DetectMatrixPrinter = FALSE
DetectPorousText = FALSE
DetectTextOnPictures = TRUE
EnableAggressiveTextExtraction = TRUE
FastObjectsExtraction = TRUE
ProhibitColorImage = TRUE
RemoveGarbage = TRUE
RemoveTexture = FALSE
[PageAnalysisParams]
EnableTextExtractionMode = TRUE
ProhibitModelAnalysis = TRUE
[PrepareImageMode]
CorrectSkew = FALSE
UseFastBinarization = TRUE
[RecognizerParams]
FastMode = TRUE
[SynthesisParamsForDocument]
DetectDocumentStructure = FALSE
DetectFontFormatting = FALSE
[SynthesisParamsForPage]
AllowGrayBackgroundColor = TSPV_Yes
AllowGrayTextColor = TSPV_Yes
DetectFontFormattingAtPageLevel = TRUE
DetectTextColor = TSPV_Yes
BookArchiving_Accuracy
For creating an electronic library, optimized for accuracy:
- High quality. Enables font style detection.
This profile corresponds to the following parameters:
[BarcodeParams]
EnableAdvancedExtractionMode = TRUE
BookArchiving_Speed
For creating an electronic library, optimized for speed:
- Like BookArchiving_Accuracy, but document analysis and recognition are faster.
This profile corresponds to the following parameters:
[ObjectsExtractionParams]
ProhibitColorImage = TRUE
[PrepareImageMode]
UseFastBinarization = TRUE
[RecognizerParams]
FastMode = TRUE
TextExtraction_Accuracy
For extracting text from documents, optimized for accuracy:
- Enables detection of all text on an image, including small text areas of low quality (pictures and tables are not detected).
- Fonts and styles are not detected.
This profile corresponds to the following parameters:
[BarcodeParams]
EnableAdvancedExtractionMode = TRUE
[ObjectsExtractionParams]
DetectTextOnPictures = TRUE
EnableAggressiveTextExtraction = TRUE
[PageAnalysisParams]
DetectPictures = FALSE
EnableTextExtractionMode = TRUE
[SynthesisParamsForDocument]
DetectDocumentStructure = FALSE
DetectFontFormatting = FALSE
[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = TRUE
TextExtraction_Speed
For extracting text from documents, optimized for speed:
- Like TextExtraction_Accuracy, but document analysis and recognition are sped up.
This profile corresponds to the following parameters:
[ObjectsExtractionParams]
DetectMatrixPrinter = FALSE
DetectPorousText = FALSE
DetectTextOnPictures = TRUE
EnableAggressiveTextExtraction = TRUE
FastObjectsExtraction = TRUE
ProhibitColorImage = TRUE
RemoveGarbage = TRUE
RemoveTexture = FALSE
[PageAnalysisParams]
DetectPictures = FALSE
EnableTextExtractionMode = TRUE
ProhibitModelAnalysis = TRUE
[PrepareImageMode]
CorrectSkew = TRUE
DiscardColorImage = TRUE
UseFastBinarization = TRUE
[RecognizerParams]
FastMode = TRUE
[SynthesisParamsForDocument]
DetectDocumentStructure = FALSE
DetectFontFormatting = FALSE
[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = TRUE
DetectTextColor = TSPV_Yes
FieldLevelRecognition
For recognizing short text fragments.
This profile corresponds to the following parameters:
[DocumentProcessingParams]
PerformSynthesis = FALSE
[PageProcessingParams]
PerformAnalysis = FALSE
[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = FALSE
BarcodeRecognition_Accuracy
For barcode extraction, optimized for accuracy:
- Extracts only barcodes (text, pictures, and tables are not detected).
This profile corresponds to the following parameters:
[BarcodeParams]
MinRatioToTextHeight = 0.9
[ObjectsExtractionParams]
DetectMatrixPrinter = FALSE
DetectPorousText = FALSE
[PageAnalysisParams]
DetectBarcodes = TRUE
DetectPictures = FALSE
DetectTables = FALSE
DetectText = FALSE
DetectSeparators = FALSE
DetectVectorGraphics = FALSE
[PrepareImageMode]
CorrectSkew = FALSE
BarcodeRecognition_Speed
For barcode extraction, optimized for speed:
- Like BarcodeRecognition_Accuracy, but document analysis and recognition are sped up.
This profile corresponds to the following parameters:
[BarcodeParams]
MinRatioToTextHeight = 0.9
[ObjectsExtractionParams]
DetectMatrixPrinter = FALSE
DetectPorousText = FALSE
FastObjectsExtraction = TRUE
[PageAnalysisParams]
DetectBarcodes = TRUE
DetectPictures = FALSE
DetectTables = FALSE
DetectText = FALSE
DetectSeparators = FALSE
DetectVectorGraphics = FALSE
[PageProcessingParams]
PerformPreprocessing = FALSE
[PrepareImageMode]
CorrectSkew = FALSE
DiscardColorImage = TRUE
EngineeringDrawingsProcessing
For recognizing technical drawings:
- Accounts for the large size and complexity of engineering drawings, as well as text in different orientations within the image.
- Enables detection of all text on an image, including vertically oriented text blocks.
This profile corresponds to the following parameters:
[PageAnalysisParams]
DetectPictures = FALSE
DetectVectorGraphics = FALSE
DetectVerticalEuropeanText = TRUE
EnableTextExtractionMode = TRUE
[SynthesisParamsForDocument]
DetectDocumentStructure = FALSE
DetectFontFormatting = FALSE
[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = TRUE
Default
For general-purpose processing:
- Sets all processing parameters to their default values.
Profile
Key | Type |
---|---|
Profile | Path |
Path to a custom recognition profile .ini file.
Use the Profile parameter to reference a custom OCR profile that defines specific processing behavior. The file must be on the same machine as the Pdftools OCR Service Manager node. If both Profile and PredefinedProfile are set, the custom Profile overrides the predefined profile. The following snippet shows an example path to a custom profile .ini file in the Profile parameter:
Profile="C:\ocr\profiles\document_conversion_high_accuracy.ini"
For details about creating your own profiles, see the Custom Profiles page.
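The precedence rule for Profile and PredefinedProfile can be modeled as follows. This is a hypothetical client-side helper (effective_profile) that only mirrors the documented behavior; the service applies the rule itself. It assumes the parameters have already been split into a dictionary and falls back to Default, the documented default of PredefinedProfile.

```python
from typing import Dict, Tuple


def effective_profile(params: Dict[str, str]) -> Tuple[str, str]:
    """Return ("custom", path) or ("predefined", name) for parsed engine parameters."""
    if "Profile" in params:
        # A custom profile overrides any PredefinedProfile value.
        # Surrounding double quotes, as in the example path, are removed.
        return ("custom", params["Profile"].strip('"'))
    # Otherwise the predefined profile applies; its documented default is "Default".
    return ("predefined", params.get("PredefinedProfile", "Default"))


both = {
    "PredefinedProfile": "DocumentConversion_Accuracy",
    "Profile": r'"C:\ocr\profiles\document_conversion_high_accuracy.ini"',
}
print(effective_profile(both))
# ('custom', 'C:\\ocr\\profiles\\document_conversion_high_accuracy.ini')
print(effective_profile({}))
# ('predefined', 'Default')
```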
PreprocessingOnly
Key | Type | Default |
---|---|---|
PreprocessingOnly | Boolean | false |
If you enable PreprocessingOnly, only image transformations (for example, deskewing, resolution correction, and binarization) are applied, and no recognition is performed. This is useful for workflows that require cleaned-up images without OCR. The PreprocessingOnly parameter takes Boolean values true or false.
RemoveGarbage
Key | Type |
---|---|
RemoveGarbage | Integer |
Remove small, isolated dark regions in bitonal images that are likely scanning noise, before OCR is performed. The value defines the maximum area of such noise in pixels. A value of -1 enables automatic determination of this area.
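To illustrate conceptually what RemoveGarbage does, the following sketch erases 8-connected dark components whose area is at most a given number of pixels. Connected-component filtering is a common despeckling technique used here as a stand-in; it is not necessarily the OCR engine's own algorithm, and the automatic mode (-1) is not modeled. The function name remove_garbage is hypothetical.

```python
from collections import deque
from typing import List


def remove_garbage(image: List[List[int]], max_area: int) -> List[List[int]]:
    """Erase 8-connected dark components of at most max_area pixels.

    image is a bitonal raster as a list of rows (1 = dark, 0 = light).
    A cleaned copy is returned; the input is left unchanged.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected dark region.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small, isolated regions are treated as scanning noise.
            if len(component) <= max_area:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out


page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 1],  # isolated speck in the last column
    [0, 0, 0, 0, 0, 0],
]
cleaned = remove_garbage(page, max_area=2)
print(cleaned[2])  # [0, 1, 1, 1, 0, 0]
```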
Blank page detection
RecognizeBlankPages
Key | Type | Default |
---|---|---|
RecognizeBlankPages | Boolean | false |
Enable automatic skipping of pages that are considered blank. A blank page is a page with uniform coloring and only slight noise. Color, grayscale, and bitonal pages are all eligible for blank page detection. If a page is skipped as blank, no OCR is performed on it. The RecognizeBlankPages parameter takes Boolean values true or false.
BlankPageMargin
Key | Type | Default |
---|---|---|
BlankPageMargin | Double | 0.02 |
Set the size of the margin as a ratio of the corresponding page dimension. The margin is excluded when determining whether a page is blank, which prevents border artifacts from affecting blank page detection. Allowed values range from 0.0 to 0.5. This parameter is only active if RecognizeBlankPages is set to true.
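The following sketch shows how a margin ratio could translate into the pixel area that remains for blank page analysis. It assumes the ratio is applied on every side, relative to the page width for the left and right margins and to the page height for the top and bottom margins; this interpretation is not spelled out above. The function analysis_region is hypothetical.

```python
def analysis_region(width_px: int, height_px: int, margin: float = 0.02):
    """Return (left, top, right, bottom) pixel bounds of the analyzed area.

    Assumption: the margin ratio is applied on all four sides.
    """
    if not 0.0 <= margin <= 0.5:
        raise ValueError("BlankPageMargin must be between 0.0 and 0.5")
    dx = round(width_px * margin)   # left and right margins
    dy = round(height_px * margin)  # top and bottom margins
    return (dx, dy, width_px - dx, height_px - dy)


# An A4 page scanned at 300 dpi is about 2480 x 3508 pixels.
print(analysis_region(2480, 3508))  # (50, 70, 2430, 3438)
```

With the default of 0.02, roughly a 50-pixel band on the left and right and a 70-pixel band at the top and bottom of such a scan would be ignored under this interpretation.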
Output format control
DisableMaskEmbedding
Key | Type | Default |
---|---|---|
DisableMaskEmbedding | Boolean | false |
If set to false (default), bitonal masks are embedded in the output TIFF or PDF as an image layer. If set to true, the mask layer is omitted and only the recognized text is preserved, which is useful for output without background images. The DisableMaskEmbedding parameter takes Boolean values true or false.
Footnotes
[1] Each occurrence of the sentence “This profile corresponds to the following parameters” refers to a custom profile .ini file with a configuration equivalent to the predefined profile. For more information, see Profile and Custom Profiles.