Parameters
The Pdftools OCR Service lets you fine-tune the OCR engine through various parameters influencing performance, layout detection, output quality, and preprocessing steps. You can set these engine parameters as a sequence of key and value pairs separated by a semicolon (;) in the Conversion Service Configurator OCR settings.
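As a minimal sketch of how such a parameter string could be assembled programmatically, the Python helper below joins key and value pairs with semicolons. The Key=Value form follows the Profile="..." example in the Profile section of this page. The helper name build_ocr_parameters and the lowercase rendering of Boolean values are assumptions made for illustration, not part of the product.

```python
def build_ocr_parameters(params: dict) -> str:
    """Join OCR engine parameters into a semicolon-separated string.

    Hypothetical helper: the Key=Value form and the lowercase Boolean
    rendering are assumptions based on the examples on this page.
    """
    parts = []
    for key, value in params.items():
        if isinstance(value, bool):
            # Render Python booleans as the lowercase literals true/false.
            text = "true" if value else "false"
        else:
            text = str(value)
        parts.append(f"{key}={text}")
    return ";".join(parts)


if __name__ == "__main__":
    print(build_ocr_parameters({
        "PredefinedProfile": "DocumentArchiving_Speed",
        "RecognizeBlankPages": True,
        "BlankPageMargin": 0.05,
    }))
```

The resulting string is what you would paste into the Parameters input field of the OCR settings.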
General settings
PredefinedProfile
Key | Type | Default |
---|---|---|
PredefinedProfile | Name | Default |
The name of the predefined recognition profile. Predefined profiles offer optimized OCR settings for different goals. You can pass them using the PredefinedProfile parameter.
Profile name | Purpose |
---|---|
DocumentConversion_Accuracy | Suitable for converting documents into an editable format. Maximizes accuracy in text recognition. |
DocumentConversion_Speed | Suitable for converting documents into an editable format. Optimizes for processing speed. |
DocumentArchiving_Accuracy | Suitable for creating an electronic archive (converting to PDF and PDF/A). Preserves document structure and layout. |
DocumentArchiving_Speed | Suitable for creating an electronic archive (converting to PDF and PDF/A). Fast archiving with basic layout. |
BookArchiving_Accuracy | Suitable for high-quality capture processing of books, magazines, and newspapers to create an electronic library (converting to PDF and PDF/A). For example, use it to digitize paper book collections. |
BookArchiving_Speed | Suitable for fast capture processing of books, magazines, and newspapers to create an electronic library (converting to PDF and PDF/A). For example, use it to digitize paper book collections. |
TextExtraction_Accuracy | Suitable for extracting text from a document. Optimized for high accuracy. |
TextExtraction_Speed | Suitable for extracting text from a document. Optimized for speed. |
FieldLevelRecognition | Suitable for recognizing short text fragments. |
BarcodeRecognition_Accuracy | Suitable for barcode extraction. Extracts only barcodes (texts, pictures, or tables are not detected). Optimized for high accuracy. |
BarcodeRecognition_Speed | Suitable for barcode extraction. Extracts only barcodes (texts, pictures, or tables are not detected). Optimized for speed. |
EngineeringDrawingsProcessing | Suitable for recognizing technical drawings (such as linework, complex engineering diagrams, and schematics). The profile is intended for converting such images into searchable PDF format. |
Default | General-purpose balanced settings. Sets all the processing parameters to the default values. |
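To catch typos in a profile name before it reaches the service, a small check against the names in the table above can help. The following Python sketch is only an illustration: the function name predefined_profile_parameter is hypothetical, and the exact-match (case-sensitive) comparison is an assumption, since this page does not state how the service treats letter case.

```python
# Profile names copied from the table above.
PREDEFINED_PROFILES = frozenset({
    "DocumentConversion_Accuracy",
    "DocumentConversion_Speed",
    "DocumentArchiving_Accuracy",
    "DocumentArchiving_Speed",
    "BookArchiving_Accuracy",
    "BookArchiving_Speed",
    "TextExtraction_Accuracy",
    "TextExtraction_Speed",
    "FieldLevelRecognition",
    "BarcodeRecognition_Accuracy",
    "BarcodeRecognition_Speed",
    "EngineeringDrawingsProcessing",
    "Default",
})


def predefined_profile_parameter(name: str) -> str:
    """Return a PredefinedProfile=<name> pair, rejecting unknown names.

    Hypothetical helper; exact-match validation is an assumption.
    """
    if name not in PREDEFINED_PROFILES:
        raise ValueError(f"Unknown predefined profile: {name!r}")
    return f"PredefinedProfile={name}"


if __name__ == "__main__":
    print(predefined_profile_parameter("TextExtraction_Accuracy"))
```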
DocumentConversion_Accuracy
For converting documents into editable formats, optimized for accuracy:
- Enables font style detection.
This profile corresponds to the following parameter (see Footnotes):
[BarcodeParams]
EnableAdvancedExtractionMode = true
DocumentConversion_Speed
For converting documents into editable formats, optimized for speed.
- Similar to DocumentConversion_Accuracy, but document analysis and recognition are faster.
This profile corresponds to the following parameters:
[ObjectsExtractionParams]
ProhibitColorImage = true
[PrepareImageMode]
UseFastBinarization = true
[RecognizerParams]
FastMode = true
DocumentArchiving_Accuracy
For creating an electronic archive, optimized for accuracy:
- Enables detection of maximum text on an image, including text embedded into the image.
- Skew correction is not performed.
- Fonts and styles are not detected.
This profile corresponds to the following parameters:
[BarcodeParams]
EnableAdvancedExtractionMode = true
[ObjectsExtractionParams]
DetectTextOnPictures = true
EnableAggressiveTextExtraction = true
[PageAnalysisParams]
EnableTextExtractionMode = true
[SynthesisParamsForDocument]
DetectDocumentStructure = false
DetectFontFormatting = false
[SynthesisParamsForPage]
AllowGrayTextColor = TSPV_Yes
AllowGrayBackgroundColor = TSPV_Yes
DetectFontFormattingAtPageLevel = true
DetectTextColor = TSPV_Yes
DocumentArchiving_Speed
For creating an electronic archive, optimized for speed:
- Like DocumentArchiving_Accuracy, but document analysis and recognition are faster.
This profile corresponds to the following parameters:
[ObjectsExtractionParams]
DetectMatrixPrinter = false
DetectPorousText = false
DetectTextOnPictures = true
EnableAggressiveTextExtraction = true
FastObjectsExtraction = true
ProhibitColorImage = true
RemoveGarbage = true
RemoveTexture = false
[PageAnalysisParams]
EnableTextExtractionMode = true
ProhibitModelAnalysis = true
[PrepareImageMode]
CorrectSkew = false
UseFastBinarization = true
[RecognizerParams]
FastMode = true
[SynthesisParamsForDocument]
DetectDocumentStructure = false
DetectFontFormatting = false
[SynthesisParamsForPage]
AllowGrayBackgroundColor = TSPV_Yes
AllowGrayTextColor = TSPV_Yes
DetectFontFormattingAtPageLevel = true
DetectTextColor = TSPV_Yes
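To see exactly where the two archiving profiles diverge, the two parameter listings above can be compared mechanically. The Python sketch below parses both listings with the standard configparser module and reports every setting that differs. The function names load_profile and diff_profiles are hypothetical, and a value of None only means the setting is not listed for that profile, not that it is disabled.

```python
import configparser

ACCURACY_INI = """
[BarcodeParams]
EnableAdvancedExtractionMode = true
[ObjectsExtractionParams]
DetectTextOnPictures = true
EnableAggressiveTextExtraction = true
[PageAnalysisParams]
EnableTextExtractionMode = true
[SynthesisParamsForDocument]
DetectDocumentStructure = false
DetectFontFormatting = false
[SynthesisParamsForPage]
AllowGrayTextColor = TSPV_Yes
AllowGrayBackgroundColor = TSPV_Yes
DetectFontFormattingAtPageLevel = true
DetectTextColor = TSPV_Yes
"""

SPEED_INI = """
[ObjectsExtractionParams]
DetectMatrixPrinter = false
DetectPorousText = false
DetectTextOnPictures = true
EnableAggressiveTextExtraction = true
FastObjectsExtraction = true
ProhibitColorImage = true
RemoveGarbage = true
RemoveTexture = false
[PageAnalysisParams]
EnableTextExtractionMode = true
ProhibitModelAnalysis = true
[PrepareImageMode]
CorrectSkew = false
UseFastBinarization = true
[RecognizerParams]
FastMode = true
[SynthesisParamsForDocument]
DetectDocumentStructure = false
DetectFontFormatting = false
[SynthesisParamsForPage]
AllowGrayBackgroundColor = TSPV_Yes
AllowGrayTextColor = TSPV_Yes
DetectFontFormattingAtPageLevel = true
DetectTextColor = TSPV_Yes
"""


def load_profile(text: str) -> dict:
    """Parse INI text into a {(section, key): value} mapping."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the original key casing
    parser.read_string(text)
    return {
        (section, key): value
        for section in parser.sections()
        for key, value in parser[section].items()
    }


def diff_profiles(a: dict, b: dict) -> dict:
    """Map each differing (section, key) to (value in a, value in b)."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    changes = diff_profiles(load_profile(ACCURACY_INI), load_profile(SPEED_INI))
    for (section, key), (accuracy, speed) in changes.items():
        print(f"[{section}] {key}: accuracy={accuracy}, speed={speed}")
```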
BookArchiving_Accuracy
For creating an electronic library, optimized for accuracy:
- High quality. Enables font style detection.
This profile corresponds to the following parameters:
[BarcodeParams]
EnableAdvancedExtractionMode = true
BookArchiving_Speed
For creating an electronic library, optimized for speed:
- Like BookArchiving_Accuracy, but document analysis and recognition are faster.
This profile corresponds to the following parameters:
[ObjectsExtractionParams]
ProhibitColorImage = true
[PrepareImageMode]
UseFastBinarization = true
[RecognizerParams]
FastMode = true
TextExtraction_Accuracy
For extracting text from documents, optimized for accuracy:
- Enables detection of all text on an image, including small text areas of low quality (pictures and tables are not detected).
- Fonts and styles are not detected.
This profile corresponds to the following parameters:
[BarcodeParams]
EnableAdvancedExtractionMode = true
[ObjectsExtractionParams]
DetectTextOnPictures = true
EnableAggressiveTextExtraction = true
[PageAnalysisParams]
DetectPictures = false
EnableTextExtractionMode = true
[SynthesisParamsForDocument]
DetectDocumentStructure = false
DetectFontFormatting = false
[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = true
TextExtraction_Speed
For extracting text from documents, optimized for speed:
- Like TextExtraction_Accuracy, but document analysis and recognition are faster.
This profile corresponds to the following parameters:
[ObjectsExtractionParams]
DetectMatrixPrinter = false
DetectPorousText = false
DetectTextOnPictures = true
EnableAggressiveTextExtraction = true
FastObjectsExtraction = true
ProhibitColorImage = true
RemoveGarbage = true
RemoveTexture = false
[PageAnalysisParams]
DetectPictures = false
EnableTextExtractionMode = true
ProhibitModelAnalysis = true
[PrepareImageMode]
CorrectSkew = true
DiscardColorImage = true
UseFastBinarization = true
[RecognizerParams]
FastMode = true
[SynthesisParamsForDocument]
DetectDocumentStructure = false
DetectFontFormatting = false
[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = true
DetectTextColor = TSPV_Yes
FieldLevelRecognition
For recognizing short text fragments.
This profile corresponds to the following parameters:
[DocumentProcessingParams]
PerformSynthesis = false
[PageProcessingParams]
PerformAnalysis = false
[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = false
BarcodeRecognition_Accuracy
For barcode extraction, optimized for accuracy:
- Extracts only barcodes (text, pictures, or tables are not detected).
This profile corresponds to the following parameters:
[BarcodeParams]
MinRatioToTextHeight = 0.9
[ObjectsExtractionParams]
DetectMatrixPrinter = false
DetectPorousText = false
[PageAnalysisParams]
DetectBarcodes = true
DetectPictures = false
DetectTables = false
DetectText = false
DetectSeparators = false
DetectVectorGraphics = false
[PrepareImageMode]
CorrectSkew = false
BarcodeRecognition_Speed
For barcode extraction, optimized for speed:
- Like BarcodeRecognition_Accuracy, but document analysis and recognition are faster.
This profile corresponds to the following parameters:
[BarcodeParams]
MinRatioToTextHeight = 0.9
[ObjectsExtractionParams]
DetectMatrixPrinter = false
DetectPorousText = false
FastObjectsExtraction = true
[PageAnalysisParams]
DetectBarcodes = true
DetectPictures = false
DetectTables = false
DetectText = false
DetectSeparators = false
DetectVectorGraphics = false
[PageProcessingParams]
PerformPreprocessing = false
[PrepareImageMode]
CorrectSkew = false
DiscardColorImage = true
EngineeringDrawingsProcessing
For recognizing technical drawings:
- Takes into account the large size and complexity of engineering diagrams, as well as the possibility of different text orientations within the image.
- Enables detection of all text on an image, including text blocks of vertical orientation.
This profile corresponds to the following parameters:
[PageAnalysisParams]
DetectPictures = false
DetectVectorGraphics = false
DetectVerticalEuropeanText = true
EnableTextExtractionMode = true
[SynthesisParamsForDocument]
DetectDocumentStructure = false
DetectFontFormatting = false
[SynthesisParamsForPage]
DetectFontFormattingAtPageLevel = true
Default
For default values:
- Sets all the processing parameters to the default values.
Profile
Key | Type |
---|---|
Profile | Path |
Path to a custom recognition profile INI file.
To configure a custom profile for specific OCR processing behavior, follow these steps:
- Create a custom profile INI file.
- In the Conversion Service Configurator, go to Workflows & Profiles.
- Click the pen icon next to the workflow profile you want to edit.
- Navigate to the OCR Settings section.
- Next to Engine, click the pen icon in the Pdftools OCR Service (3H Legacy Compatible) section.
- In the Parameters input field, include a path to your INI file, for example:
Profile="C:\ocr\profiles\document_conversion_high_accuracy.ini"
- After editing your configuration, click Apply.
The INI file must be on the same machine as the Pdftools OCR Service Manager node. If both Profile and PredefinedProfile are set, the custom Profile overrides the predefined profile.
For more details about creating your profiles, see the Custom Profiles page.
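As a rough illustration of the first step, a custom profile INI file can be generated with Python's standard configparser module. The sketch below renders the three sections listed for DocumentConversion_Speed on this page. The function name render_profile_ini is hypothetical, and the required file encoding is not stated here, so check the Custom Profiles page before relying on the output.

```python
import configparser
import io

# Settings copied from the DocumentConversion_Speed listing on this page.
SPEED_SETTINGS = {
    "ObjectsExtractionParams": {"ProhibitColorImage": "true"},
    "PrepareImageMode": {"UseFastBinarization": "true"},
    "RecognizerParams": {"FastMode": "true"},
}


def render_profile_ini(settings: dict) -> str:
    """Render {section: {key: value}} as INI text (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # preserve key casing such as FastMode
    for section, options in settings.items():
        parser[section] = {key: str(value) for key, value in options.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_profile_ini(SPEED_SETTINGS))
```

Save the rendered text to a file on the machine that hosts the Pdftools OCR Service Manager node, then reference its path with the Profile key as described in the steps above.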
PreprocessingOnly
Key | Type | Default |
---|---|---|
PreprocessingOnly | Boolean | false |
If you enable PreprocessingOnly, only image transformations (for example, deskewing, resolution correction, binarization) are applied, and no recognition is performed. This is useful for workflows that require cleaned-up images without OCR. The PreprocessingOnly parameter takes the Boolean values true or false.
RemoveGarbage
Key | Type |
---|---|
RemoveGarbage | Integer |
Removes small, isolated dark regions that are likely scanning noise from bitonal images before any OCR is done. The value defines the maximum area of such noise in pixels. A value of -1 enables automatic determination.
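The engine's actual noise-removal algorithm is not documented on this page. The following Python sketch only illustrates the general idea of an area threshold: it clears every dark region whose pixel count does not exceed a given maximum. The function name remove_small_specks, the list-of-lists image representation, and the use of 4-connectivity are all assumptions, and the automatic mode (-1) is not modeled.

```python
from collections import deque


def remove_small_specks(image, max_area):
    """Return a copy of a bitonal image with small dark regions cleared.

    image: list of rows, each a list of 0 (white) or 1 (dark).
    max_area: dark regions with at most this many pixels are removed.
    Illustrative only; not the OCR engine's implementation.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Collect one 4-connected dark region with a breadth-first search.
            region = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Clear the region if it is small enough to count as noise.
            if len(region) <= max_area:
                for ry, rx in region:
                    result[ry][rx] = 0
    return result


if __name__ == "__main__":
    page = [
        [1, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    for row in remove_small_specks(page, max_area=1):
        print(row)
```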
Blank page detection
RecognizeBlankPages
Key | Type | Default |
---|---|---|
RecognizeBlankPages | Boolean | false |
Enable automatic skipping of pages that are considered blank. A blank page is a page with uniform coloring and only slight noise. Colored, grayscale, and bitonal pages can be subject to blank page recognition. If a page is skipped as blank, no OCR is performed. The RecognizeBlankPages parameter takes the Boolean values true or false.
BlankPageMargin
Key | Type | Default |
---|---|---|
BlankPageMargin | Double | 0.02 |
Set the ratio of the margin to the corresponding page length. The margin is excluded from the analysis that determines whether a page is blank, which prevents border artifacts from affecting blank page detection. Allowed values range from 0.0 to 0.5. This parameter is only active if RecognizeBlankPages is set to true.
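To make the ratio concrete, the Python sketch below computes the rectangle that would remain for blank page analysis once the margin is excluded. It assumes that the margin is applied on all four sides, that horizontal margins are relative to the page width and vertical margins to the page height, and that pixel values are rounded to the nearest integer. None of these details are confirmed on this page, and the function name blank_page_analysis_box is hypothetical.

```python
def blank_page_analysis_box(width, height, margin_ratio=0.02):
    """Return (left, top, right, bottom) of the area analyzed for blankness.

    Assumes the margin is removed on every side and scales with the
    corresponding page length. Illustrative only.
    """
    if not 0.0 <= margin_ratio <= 0.5:
        raise ValueError("margin_ratio must be between 0.0 and 0.5")
    dx = int(round(width * margin_ratio))   # horizontal margin in pixels
    dy = int(round(height * margin_ratio))  # vertical margin in pixels
    return (dx, dy, width - dx, height - dy)


if __name__ == "__main__":
    # A 1000 x 2000 pixel page with the default margin of 0.02.
    print(blank_page_analysis_box(1000, 2000))
```

Under these assumptions, a ratio of 0.5 leaves an empty analysis area, which is consistent with 0.5 being the upper limit of the allowed range.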
Output format control
DisableMaskEmbedding
Key | Type | Default |
---|---|---|
DisableMaskEmbedding | Boolean | false |
If this option is set to true, no mask is embedded in the output TIFF: the mask layer is omitted, and only the recognized text is preserved. When set to false (default), bitonal masks are embedded in the output TIFF or PDF as an image layer. Setting DisableMaskEmbedding to true is useful for output without background images. The DisableMaskEmbedding parameter takes the Boolean values true or false.
Footnotes
- All occurrences of the sentence “profile corresponds to the following parameters” refer to a custom profile INI file whose configuration is equivalent to a predefined profile. For more information, see Profile and Custom Profiles.