Version: 1.1.0

[ObjectsExtractionParams] INI file section

The [ObjectsExtractionParams] INI file section controls how Pdftools OCR Service extracts, filters, and detects visual objects and text elements from scanned images.

Common settings

`FastObjectsExtraction`

Key	Type	Default
`FastObjectsExtraction`	Boolean	`false`

If set to true, object extraction runs faster, but extraction quality may decrease.

`ProhibitColorImage`

Key	Type	Default
`ProhibitColorImage`	Boolean	`false`

If set to true, Pdftools OCR Service uses only a black-and-white plane during object extraction. Detection quality for colored tables and images may be reduced.
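As a rough illustration, a throughput-oriented setup might enable both switches above. The section and key names come from this page; the `key = value` layout and the `;` comment lines follow common INI conventions and are assumptions here, so verify them against the INI file shipped with your installation.

```ini
[ObjectsExtractionParams]
; Trade extraction quality for speed (default: false).
FastObjectsExtraction = true
; Work on the black-and-white plane only (default: false).
; Colored tables and images may be detected less reliably.
ProhibitColorImage = true
```

Leaving both keys at `false` restores the default behavior described above.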

`SourceContentReuseMode`

Key	Type	Default
`SourceContentReuseMode`	`SourceContentReuseModeEnum`	`CRM_Auto`

Specifies how to use the text and image layers of the source PDF file.

`SourceContentReuseModeEnum`

CRM_Auto: Automatically selects how to reuse source content from PDF files. If the result doesn’t meet expectations, or you know the document type in advance, select the mode manually.
CRM_ContentAndPictures: Automatically selects whether to use the source text or rasterized image for each part of a page. If the text from the source file is considered reliable, it’s used; otherwise, the text from the raster is used.
CRM_ContentOnly: Uses both the text and image layers of the source PDF file directly.
caution
CRM_ContentOnly reuses the text contents of the source file, which speeds up processing, but an error occurs if the file has no text layer. Use this mode for source files whose visible text is encoded in Unicode, ASCII, or another character encoding standard, with correct font and size settings. For other file types, use CRM_Auto, CRM_ContentAndPictures, or CRM_DoNotReuse.
CRM_DoNotReuse: Rasterizes the pages of the source PDF file and processes them. The contents of the source file are ignored.
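A minimal sketch of selecting a reuse mode manually, for example when you would rather have every page recognized from its rasterized image than rely on the source text layer. Writing the enum value by its name, the `key = value` layout, and the `;` comment syntax are assumptions based on common INI conventions.

```ini
[ObjectsExtractionParams]
; Rasterize each page and ignore the contents of the source PDF.
SourceContentReuseMode = CRM_DoNotReuse
```

To return to automatic selection, set the value back to `CRM_Auto`, which is the default.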

Removing objects

`RemoveGarbage`

Key	Type	Default
`RemoveGarbage`	Boolean	`false`

Specifies whether to remove “garbage” (for example, dots smaller than a certain size) from the image during object extraction.

`RemoveTexture`

Key	Type	Default
`RemoveTexture`	Boolean	`true`

If set to true, Pdftools OCR Service removes background texture noise from a temporary image used for recognition. The source image itself remains unchanged.
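For scans with specks or a textured background, the two removal settings could be combined as sketched below. The `key = value` layout and `;` comments are assumed INI conventions; only the section and key names are taken from this page.

```ini
[ObjectsExtractionParams]
; Remove small specks, such as isolated dots, during extraction (default: false).
RemoveGarbage = true
; Keep background texture removal enabled (already the default).
RemoveTexture = true
```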

Detecting objects

`DetectMatrixPrinter`

Key	Type	Default
`DetectMatrixPrinter`	Boolean	`true`

If set to true, text printed using a matrix printer is detected during object extraction.

`DetectPorousText`

Key	Type	Default
`DetectPorousText`	Boolean	`true`

If set to true, regions with porous text are detected during object extraction.

`EnableAggressiveTextExtraction`

Key	Type	Default
`EnableAggressiveTextExtraction`	Boolean	`false`

If set to true, Pdftools OCR Service attempts to extract as much text as possible, even from low-quality images. Recommended when the input contains degraded or faint text.

warning

The EnableAggressiveTextExtraction mode may lead to misinterpreting pictures as text or vertically rearranging horizontal text.
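The fragment below sketches a profile for degraded or faint input. It assumes the usual INI `key = value` layout and `;` comment lines, which this page does not confirm; the keys and defaults are the ones documented above.

```ini
[ObjectsExtractionParams]
; Extract as much text as possible, even from low-quality images (default: false).
EnableAggressiveTextExtraction = true
; Both detectors default to true; they are listed only to make the profile explicit.
DetectMatrixPrinter = true
DetectPorousText = true
```

Because aggressive extraction can misinterpret pictures as text, it is worth spot-checking the output on pages that contain images before applying such a profile broadly.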

`ProhibitDottedSeparators`

Key	Type	Default
`ProhibitDottedSeparators`	Boolean	`false`

If set to true, Pdftools OCR Service assumes that the document contains no dotted separators. This is useful if you are certain the document has none, or if other content is mistakenly identified as a dotted separator.
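If other content keeps being identified as a dotted separator, the switch could be set as in this sketch. The `key = value` layout and `;` comment line are assumed INI conventions rather than confirmed syntax.

```ini
[ObjectsExtractionParams]
; Treat the document as containing no dotted separators (default: false).
ProhibitDottedSeparators = true
```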
