[ObjectsExtractionParams] INI file section
The [ObjectsExtractionParams] INI file section controls how Pdftools OCR Service extracts, filters, and detects visual objects and text elements from scanned images.
Common settings
FastObjectsExtraction
| Key | Type | Default |
|---|---|---|
| FastObjectsExtraction | Boolean | false |
If set to true, object extraction runs faster, but extraction quality may decrease.
ProhibitColorImage
| Key | Type | Default |
|---|---|---|
| ProhibitColorImage | Boolean | false |
If set to true, Pdftools OCR Service uses only a black-and-white plane during object extraction.
Detection quality for colored tables and images may be reduced.
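As an illustration, a speed-oriented setup for plain black-and-white scans might combine the two keys above. This is only a sketch: the key=value layout and the `;` comment prefix follow common INI conventions and are assumptions, not confirmed syntax for Pdftools OCR Service.

```ini
; Sketch only: key=value layout and ';' comments are assumed INI conventions.
[ObjectsExtractionParams]
; Trade extraction quality for speed.
FastObjectsExtraction=true
; Use only the black-and-white plane; colored tables and images
; may be detected less reliably.
ProhibitColorImage=true
```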
SourceContentReuseMode
| Key | Type | Default |
|---|---|---|
| SourceContentReuseMode | SourceContentReuseModeEnum | CRM_Auto |
Specifies how to use the text and image layers of the source PDF file.
SourceContentReuseModeEnum
- CRM_Auto: Automatically selects how to reuse source content from PDF files. If the result doesn’t meet expectations, or you know the document type in advance, select the mode manually.
- CRM_ContentAndPictures: Automatically selects whether to use the source text or rasterized image for each part of a page. If the text from the source file is considered reliable, it’s used; otherwise, the text from the raster is used.
- CRM_ContentOnly: Uses both the text and image layers of the source PDF file directly.
  Caution: Using the text contents of the source file speeds up processing, but if you choose this mode and there’s no text layer, an error occurs. Use this mode for source files with visible text encoded in Unicode, ASCII, or another character encoding standard, with correct font and size settings. For other file types, use CRM_Auto, CRM_ContentAndPictures, or CRM_DoNotReuse.
- CRM_DoNotReuse: Rasterizes the pages of the source PDF file and processes them. The contents of the source file are ignored.
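For example, if the text layer of the incoming PDF files is known to be unreliable, the mode could be fixed to full rasterization instead of relying on CRM_Auto. The fragment below is a sketch: it assumes that enumeration values are written by name after the key, and that `;` starts a comment, neither of which is confirmed here.

```ini
; Sketch only: assumes enum values are written by name and ';' starts a comment.
[ObjectsExtractionParams]
; Ignore the source text and image layers; rasterize every page and recognize it.
SourceContentReuseMode=CRM_DoNotReuse
```

Choosing CRM_ContentOnly instead would speed up processing for files with a trustworthy text layer, at the cost of an error when no text layer is present.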
Removing objects
RemoveGarbage
| Key | Type | Default |
|---|---|---|
| RemoveGarbage | Boolean | false |
Specifies whether to remove “garbage” (for example, dots smaller than a certain size) from the image during object extraction.
RemoveTexture
| Key | Type | Default |
|---|---|---|
| RemoveTexture | Boolean | true |
If set to true, Pdftools OCR Service removes background texture noise from a temporary image used for recognition.
The source image itself remains unchanged.
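A possible cleanup configuration for noisy scans is sketched below. RemoveTexture is already true by default and is repeated only to make the intent explicit; the key=value layout and `;` comments are assumed INI conventions.

```ini
; Sketch only: key=value layout and ';' comments are assumed INI conventions.
[ObjectsExtractionParams]
; Drop small specks ("garbage") during object extraction (default: false).
RemoveGarbage=true
; Remove background texture from the temporary recognition image (default: true).
RemoveTexture=true
```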
Detecting objects
DetectMatrixPrinter
| Key | Type | Default |
|---|---|---|
| DetectMatrixPrinter | Boolean | true |
If set to true, text printed on a dot-matrix printer is detected during object extraction.
DetectPorousText
| Key | Type | Default |
|---|---|---|
| DetectPorousText | Boolean | true |
If set to true, regions with porous text are detected during object extraction.
EnableAggressiveTextExtraction
| Key | Type | Default |
|---|---|---|
| EnableAggressiveTextExtraction | Boolean | false |
If set to true, Pdftools OCR Service attempts to extract as much text as possible, even from low-quality images.
Recommended when the input contains degraded or faint text.
Enabling EnableAggressiveTextExtraction may cause pictures to be misinterpreted as text or horizontal text to be rearranged vertically.
ProhibitDottedSeparators
| Key | Type | Default |
|---|---|---|
| ProhibitDottedSeparators | Boolean | false |
If this property is set to true, Pdftools OCR Service presumes that the document does not contain dotted separators.
This can be useful if you’re certain the document lacks dotted separators or if some content is mistakenly identified as one.
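The detection keys in this section could be combined for faint, degraded scans that are known to contain no dotted separators, as in the sketch below. The key=value layout and `;` comments are assumed INI conventions, and whether this combination suits a given document set needs to be checked against real input.

```ini
; Sketch only: key=value layout and ';' comments are assumed INI conventions.
[ObjectsExtractionParams]
; Extract as much text as possible from low-quality images;
; pictures may be misread as text.
EnableAggressiveTextExtraction=true
; Tell the service the document has no dotted separators, so
; content is not mistaken for one.
ProhibitDottedSeparators=true
```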