Version: 1.1.0

[PageAnalysisParams] INI file section

The [PageAnalysisParams] INI file section defines parameters controlling how the Pdftools OCR Service analyzes page content during layout analysis, including detecting text, tables, images, barcodes, and layout structures.

Analysis mode

`AnalysisMode`

Key	Type	Default
`AnalysisMode`	`PageAnalysisModeEnum`	`PAM_DocumentConversion`

Specifies the layout analysis mode. For documents with tables, complex layouts, or many different object types, set this property to PAM_TextExtraction.

`PageAnalysisModeEnum`

PAM_DocumentConversion: Optimized for documents that primarily contain text.
PAM_TextExtraction: Optimized for documents with mixed layouts, tables, and other object types that the export needs to preserve (for example, invoices).
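The following fragment is a minimal sketch for invoice-like documents. It assumes the service reads plain `Key=Value` lines with `;` comments under the section header, and that it accepts the enum names listed above as values.

```ini
[PageAnalysisParams]
; Mixed layouts and tables (for example, invoices): override the default mode
AnalysisMode=PAM_TextExtraction
```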

`SpeedQualityMode`

Key	Type	Default
`SpeedQualityMode`	`SpeedQualityModeEnum`	`SQM_Fast`

Manages the balance between analysis accuracy and speed. You may need the SQM_Accurate setting when processing documents with complex backgrounds or watermarks.

`SpeedQualityModeEnum`

SQM_Fast: Prioritizes speed over accuracy. Suitable for documents without complex color patterns or watermarks.
SQM_Accurate: Prioritizes accuracy over speed. Use this mode for documents with complex or colorful backgrounds or watermarks.
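A sketch for scans with watermarks or colorful backgrounds. It assumes `Key=Value` syntax with `;` comments:

```ini
[PageAnalysisParams]
; Slower, but better suited to watermarks and complex backgrounds
SpeedQualityMode=SQM_Accurate
```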

Block detection

During layout analysis, Pdftools OCR Service segments each page into rectangular regions called blocks. Each block represents a distinct content area: a paragraph of text, a table, an image, or a barcode. The parameters in this section control which content types the engine detects and assigns to dedicated blocks. Content that doesn’t match a specific type may still appear in the layout as a generic type such as a picture.

After detection, Pdftools OCR Service sorts blocks into reading order based on their position on the page. You can fine-tune how blocks group together during this ordering step with the [SortingBlocksParams] section, and adjust how the engine processes each block’s image (rotation, mirroring, inversion) with the [ImageProcessingParams] section.

`DetectText`

Key	Type	Default
`DetectText`	Boolean	`true`

If this property is true, text areas are detected during layout analysis.

`DetectHandwritten`

Key	Type	Default
`DetectHandwritten`	Boolean	`false`

When set to true, enables detection of handwritten text. Handwritten text can only be detected if the SpeedQualityMode property is set to SQM_Accurate; otherwise, this setting is ignored.
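Because DetectHandwritten depends on SpeedQualityMode, both keys need to be set together. A sketch, assuming `Key=Value` syntax with `;` comments:

```ini
[PageAnalysisParams]
; Without SQM_Accurate, DetectHandwritten=true is ignored
SpeedQualityMode=SQM_Accurate
DetectHandwritten=true
```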

`DetectTables`

Key	Type	Default
`DetectTables`	Boolean	`true`

If this property is true, tables are detected during layout analysis. Table detection parameters are configured in the [TableAnalysisParams] section.

`DetectBarcodes`

Key	Type	Default
`DetectBarcodes`	Boolean	`false`

Specifies whether barcodes are detected and barcode blocks are created during layout analysis. If this property is false, barcodes may be detected as blocks of some other type (for example, pictures). Barcode recognition parameters are configured in the [BarcodeParams] section.
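A sketch that enables dedicated barcode blocks, assuming `Key=Value` syntax with `;` comments. Recognition options belong in the separate [BarcodeParams] section, whose keys aren’t covered here, so that section is left out:

```ini
[PageAnalysisParams]
; Create barcode blocks instead of letting barcodes end up in picture blocks
DetectBarcodes=true
```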

`DetectCheckmarks`

Key	Type	Default
`DetectCheckmarks`	Boolean	`false`

If this property is true, checkmarks are detected during layout analysis.

`DetectSeparators`

Key	Type	Default
`DetectSeparators`	Boolean	`true`

If this property is true, separators are detected during layout analysis.

`DetectPictures`

Key	Type	Default
`DetectPictures`	Boolean	`true`

If this property is true, pictures are detected during layout analysis.

`DetectVectorGraphics`

Key	Type	Default
`DetectVectorGraphics`	Boolean	`true`

If this property is true, vector pictures are detected during layout analysis. Vector picture blocks may appear in the layout only if this property was set to true during layout analysis.
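For a hypothetical text-only workload, the graphics-related detectors can be switched off. This combination is an illustration rather than a documented recommendation, and it assumes `Key=Value` syntax with `;` comments:

```ini
[PageAnalysisParams]
; Hypothetical text-only setup: keep text detection, skip graphics-related blocks
DetectText=true
DetectPictures=false
DetectVectorGraphics=false
DetectSeparators=false
```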

`DetectStamps`

Key	Type	Default
`DetectStamps`	Boolean	`false`

If this property is true, stamps are detected during layout analysis and placed into picture blocks. The text on detected stamps isn’t recognized. Stamps can only be detected if the SpeedQualityMode property is set to SQM_Accurate; otherwise, this setting is ignored.
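A sketch for forms that carry stamps and checkmarks, assuming `Key=Value` syntax with `;` comments. DetectStamps requires SQM_Accurate, while this section documents no such dependency for DetectCheckmarks:

```ini
[PageAnalysisParams]
; Stamps are detected only in accurate mode; their text isn't recognized
SpeedQualityMode=SQM_Accurate
DetectStamps=true
DetectCheckmarks=true
```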

Additional settings

`NoShadowsMode`

Key	Type	Default
`NoShadowsMode`	Boolean	`false`

When set to true, the Pdftools OCR Service presumes that an image has no shadows from scanning.

`PaperSizeDetectionMode`

Key	Type	Default
`PaperSizeDetectionMode`	`PaperSizeDetectionModeEnum`	`PSDM_Auto`

Specifies whether the whole preprocessed image can contain information for analysis, which determines how large the analysis area is. When set to PSDM_CloseToImageSize, the analysis area stays close to the original image size.

note

For correct operation of this property, the NoShadowsMode property must be set to false.

`PaperSizeDetectionModeEnum`

PSDM_Auto: Determines the analysis area automatically. The area may be much smaller than the original image.
PSDM_Unknown: No predefined information about the image area. The analysis area may be much smaller than the original image.
PSDM_CloseToImageSize: The whole image can contain information for analysis. The analysis area stays close to the original image size.
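A sketch for images whose content extends close to the edges, assuming `Key=Value` syntax with `;` comments. NoShadowsMode is written out even though false is already its default, because PaperSizeDetectionMode depends on it:

```ini
[PageAnalysisParams]
; Keep the analysis area close to the full image size
PaperSizeDetectionMode=PSDM_CloseToImageSize
; Must be false for PaperSizeDetectionMode to work correctly (false is the default)
NoShadowsMode=false
```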

`DetectTextOnPictures`

Key	Type	Default
`DetectTextOnPictures`	Boolean	`false`

When this property is set to true, the Pdftools OCR Service detects all text on a page image, including text embedded in pictures. This doesn’t change the reading order, and it makes the text in pictures available for full-text search later.
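A sketch that makes text inside pictures searchable, assuming `Key=Value` syntax with `;` comments:

```ini
[PageAnalysisParams]
; Also detect text embedded in pictures; the reading order is unchanged
DetectTextOnPictures=true
```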

`DetectVerticalEuropeanText`

Key	Type	Default
`DetectVerticalEuropeanText`	Boolean	`false`

When set to true, the Pdftools OCR Service looks for vertically oriented text. This property applies to all languages except CJK (Chinese, Japanese, and Korean). For CJK languages, vertical text detection is controlled by the ProhibitCJKColumns property.
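A sketch for non-CJK documents that contain vertically oriented text, assuming `Key=Value` syntax with `;` comments:

```ini
[PageAnalysisParams]
; Look for vertical text in non-CJK languages
DetectVerticalEuropeanText=true
```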

`ProhibitCJKColumns`

Key	Type	Default
`ProhibitCJKColumns`	Boolean	`false`

CJK text can be written vertically as well as horizontally. Setting this property to true makes the Pdftools OCR Service ignore the possibility of vertical text and recognize the image on the assumption that all text is horizontal.

This property applies only to CJK languages.
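A sketch for CJK documents known to contain only horizontal text, assuming `Key=Value` syntax with `;` comments:

```ini
[PageAnalysisParams]
; Treat all CJK text as horizontal; vertical columns aren't considered
ProhibitCJKColumns=true
```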

`ProhibitDoublePageMode`

Key	Type	Default
`ProhibitDoublePageMode`	Boolean	`false`

When set to true, the Pdftools OCR Service presumes that an image isn’t a double-page book.
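A sketch for scans that never contain two-page book spreads, assuming `Key=Value` syntax with `;` comments:

```ini
[PageAnalysisParams]
; Never treat an image as a double-page book spread
ProhibitDoublePageMode=true
```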

`CollectPdfExportData`

Key	Type	Default
`CollectPdfExportData`	Boolean	`false`

When set to true, the Pdftools OCR Service collects data for PDF export during layout analysis. The export process uses this data to produce image-only PDF with MRC compression.

note

With this property set to true, recognition isn’t supported. This doesn’t affect image-only PDF export, which doesn’t need recognized text.
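A sketch for a job that only produces image-only PDF with MRC compression, assuming `Key=Value` syntax with `;` comments. The export settings themselves aren’t part of this section and are omitted:

```ini
[PageAnalysisParams]
; Collect MRC export data during layout analysis; recognition isn't supported with this setting
CollectPdfExportData=true
```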
