Version: 1.1.0

[PageAnalysisParams] INI file section

The [PageAnalysisParams] INI file section defines parameters controlling how the Pdftools OCR Service analyzes page content during layout analysis, including detecting text, tables, images, barcodes, and layout structures.


Analysis mode

AnalysisMode

Key | Type | Default
AnalysisMode | PageAnalysisModeEnum | PAM_DocumentConversion

Specifies the layout analysis mode. For documents with tables, complex layouts, or many different object types, set this property to PAM_TextExtraction.

PageAnalysisModeEnum

  • PAM_DocumentConversion: Optimized for documents that primarily contain text.
  • PAM_TextExtraction: Optimized for documents with mixed layouts, tables, and other object types that the export needs to preserve (for example, invoices).
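
As a minimal sketch, a fragment that switches the analysis mode for invoice-like documents might look like the following. It assumes conventional INI syntax (Key=Value pairs under the section header, with `;` starting a comment line); verify both against your own configuration file.

```ini
[PageAnalysisParams]
; Invoices mix tables, text, and other object types, so select
; the mode described for mixed layouts instead of the default.
AnalysisMode=PAM_TextExtraction
```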

SpeedQualityMode

Key | Type | Default
SpeedQualityMode | SpeedQualityModeEnum | SQM_Fast

Manages the balance between analysis accuracy and speed. You may need the SQM_Accurate setting when processing documents with complex backgrounds or watermarks.

SpeedQualityModeEnum

  • SQM_Fast: Prioritizes speed over accuracy. Suitable for documents without complex color patterns or watermarks.
  • SQM_Accurate: Prioritizes accuracy over speed. Use this mode for documents with complex or colorful backgrounds or watermarks.

Block detection

During layout analysis, Pdftools OCR Service segments each page into rectangular regions called blocks. Each block represents a distinct content area: a paragraph of text, a table, an image, or a barcode. The parameters in this section control which content types the engine detects and assigns to dedicated blocks. Content that doesn’t match a specific type may still appear in the layout as a generic type such as a picture.

After detection, Pdftools OCR Service sorts blocks into reading order based on their position on the page. You can fine-tune how blocks group together during this ordering step with the [SortingBlocksParams] section, and adjust how the engine processes each block’s image (rotation, mirroring, inversion) with the [ImageProcessingParams] section.

DetectText

Key | Type | Default
DetectText | Boolean | true

If this property is true, text areas are detected during layout analysis.


DetectHandwritten

Key | Type | Default
DetectHandwritten | Boolean | false

When set to true, enables detection of handwritten text. Handwritten text can only be detected if the SpeedQualityMode property is set to SQM_Accurate; otherwise, this setting is ignored.
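
Because of this dependency, the two keys are best set together. The fragment below is a sketch under the assumption of conventional Key=Value INI syntax with `;` comment lines.

```ini
[PageAnalysisParams]
; Handwriting detection is ignored unless the accurate mode is active.
SpeedQualityMode=SQM_Accurate
DetectHandwritten=true
```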


DetectTables

Key | Type | Default
DetectTables | Boolean | true

If this property is true, tables are detected during layout analysis. Table detection parameters are configured in the [TableAnalysisParams] section.


DetectBarcodes

Key | Type | Default
DetectBarcodes | Boolean | false

Specifies whether barcodes are detected and barcode blocks are created during layout analysis. If this property is false, barcodes may be detected as blocks of another type (for example, pictures). Barcode recognition parameters are configured in the [BarcodeParams] section.
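
A possible fragment for enabling barcode blocks is sketched below, assuming conventional Key=Value INI syntax with `;` comment lines. The [BarcodeParams] keys are deliberately left out because they are not described in this section.

```ini
[PageAnalysisParams]
; Create dedicated barcode blocks instead of letting barcodes
; fall back to another block type such as a picture.
DetectBarcodes=true
```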


DetectCheckmarks

Key | Type | Default
DetectCheckmarks | Boolean | false

If this property is true, checkmarks are detected during layout analysis.


DetectSeparators

Key | Type | Default
DetectSeparators | Boolean | true

If this property is true, separators are detected during layout analysis.


DetectPictures

Key | Type | Default
DetectPictures | Boolean | true

If this property is true, pictures are detected during layout analysis.


DetectVectorGraphics

Key | Type | Default
DetectVectorGraphics | Boolean | true

If this property is true, vector pictures are detected during layout analysis. Vector picture blocks may appear in the layout only if this property was set to true during layout analysis.


DetectStamps

Key | Type | Default
DetectStamps | Boolean | false

If this property is true, stamps are detected during layout analysis and placed into picture blocks. The text on detected stamps isn’t recognized. Stamps can only be detected if the SpeedQualityMode property is set to SQM_Accurate; otherwise, this setting is ignored.
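
For stamped documents, a sketch of the relevant keys could read as follows, again assuming conventional Key=Value INI syntax with `;` comment lines.

```ini
[PageAnalysisParams]
; Stamp detection only takes effect in the accurate mode.
SpeedQualityMode=SQM_Accurate
; Stamps end up in picture blocks; their text is not recognized.
DetectStamps=true
```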


Additional settings

NoShadowsMode

Key | Type | Default
NoShadowsMode | Boolean | false

When set to true, the Pdftools OCR Service presumes that an image has no shadows from scanning.


PaperSizeDetectionMode

Key | Type | Default
PaperSizeDetectionMode | PaperSizeDetectionModeEnum | PSDM_Auto

Indicates whether the whole preprocessed image can contain information for analysis. When set to PSDM_CloseToImageSize, the analysis area stays close to the original image size.

Note: For this property to work correctly, the NoShadowsMode property must be set to false.

PaperSizeDetectionModeEnum

  • PSDM_Auto: Determines the analysis area automatically. The area may be much smaller than the original image.
  • PSDM_Unknown: No predefined information about the image area. The analysis area may be much smaller than the original image.
  • PSDM_CloseToImageSize: The whole image can contain information for analysis. The analysis area stays close to the original image size.
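
When content may reach the edges of the scan, the settings could be combined as in this sketch. It assumes conventional Key=Value INI syntax with `;` comment lines; NoShadowsMode is written out explicitly even though false is its default, to make the dependency visible.

```ini
[PageAnalysisParams]
; Keep the analysis area close to the full image size.
PaperSizeDetectionMode=PSDM_CloseToImageSize
; Required for PaperSizeDetectionMode to operate correctly.
NoShadowsMode=false
```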

DetectTextOnPictures

Key | Type | Default
DetectTextOnPictures | Boolean | false

When this property is set to true, the Pdftools OCR Service detects all text on a page image, including text embedded in pictures. The reading order isn’t changed, enabling full-text search later.
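
A sketch of a fragment aimed at searchable archives, assuming conventional Key=Value INI syntax with `;` comment lines:

```ini
[PageAnalysisParams]
; Also pick up text embedded in pictures so that it can be
; found by full-text search; reading order stays unchanged.
DetectTextOnPictures=true
```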


DetectVerticalEuropeanText

Key | Type | Default
DetectVerticalEuropeanText | Boolean | false

When set to true, the Pdftools OCR Service looks for vertically oriented text. This setting applies to all languages other than CJK. For CJK languages, vertical text detection is managed by the ProhibitCJKColumns property.


ProhibitCJKColumns

Key | Type | Default
ProhibitCJKColumns | Boolean | false

Text in CJK languages can be written vertically as well as horizontally. When this property is set to true, the Pdftools OCR Service ignores the possibility of vertical text and recognizes the image on the assumption that all text is arranged horizontally.

This property is valid only for CJK languages.
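
The two vertical-text switches are independent and cover different language groups, so a job that handles both could set them side by side. The fragment below is a sketch that assumes conventional Key=Value INI syntax with `;` comment lines.

```ini
[PageAnalysisParams]
; Non-CJK languages: search for vertically oriented text.
DetectVerticalEuropeanText=true
; CJK languages: treat all text as horizontal.
ProhibitCJKColumns=true
```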


ProhibitDoublePageMode

Key | Type | Default
ProhibitDoublePageMode | Boolean | false

When set to true, the Pdftools OCR Service presumes that an image doesn’t show a double-page book spread.


CollectPdfExportData

Key | Type | Default
CollectPdfExportData | Boolean | false

When set to true, the Pdftools OCR Service collects data for PDF export during layout analysis. The export process uses this data to produce image-only PDF files with MRC compression.

Note: With this property set to true, recognition isn’t supported. Recognized text isn’t needed to export the document to image-only PDF.
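
For an image-only PDF workflow, the setting might be enabled as in this sketch, which assumes conventional Key=Value INI syntax with `;` comment lines.

```ini
[PageAnalysisParams]
; Gather export data for image-only PDF with MRC compression.
; Recognition is not supported while this is enabled.
CollectPdfExportData=true
```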