[PageAnalysisParams] INI file section
The [PageAnalysisParams] INI file section defines parameters controlling how the Pdftools OCR Service analyzes page content during layout analysis, including detecting text, tables, images, barcodes, and layout structures.
Analysis mode
AnalysisMode
| Key | Type | Default |
|---|---|---|
| AnalysisMode | PageAnalysisModeEnum | PAM_DocumentConversion |
Specifies the layout analysis mode. For documents with tables, complex layouts, or many different object types, set this property to PAM_TextExtraction.
PageAnalysisModeEnum
- PAM_DocumentConversion: Optimized for documents that primarily contain text.
- PAM_TextExtraction: Optimized for documents with mixed layouts, tables, and other object types that the export needs to preserve (for example, invoices).
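As an illustration, a configuration for invoice-like documents might select the second mode. This is a minimal sketch: the section name, key, and value come from this page, while the `Key=Value` syntax and the `;` comment lines are assumptions about the INI dialect and may need adjusting.

```ini
; Sketch only: assumes Key=Value syntax and ';' comment lines are accepted.
[PageAnalysisParams]
; Documents with tables and mixed layouts (for example, invoices)
AnalysisMode=PAM_TextExtraction
```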
SpeedQualityMode
| Key | Type | Default |
|---|---|---|
| SpeedQualityMode | SpeedQualityModeEnum | SQM_Fast |
Manages the balance between analysis accuracy and speed. You may need the SQM_Accurate setting when processing documents with complex backgrounds or watermarks.
SpeedQualityModeEnum
- SQM_Fast: Prioritizes speed over accuracy. Suitable for documents without complex color patterns or watermarks.
- SQM_Accurate: Prioritizes accuracy over speed. Use this mode for documents with complex or colorful backgrounds or watermarks.
Block detection
During layout analysis, Pdftools OCR Service segments each page into rectangular regions called blocks. Each block represents a distinct content area: a paragraph of text, a table, an image, or a barcode. The parameters in this section control which content types the engine detects and assigns to dedicated blocks. Content that doesn’t match a specific type may still appear in the layout as a generic type such as a picture.
After detection, Pdftools OCR Service sorts blocks into reading order based on their position on the page. You can fine-tune how blocks group together during this ordering step with the [SortingBlocksParams] section, and adjust how the engine processes each block’s image (rotation, mirroring, inversion) with the [ImageProcessingParams] section.
DetectText
| Key | Type | Default |
|---|---|---|
| DetectText | Boolean | true |
If this property is true, text areas are detected during layout analysis.
DetectHandwritten
| Key | Type | Default |
|---|---|---|
| DetectHandwritten | Boolean | false |
When set to true, enables detection of handwritten text. Handwritten text can only be detected if the SpeedQualityMode property is set to SQM_Accurate; otherwise, this setting is ignored.
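Because DetectHandwritten depends on SpeedQualityMode, the two keys are typically set together. The fragment below is a hedged sketch of that pairing; it assumes conventional `Key=Value` syntax and `;` comments, neither of which this page confirms.

```ini
; Sketch only: assumes Key=Value syntax and ';' comment lines are accepted.
[PageAnalysisParams]
; DetectHandwritten is ignored unless SpeedQualityMode is SQM_Accurate
SpeedQualityMode=SQM_Accurate
DetectHandwritten=true
```

With the default SpeedQualityMode of SQM_Fast, the DetectHandwritten line would have no effect.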
DetectTables
| Key | Type | Default |
|---|---|---|
| DetectTables | Boolean | true |
If this property is true, tables are detected during layout analysis. Table detection parameters are configured in the [TableAnalysisParams] section.
DetectBarcodes
| Key | Type | Default |
|---|---|---|
| DetectBarcodes | Boolean | false |
Specifies if barcodes are detected and barcode blocks are created during layout analysis. If this property is false, barcodes may be detected as blocks of some other type (for example, pictures). Barcode recognition parameters are configured in the [BarcodeParams] section.
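A minimal sketch of enabling dedicated barcode blocks follows. Only the key and section names are taken from this page; the `Key=Value` syntax and `;` comments are assumptions, and no [BarcodeParams] keys are shown because they are documented in that section.

```ini
; Sketch only: assumes Key=Value syntax and ';' comment lines are accepted.
[PageAnalysisParams]
; Create barcode blocks instead of letting barcodes fall into other block types
DetectBarcodes=true
```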
DetectCheckmarks
| Key | Type | Default |
|---|---|---|
| DetectCheckmarks | Boolean | false |
If this property is true, checkmarks are detected during layout analysis.
DetectSeparators
| Key | Type | Default |
|---|---|---|
| DetectSeparators | Boolean | true |
If this property is true, separators are detected during layout analysis.
DetectPictures
| Key | Type | Default |
|---|---|---|
| DetectPictures | Boolean | true |
If this property is true, pictures are detected during layout analysis.
DetectVectorGraphics
| Key | Type | Default |
|---|---|---|
| DetectVectorGraphics | Boolean | true |
If this property is true, vector pictures are detected during layout analysis.
Vector picture blocks may appear in the layout only if this property was set to true during layout analysis.
DetectStamps
| Key | Type | Default |
|---|---|---|
| DetectStamps | Boolean | false |
If this property is true, stamps are detected during layout analysis and placed into picture blocks. The text on detected stamps isn’t recognized. Stamps can only be detected if the SpeedQualityMode property is set to SQM_Accurate; otherwise, this setting is ignored.
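The detection keys in this section can be combined into a profile for a particular document type. The fragment below is a hypothetical profile for scanned forms, offered as a sketch rather than a recommended configuration; it assumes `Key=Value` syntax and `;` comments, and the choice of which keys to change is illustrative.

```ini
; Sketch only: hypothetical profile for scanned forms.
; Assumes Key=Value syntax and ';' comment lines are accepted.
[PageAnalysisParams]
; Required for DetectStamps to take effect
SpeedQualityMode=SQM_Accurate
; Both are false by default
DetectCheckmarks=true
DetectStamps=true
; Left at their defaults, shown here for clarity
DetectText=true
DetectTables=true
```

Detected stamps end up in picture blocks, and their text isn’t recognized, so this profile is useful for locating stamps rather than reading them.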
Additional settings
NoShadowsMode
| Key | Type | Default |
|---|---|---|
| NoShadowsMode | Boolean | false |
When set to true, the Pdftools OCR Service presumes that an image has no shadows from scanning.
PaperSizeDetectionMode
| Key | Type | Default |
|---|---|---|
| PaperSizeDetectionMode | PaperSizeDetectionModeEnum | PSDM_Auto |
Specifies whether the whole preprocessed image can contain information for analysis. When set to PSDM_CloseToImageSize, the analysis area stays close to the original image size.
For correct operation of this property, the NoShadowsMode property must be set to false.
PaperSizeDetectionModeEnum
- PSDM_Auto: Determines the analysis area automatically. The area may be much smaller than the original image.
- PSDM_Unknown: No predefined information about the image area. The analysis area may be much smaller than the original image.
- PSDM_CloseToImageSize: The whole image can contain information for analysis. The analysis area stays close to the original image size.
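The interaction between PaperSizeDetectionMode and NoShadowsMode can be sketched as follows. The fragment assumes `Key=Value` syntax and `;` comments; the key names and values are taken from this page.

```ini
; Sketch only: assumes Key=Value syntax and ';' comment lines are accepted.
[PageAnalysisParams]
; Keep the analysis area close to the original image size
PaperSizeDetectionMode=PSDM_CloseToImageSize
; Must stay false for PaperSizeDetectionMode to operate correctly
NoShadowsMode=false
```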
DetectTextOnPictures
| Key | Type | Default |
|---|---|---|
| DetectTextOnPictures | Boolean | false |
When this property is set to true, the Pdftools OCR Service detects all text on a page image, including text embedded in pictures. The reading order isn’t changed, enabling full-text search later.
DetectVerticalEuropeanText
| Key | Type | Default |
|---|---|---|
| DetectVerticalEuropeanText | Boolean | false |
When set to true, the Pdftools OCR Service looks for vertically oriented text.
This property applies to all languages other than CJK (Chinese, Japanese, and Korean).
For CJK languages, vertical text detection is managed by the ProhibitCJKColumns property.
ProhibitCJKColumns
| Key | Type | Default |
|---|---|---|
| ProhibitCJKColumns | Boolean | false |
The text in CJK languages can be written vertically as well as horizontally.
Setting this property to true makes the Pdftools OCR Service ignore the possibility of vertical text and recognize the image on the assumption that all text is arranged horizontally.
This property is valid only for CJK languages.
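The two vertical-text properties cover different language groups and are independent of each other. The sketch below shows both in one fragment for comparison; it assumes `Key=Value` syntax and `;` comments, and in practice you would set only the key that matches your recognition languages.

```ini
; Sketch only: assumes Key=Value syntax and ';' comment lines are accepted.
[PageAnalysisParams]
; Non-CJK languages: look for vertically oriented text
DetectVerticalEuropeanText=true
; CJK languages: treat all text as horizontal
ProhibitCJKColumns=true
```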
ProhibitDoublePageMode
| Key | Type | Default |
|---|---|---|
| ProhibitDoublePageMode | Boolean | false |
When set to true, the Pdftools OCR Service presumes that an image doesn’t contain a double-page book spread.
CollectPdfExportData
| Key | Type | Default |
|---|---|---|
| CollectPdfExportData | Boolean | false |
When set to true, the Pdftools OCR Service collects data for PDF export during layout analysis. The export process uses this data for image-only PDF with MRC compression.
When this property is set to true, recognition isn’t supported. This isn’t a limitation for the intended use, since exporting a document to image-only PDF doesn’t need recognized text.
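For an image-only PDF workflow, the setting might look like the sketch below. It assumes `Key=Value` syntax and `;` comments; how the MRC export itself is configured isn't covered in this section and isn't shown.

```ini
; Sketch only: assumes Key=Value syntax and ';' comment lines are accepted.
[PageAnalysisParams]
; Collect data for image-only PDF export with MRC compression.
; Recognition isn't supported while this is true.
CollectPdfExportData=true
```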