[PagePreprocessingParams] INI file section
The [PagePreprocessingParams] INI file section defines configuration parameters for image preprocessing, including options for correcting image orientation, skew, shadows, geometry, and resolution before text recognition.
CorrectInvertedImage
Key | Type | Default |
---|---|---|
CorrectInvertedImage | Boolean | false |
When set to true, the Pdftools OCR Service detects whether the image is inverted (white text on a black background). The text color is detected during page preprocessing, and if the image turns out to be inverted, the Pdftools OCR Service automatically inverts it.
CorrectOrientation
Key | Type | Default |
---|---|---|
CorrectOrientation | Boolean | false |
If this property is set to true, the page orientation is detected during page preprocessing, and if it differs from the normal orientation, the Pdftools OCR Service automatically rotates the image.
If this property is set to true, the TextTypes property of the RecognizerParams cannot be set to TT_Handprinted.
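As an illustration, the two Boolean keys described above might be enabled together as follows. This is a minimal sketch that assumes the usual Key=Value INI syntax with semicolon comment lines; check an existing INI file from your installation for the exact conventions.

```ini
[PagePreprocessingParams]
; Detect white-on-black pages and invert them before recognition.
CorrectInvertedImage=true
; Detect the page orientation and rotate the image if needed.
; With this enabled, TextTypes in RecognizerParams must not be TT_Handprinted.
CorrectOrientation=true
```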
CorrectShadowsAndHighlights
Key | Type | Default |
---|---|---|
CorrectShadowsAndHighlights | ThreeStatePropertyValueEnum | TSPV_Auto |
If this property is set to TSPV_Yes, image preprocessing includes correction of excessive shadows and highlights to improve recognition quality.
This property is designed for use with photographs only.
ThreeStatePropertyValueEnum
- TSPV_Auto: Automatically determine if this processing mode should be used, depending on the situation (image characteristics, etc.).
- TSPV_No: The processing mode in question will not be used.
- TSPV_Yes: The processing mode in question will be used.
CorrectSkew
Key | Type | Default |
---|---|---|
CorrectSkew | ThreeStatePropertyValueEnum | TSPV_Auto |
If this property is set to TSPV_Yes, the Pdftools OCR Service corrects image skew during page preprocessing.
The type of skew correction is defined by the CorrectSkewMode property.
If this property is set to TSPV_No, the value of the CorrectSkewMode property is ignored.
CorrectSkewMode
Key | Type | Default |
---|---|---|
CorrectSkewMode | CorrectSkewModeEnum | CSM_CorrectSkewByHorizontalText \| CSM_CorrectSkewByVerticalText |
Specifies the mode of skew correction. The value of this property is a bitwise OR combination of CorrectSkewModeEnum constants, each of which denotes a type of skew correction. A value of 0 means that skew is not corrected.
The value of this property is ignored if the CorrectSkew property is set to TSPV_No.
CorrectSkewModeEnum
- CSM_CorrectSkewByBlackSquaresHorizontally: The image skew angle is corrected based on so-called ‘black squares’ (the skew angle is calculated based on the horizontal pairs of squares). Black squares are often placed on forms. We recommend that you use this constant only when working with images of forms; otherwise, you may obtain incorrect results.
- CSM_CorrectSkewByBlackSquaresVertically: The image skew angle is corrected based on so-called ‘black squares’ (the skew angle is calculated based on the vertical pairs of squares). Black squares are often placed on forms. We recommend that you use this constant only when working with images of forms; otherwise, you may obtain incorrect results.
- CSM_CorrectSkewByHorizontalLines: The image skew angle is corrected based on horizontal lines. We recommend that you use this constant only when working with images that contain horizontal lines (for example, invoices, price lists, or other documents that contain tables with visible borders); otherwise, you may obtain incorrect results.
- CSM_CorrectSkewByHorizontalText: The image skew angle is corrected based on horizontal text lines.
- CSM_CorrectSkewByVerticalLines: The image skew angle is corrected based on vertical lines. We recommend that you use this constant only when working with images that contain vertical lines (for example, invoices, price lists, or other documents that contain tables with visible borders); otherwise, you may obtain incorrect results.
- CSM_CorrectSkewByVerticalText: The image skew angle is corrected based on vertical text lines. This constant may be useful when working with documents in languages such as Chinese, Japanese, or Korean, or if the page orientation is incorrect.
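For example, a batch of scanned forms that carry black squares could be deskewed with a configuration along the following lines. The sketch assumes Key=Value INI syntax and assumes that combined constants are joined with a vertical bar, as in the default value listed in the CorrectSkewMode table; neither is confirmed here, so verify the notation against your installation before relying on it.

```ini
[PagePreprocessingParams]
; Always correct skew; with TSPV_No the mode below would be ignored.
CorrectSkew=TSPV_Yes
; Assumed notation: use both horizontal and vertical pairs of black squares.
; Intended for images of forms only.
CorrectSkewMode=CSM_CorrectSkewByBlackSquaresHorizontally | CSM_CorrectSkewByBlackSquaresVertically
```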
GeometryCorrectionMode
Key | Type | Default |
---|---|---|
GeometryCorrectionMode | GeometryCorrectionModeEnum | GCM_Auto |
Specifies whether geometrical distortions (perspective on photos, curved lines from scanned books, etc.) should be removed during page preprocessing.
GeometryCorrectionModeEnum
- GCM_Auto: Image geometry is corrected if necessary. The Pdftools OCR Service automatically determines whether the processed document is a photo and, if it is, performs geometry correction.
- GCM_Correct: Always correct image geometry. Photographs usually have perspective distortions; use this value when processing photos.
- GCM_DontCorrect: Do not correct image geometry. Use this value when processing good-quality scanned images for which correction is unnecessary.
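A configuration for photographed documents might force the photo-oriented corrections instead of leaving them on automatic detection. The following is a sketch under the assumption of plain Key=Value INI syntax with semicolon comment lines.

```ini
[PagePreprocessingParams]
; Photographs only: correct excessive shadows and highlights.
CorrectShadowsAndHighlights=TSPV_Yes
; Photos usually show perspective distortion, so always correct geometry.
GeometryCorrectionMode=GCM_Correct
```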
ResolutionCorrectionMode
Key | Type | Default |
---|---|---|
ResolutionCorrectionMode | ResolutionCorrectionModeEnum | RCM_Auto |
Specifies whether the image resolution should be corrected during page preprocessing.
ResolutionCorrectionModeEnum
- RCM_Auto: If the image resolution appears to be incorrect, the Pdftools OCR Service automatically detects and adjusts it.
- RCM_Correct: Detect and correct image resolution.
- RCM_DontCorrect: Do not correct image resolution.
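Conversely, for good-quality scans you might switch off the corrections that are aimed at photographs. The sketch below assumes Key=Value INI syntax; disabling resolution correction is only an illustrative choice for input whose stored resolution you trust, not a recommendation from this reference.

```ini
[PagePreprocessingParams]
; Good-quality scans: geometry correction is unnecessary.
GeometryCorrectionMode=GCM_DontCorrect
; Illustrative assumption: the resolution stored in the image is reliable.
ResolutionCorrectionMode=RCM_DontCorrect
```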