Version: 1.0.0

[PagePreprocessingParams] INI file section

The [PagePreprocessingParams] INI file section defines configuration parameters for image preprocessing, including options for correcting image orientation, skew, shadows, geometry, and resolution before text recognition.


CorrectInvertedImage

Key                   Type     Default
CorrectInvertedImage  Boolean  false

When set to true, the Pdftools OCR Service detects during page preprocessing whether the image is inverted (white text on a black background) and, if so, automatically inverts it.


CorrectOrientation

Key                 Type     Default
CorrectOrientation  Boolean  false

If this property is set to true, the Pdftools OCR Service detects the page orientation during page preprocessing and automatically rotates the image if the orientation differs from the normal one.

note

If this property is set to true, the TextTypes property of the RecognizerParams cannot be set to TT_Handprinted.
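
As an illustration, the following fragment enables both inverted-image and orientation correction. It is a minimal sketch that assumes the usual Key=Value INI syntax, that Boolean values are written as they appear in the Default column, and that lines starting with a semicolon are treated as comments. Verify these assumptions against the configuration file shipped with your installation.

```ini
; Sketch only: Key=Value syntax and ';' comments are assumptions.
[PagePreprocessingParams]
CorrectInvertedImage=true
; With CorrectOrientation=true, TextTypes in RecognizerParams
; cannot be set to TT_Handprinted.
CorrectOrientation=true
```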


CorrectShadowsAndHighlights

Key                          Type                        Default
CorrectShadowsAndHighlights  ThreeStatePropertyValueEnum  TSPV_Auto

If this property is set to TSPV_Yes, image preprocessing corrects excessive shadows and highlights to improve recognition quality.
This property is designed for use with photographs only.

ThreeStatePropertyValueEnum

  • TSPV_Auto: Automatically determine if this processing mode should be used, depending on the situation (image characteristics, etc.).
  • TSPV_No: The processing mode in question will not be used.
  • TSPV_Yes: The processing mode in question will be used.

CorrectSkew

Key          Type                         Default
CorrectSkew  ThreeStatePropertyValueEnum  TSPV_Auto

If this property is set to TSPV_Yes, the Pdftools OCR Service corrects image skew during page preprocessing.
The type of skew correction is defined by the CorrectSkewMode property.

If this property is set to TSPV_No, the value of the CorrectSkewMode property is ignored.


CorrectSkewMode

Key              Type                 Default
CorrectSkewMode  CorrectSkewModeEnum  CSM_CorrectSkewByHorizontalText | CSM_CorrectSkewByVerticalText

Specifies the mode of skew correction. The value of this property is a bitwise OR combination of the CorrectSkewModeEnum enumeration constants that denote the types of skew correction. A value of 0 means that skew is not corrected.

The value of this property is ignored if the CorrectSkew property is set to TSPV_No.

CorrectSkewModeEnum

  • CSM_CorrectSkewByBlackSquaresHorizontally: The image skew angle is corrected based on so-called ‘black squares’ (the skew angle is calculated based on the horizontal pairs of squares). Black squares are often placed on forms. We recommend that you use this constant only when working with images of forms; otherwise, you may obtain incorrect results.
  • CSM_CorrectSkewByBlackSquaresVertically: The image skew angle is corrected based on so-called ‘black squares’ (the skew angle is calculated based on the vertical pairs of squares). Black squares are often placed on forms. We recommend that you use this constant only when working with images of forms; otherwise, you may obtain incorrect results.
  • CSM_CorrectSkewByHorizontalLines: The image skew angle is corrected based on horizontal lines. Use this constant only when working with images that contain horizontal lines (for example, invoices, price lists, or other documents that contain tables with visible borders); otherwise, you may get incorrect results.
  • CSM_CorrectSkewByHorizontalText: The image skew angle is corrected based on horizontal text lines.
  • CSM_CorrectSkewByVerticalLines: The image skew angle is corrected based on vertical lines. We recommend that you use this constant only when working with images that contain vertical lines (for example, invoices, price lists, or other documents that contain tables with visible borders); otherwise, you may obtain incorrect results.
  • CSM_CorrectSkewByVerticalText: The image skew angle is corrected based on vertical text lines. The constant may be useful when working with documents in languages such as Chinese, Japanese, or Korean, or if page orientation is incorrect.
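
As an illustration of combining skew correction modes, the fragment below forces skew correction and bases it on horizontal and vertical lines, which the list above suggests for documents such as invoices with visible table borders. It is a sketch under assumptions: the Key=Value syntax, the semicolon comments, and the use of " | " to combine constants simply mirror the notation of the Default column and have not been confirmed for the INI file; the service may expect a different way of writing combined values.

```ini
; Sketch only: the ' | ' notation for combining constants is an assumption
; copied from the Default column of CorrectSkewMode.
[PagePreprocessingParams]
CorrectSkew=TSPV_Yes
CorrectSkewMode=CSM_CorrectSkewByHorizontalLines | CSM_CorrectSkewByVerticalLines
```

If CorrectSkew were set to TSPV_No instead, the CorrectSkewMode line would be ignored.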

GeometryCorrectionMode

Key                     Type                        Default
GeometryCorrectionMode  GeometryCorrectionModeEnum  GCM_Auto

Specifies whether geometrical distortions (perspective on photos, curved lines from scanned books, etc.) should be removed during page preprocessing.

GeometryCorrectionModeEnum

  • GCM_Auto: Image geometry is corrected if necessary. The Pdftools OCR Service automatically determines whether the processed document is a photo and, if so, performs geometry correction.
  • GCM_Correct: Always correct image geometry. Photographs usually have perspective distortions; use this when processing photos.
  • GCM_DontCorrect: Do not correct image geometry. Use when processing scanned images of good quality where correction is unnecessary.
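
For input that is known to consist of photographs, the photo-oriented options described on this page could be combined as in the following sketch. The Key=Value syntax and the semicolon comments are assumptions, and the combination itself is only an example, not a recommended profile from the vendor.

```ini
; Sketch only: a possible configuration for photographed documents.
[PagePreprocessingParams]
; Photographs usually have perspective distortions.
GeometryCorrectionMode=GCM_Correct
; Designed for use with photographs only.
CorrectShadowsAndHighlights=TSPV_Yes
```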

ResolutionCorrectionMode

Key                       Type                          Default
ResolutionCorrectionMode  ResolutionCorrectionModeEnum  RCM_Auto

Specifies whether resolution of the image should be corrected during page preprocessing.

ResolutionCorrectionModeEnum

  • RCM_Auto: If the resolution of the image appears to be incorrect, the Pdftools OCR Service automatically detects the actual resolution and adjusts it.
  • RCM_Correct: Detect and correct image resolution.
  • RCM_DontCorrect: Do not correct image resolution.
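
As a counterpart for good-quality scans, the following sketch switches off the corrections that this page describes as unnecessary or photo-specific. It assumes the Key=Value syntax and semicolon comments, and it further assumes that the scanner stores a reliable resolution in the image, which is why resolution correction is disabled here; if that does not hold for your input, leave ResolutionCorrectionMode at RCM_Auto.

```ini
; Sketch only: a possible configuration for clean scanner output.
[PagePreprocessingParams]
; Good-quality scans do not need geometry correction.
GeometryCorrectionMode=GCM_DontCorrect
; Shadow and highlight correction is designed for photographs only.
CorrectShadowsAndHighlights=TSPV_No
; Assumption: the resolution stored in the scanned image is trustworthy.
ResolutionCorrectionMode=RCM_DontCorrect
```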