Version: 1.1.0

[PagePreprocessingParams] INI file section

The [PagePreprocessingParams] INI file section defines configuration parameters for image preprocessing, including options for correcting image orientation, skew, geometry, resolution, binarization, and background whitening before text recognition.


Orientation and skew correction

CorrectOrientationMode

Key | Type | Default
CorrectOrientationMode | CorrectOrientationModeEnum | COM_Auto

Specifies how image orientation should be corrected during preprocessing. In the default mode, orientation is determined and corrected automatically if needed.

CorrectOrientationModeEnum

  • COM_Auto: Determines and corrects orientation automatically if needed.
  • COM_Clockwise: Rotates the image 90 degrees clockwise.
  • COM_CounterClockwise: Rotates the image 90 degrees counterclockwise.
  • COM_UpsideDown: Rotates the image 180 degrees.
  • COM_No: Does not correct orientation.
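
As an illustration, a batch of pages known to be scanned sideways could be given a fixed rotation. This is a minimal sketch that assumes the usual `Key = Value` INI syntax and `;` comment lines, neither of which is shown on this page:

```ini
; Assumed syntax: Key = Value, with ';' starting a comment line.
[PagePreprocessingParams]
; Rotate every page 90 degrees clockwise.
CorrectOrientationMode = COM_Clockwise
```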

CorrectSkewMode

Key | Type | Default
CorrectSkewMode | CorrectSkewModeEnum | CSM_Auto

Specifies whether and how image skew should be corrected during page preprocessing.

Note: Skew can be corrected only for angles not greater than 20 degrees.

CorrectSkewModeEnum

  • CSM_Auto: Uses neural-network-based skew correction. More accurate on average but slower than CSM_Fast.
  • CSM_Fast: Uses a non-neural-network skew correction algorithm. Faster but less precise on average.
  • CSM_Off: Disables skew correction.

StraightenLinesMode

Key | Type | Default
StraightenLinesMode | StraightenLinesModeEnum | SLM_Auto

Specifies how lines will be straightened during page preprocessing. In the default mode, a neural network algorithm is used.

StraightenLinesModeEnum

  • SLM_Auto: Uses a neural-network-based method. Better average quality but slower than SLM_Fast.
  • SLM_Fast: Uses a non-neural-network method. Faster but lower average quality, though it may perform better on some pages.
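
Because SLM_Fast may perform better on some pages, it can be worth comparing both modes on a sample of your own documents. A hedged sketch of the non-default setting, assuming `Key = Value` syntax and `;` comments:

```ini
[PagePreprocessingParams]
; Assumed syntax. Switch from the default neural-network method
; to the faster non-neural-network method.
StraightenLinesMode = SLM_Fast
```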

Image correction

CorrectInvertedImage

Key | Type | Default
CorrectInvertedImage | ThreeStatePropertyValueEnum | TSPV_Auto

Specifies whether inverted images (white text on a black background) should be corrected. In the default mode, the Pdftools OCR Service corrects inverted images automatically.

ThreeStatePropertyValueEnum

  • TSPV_Auto: Determines automatically whether the processing step should be applied, depending on the situation (image characteristics and so on).
  • TSPV_No: Does not apply the processing step.
  • TSPV_Yes: Applies the processing step.
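
The same three values are used by every property of type ThreeStatePropertyValueEnum in this section. For example, input that is known to contain no inverted pages could skip the check. The sketch below assumes `Key = Value` syntax and `;` comments:

```ini
[PagePreprocessingParams]
; Assumed syntax. Turn off inverted-image correction
; instead of letting the service decide.
CorrectInvertedImage = TSPV_No
```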

CorrectGeometry

Key | Type | Default
CorrectGeometry | ThreeStatePropertyValueEnum | TSPV_Auto

Specifies whether geometrical distortions (perspective on photos, curved lines from scanned books, and similar effects) should be removed during page preprocessing. In the default mode, the Pdftools OCR Service corrects geometry for photographs.


BackgroundWhitening

Key | Type | Default
BackgroundWhitening | ThreeStatePropertyValueEnum | TSPV_Auto

Specifies whether the image background should be whitened. In the default mode, the Pdftools OCR Service whitens the background automatically.
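
As a hypothetical scenario, flat scans on tinted paper have no perspective distortion but do have a colored background. Settings for that case might look like the following sketch, which assumes `Key = Value` syntax and `;` comments:

```ini
[PagePreprocessingParams]
; Assumed syntax. No geometry correction for flat scans.
CorrectGeometry = TSPV_No
; Whiten the background on every page.
BackgroundWhitening = TSPV_Yes
```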


CropImage

Key | Type | Default
CropImage | ThreeStatePropertyValueEnum | TSPV_Auto

When this property is set to TSPV_Yes, the Pdftools OCR Service detects the document edges in the image during preprocessing and crops the image accordingly. In the default mode, the Pdftools OCR Service decides whether to crop the image based on the source of the processed image.

Note: This feature is not supported for black-and-white images.
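
For color or grayscale photos of documents, cropping could be requested explicitly. A minimal sketch, assuming `Key = Value` syntax and `;` comments:

```ini
[PagePreprocessingParams]
; Assumed syntax. Detect document edges and crop to them.
; Not supported for black-and-white images.
CropImage = TSPV_Yes
```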


Image type and resolution

DetectImageType

Key | Type | Default
DetectImageType | ThreeStatePropertyValueEnum | TSPV_Auto

Specifies how the image type is determined. This works in conjunction with the ImageSourceType property of the [PrepareImageMode] section.

  • If ImageSourceType is IST_Auto and this property is TSPV_Auto or TSPV_Yes, the Pdftools OCR Service detects the image type automatically.
  • If ImageSourceType is IST_Auto and this property is TSPV_No, the image type is read from the file properties or metadata (faster, but depends on correct metadata).
  • If ImageSourceType is set to a specific value (for example IST_Photo or IST_Scan), detection is not performed regardless of this setting.
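
The second case in the list above, where the image type is read from metadata, involves two sections. The following sketch assumes `Key = Value` syntax, `;` comments, and that both sections live in the same INI file:

```ini
; Assumed syntax and file layout.
[PrepareImageMode]
; Do not fix the image source type in advance.
ImageSourceType = IST_Auto

[PagePreprocessingParams]
; Read the image type from file properties or metadata
; rather than detecting it; faster, but relies on correct metadata.
DetectImageType = TSPV_No
```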

OverwriteResolutionMode

Key | Type | Default
OverwriteResolutionMode | OverwriteResolutionModeEnum | ORM_Auto

Specifies whether the resolution of the image should be overwritten during page preprocessing. When set to ORM_Manual, use the ResolutionToOverwrite property to specify the new image resolution. The new resolution is applied before all other stages of image preparation (binarization, skew correction, and so on).

Info:
  • If you set this property to ORM_No and the resolution of the prepared image is too low (less than 50 DPI), too high (more than 3200 DPI), or undefined, an error occurs.
  • For PDF files, the resolution is used for image rasterization. The image size in pixels may change based on the detected resolution and page dimensions.

OverwriteResolutionModeEnum

  • ORM_Auto: Automatically detects and overwrites the image resolution if needed.
  • ORM_Manual: Uses the resolution specified in the ResolutionToOverwrite property.
  • ORM_No: Does not overwrite the image resolution.

ResolutionToOverwrite

Key | Type | Default
ResolutionToOverwrite | Integer | 0

Specifies the resolution value in DPI to overwrite the image resolution. This property is only used when OverwriteResolutionMode is set to ORM_Manual.

Warning: The default value is 0. When OverwriteResolutionMode is ORM_Manual, you must set ResolutionToOverwrite to the desired resolution; otherwise, an error occurs.
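
Since ORM_Manual and ResolutionToOverwrite only work as a pair, a manual override would set both keys together. In this sketch the value 300 is purely illustrative, and the `Key = Value` syntax and `;` comments are assumptions:

```ini
[PagePreprocessingParams]
; Assumed syntax. Use a fixed resolution instead of the detected one.
OverwriteResolutionMode = ORM_Manual
; Resolution in DPI; must be set because the default of 0 causes an error.
ResolutionToOverwrite = 300
```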


Binarization and color

DiscardColorImage

Key | Type | Default
DiscardColorImage | Boolean | false

When set to true, the Pdftools OCR Service leaves only the black-and-white plane in the prepared image. In this case, image binarization is performed during image preprocessing.


UseFastBinarization

Key | Type | Default
UseFastBinarization | Boolean | false

If this property is set to true, the Pdftools OCR Service uses algorithms for fast image binarization. This speeds up binarization, but quality may deteriorate.

Binarization is performed either during preprocessing (if DiscardColorImage is true), or later when a black-and-white image is necessary.
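
To show how the two binarization properties interact, the sketch below moves binarization into preprocessing and selects the fast algorithm. It assumes `Key = Value` syntax, `;` comments, and that Boolean values are written as `true` and `false`, as in the defaults listed above:

```ini
[PagePreprocessingParams]
; Assumed syntax. Keep only the black-and-white plane,
; so binarization runs during preprocessing.
DiscardColorImage = true
; Use the faster binarization algorithms; quality may deteriorate.
UseFastBinarization = true
```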


Page splitting

SplitType

Key | Type | Default
SplitType | PageSplitTypeEnum | PST_None

Specifies whether and how a page is split during preprocessing. By default, pages are not split.

PageSplitTypeEnum

  • PST_None: Does not split the page.
  • PST_DoublePageSplit: Splits a double-page spread into two separate pages.
  • PST_BusinessCardSplit: Splits a page containing multiple business cards into individual cards.
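
For scanned books where each image holds two facing pages, splitting could be enabled as in this sketch, which assumes `Key = Value` syntax and `;` comments:

```ini
[PagePreprocessingParams]
; Assumed syntax. Split each double-page spread into two pages.
SplitType = PST_DoublePageSplit
```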