[PagePreprocessingParams] INI file section
The [PagePreprocessingParams] INI file section defines configuration parameters for image preprocessing, including options for correcting image orientation, skew, geometry, resolution, binarization, and background whitening before text recognition.
Orientation and skew correction
CorrectOrientationMode
| Key | Type | Default |
|---|---|---|
| CorrectOrientationMode | CorrectOrientationModeEnum | COM_Auto |
Specifies how image orientation should be corrected during preprocessing. In the default mode, orientation is determined and corrected automatically if needed.
CorrectOrientationModeEnum
- COM_Auto: Determines and corrects orientation automatically if needed.
- COM_Clockwise: Rotates the image 90 degrees clockwise.
- COM_CounterClockwise: Rotates the image 90 degrees counterclockwise.
- COM_UpsideDown: Rotates the image 180 degrees.
- COM_No: Does not correct orientation.
CorrectSkewMode
| Key | Type | Default |
|---|---|---|
| CorrectSkewMode | CorrectSkewModeEnum | CSM_Auto |
Specifies whether and how image skew should be corrected during page preprocessing.
Skew can be corrected only for angles not greater than 20 degrees.
CorrectSkewModeEnum
- CSM_Auto: Uses neural-network-based skew correction. More accurate on average but slower than CSM_Fast.
- CSM_Fast: Uses a non-neural-network skew correction algorithm. Faster but less precise on average.
- CSM_Off: Disables skew correction.
StraightenLinesMode
| Key | Type | Default |
|---|---|---|
| StraightenLinesMode | StraightenLinesModeEnum | SLM_Auto |
Specifies how lines will be straightened during page preprocessing. In the default mode, a neural network algorithm is used.
StraightenLinesModeEnum
- SLM_Auto: Uses a neural-network-based method. Better average quality but slower than SLM_Fast.
- SLM_Fast: Uses a non-neural-network method. Faster but lower average quality, though it may perform better on some pages.
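As an illustration of how the three properties above might be combined, the following fragment sketches a speed-oriented setup for scans that are already upright. The key names and values come from this section. The `Key = Value` layout and the `;` comment lines follow the common INI convention and are assumptions, not confirmed syntax for the Pdftools OCR Service.

```ini
; Sketch only: assumes conventional INI syntax and ';' comments.
[PagePreprocessingParams]
; Pages are known to be upright, so skip orientation detection.
CorrectOrientationMode = COM_No
; Non-neural-network skew correction: faster, less precise on average.
CorrectSkewMode = CSM_Fast
; Non-neural-network line straightening: faster, lower average quality.
StraightenLinesMode = SLM_Fast
```

Omitting a key leaves it at its default, so removing all three lines restores the automatic, neural-network-based behavior described above.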
Image correction
CorrectInvertedImage
| Key | Type | Default |
|---|---|---|
| CorrectInvertedImage | ThreeStatePropertyValueEnum | TSPV_Auto |
Specifies whether inverted images (white text on a black background) should be corrected. In the default mode, the Pdftools OCR Service corrects inverted images automatically.
ThreeStatePropertyValueEnum
- TSPV_Auto: Automatically determines whether this processing mode should be used, depending on the situation (image characteristics, etc.).
- TSPV_No: The processing mode in question will not be used.
- TSPV_Yes: The processing mode in question will be used.
CorrectGeometry
| Key | Type | Default |
|---|---|---|
| CorrectGeometry | ThreeStatePropertyValueEnum | TSPV_Auto |
Specifies whether geometrical distortions (perspective on photos, curved lines from scanned books, and similar effects) should be removed during page preprocessing. In the default mode, the Pdftools OCR Service corrects geometry for photographs.
BackgroundWhitening
| Key | Type | Default |
|---|---|---|
| BackgroundWhitening | ThreeStatePropertyValueEnum | TSPV_Auto |
Specifies whether the image background should be whitened. In the default mode, the Pdftools OCR Service whitens the background automatically.
CropImage
| Key | Type | Default |
|---|---|---|
| CropImage | ThreeStatePropertyValueEnum | TSPV_Auto |
If this property is set to TSPV_Yes, during preprocessing the Pdftools OCR Service detects document edges on the image and crops the image accordingly. In the default mode, the Pdftools OCR Service crops the image or skips this step automatically, depending on the source of the processed image.
This feature is not supported for black-and-white images.
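The image correction properties above could, for example, be forced on for input that is known to consist of color photographs of documents. This is a minimal sketch. The scenario is hypothetical, and the `Key = Value` layout and `;` comments are assumed INI conventions rather than documented syntax.

```ini
; Sketch only: assumes conventional INI syntax and ';' comments.
[PagePreprocessingParams]
; Leave inverted-image handling to automatic detection (the default).
CorrectInvertedImage = TSPV_Auto
; Always remove perspective and curved-line distortions.
CorrectGeometry = TSPV_Yes
; Always whiten the background.
BackgroundWhitening = TSPV_Yes
; Always detect document edges and crop.
; Cropping is not supported for black-and-white images.
CropImage = TSPV_Yes
```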
Image type and resolution
DetectImageType
| Key | Type | Default |
|---|---|---|
| DetectImageType | ThreeStatePropertyValueEnum | TSPV_Auto |
Specifies how the image type is determined. This works in conjunction with the ImageSourceType property of the [PrepareImageMode] section.
- If ImageSourceType is IST_Auto and this property is TSPV_Auto or TSPV_Yes, the Pdftools OCR Service detects the image type automatically.
- If ImageSourceType is IST_Auto and this property is TSPV_No, the image type is read from the file properties or metadata (faster, but depends on correct metadata).
- If ImageSourceType is set to a specific value (for example IST_Photo or IST_Scan), detection is not performed regardless of this setting.
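The second case in the list above (image type taken from metadata instead of being detected) might be configured as follows. The fragment assumes that both sections can be placed in the same INI file and that conventional `Key = Value` syntax with `;` comments is accepted. Neither is confirmed by this section.

```ini
; Sketch only: assumes both sections share one file and ';' starts a comment.
[PrepareImageMode]
; Do not force a specific source type.
ImageSourceType = IST_Auto

[PagePreprocessingParams]
; Skip detection and read the image type from file properties or metadata.
; Faster, but only reliable when the metadata is correct.
DetectImageType = TSPV_No
```

If ImageSourceType were set to a specific value such as IST_Scan instead, the DetectImageType line would have no effect.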
OverwriteResolutionMode
| Key | Type | Default |
|---|---|---|
| OverwriteResolutionMode | OverwriteResolutionModeEnum | ORM_Auto |
Specifies whether the resolution of the image should be overwritten during page preprocessing. When set to ORM_Manual, use the ResolutionToOverwrite property to specify the new image resolution. The new resolution is applied before all other stages of image preparation (binarization, skew correction, and so on).
- If you set this property to ORM_No and the resolution of the prepared image is too low (less than 50 DPI), too high (more than 3200 DPI), or undefined, an error will occur.
- For PDF files, the resolution is used for image rasterization. The image size in pixels may change based on the detected resolution and page dimensions.
OverwriteResolutionModeEnum
- ORM_Auto: Automatically detects and overwrites the image resolution if needed.
- ORM_Manual: Uses the resolution specified in the ResolutionToOverwrite property.
- ORM_No: Does not overwrite the image resolution.
ResolutionToOverwrite
| Key | Type | Default |
|---|---|---|
| ResolutionToOverwrite | Integer | 0 |
Specifies the resolution value in DPI to overwrite the image resolution. This property is only used when OverwriteResolutionMode is set to ORM_Manual.
The default value is 0. You must set the desired resolution value when using ORM_Manual; otherwise, an error will occur.
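A manual resolution override pairs the two properties, as in the sketch below. The value 300 is purely illustrative and is not a recommendation from this documentation. The `Key = Value` layout and `;` comments are assumed INI conventions.

```ini
; Sketch only: assumes conventional INI syntax and ';' comments.
[PagePreprocessingParams]
; Use the resolution given in ResolutionToOverwrite.
OverwriteResolutionMode = ORM_Manual
; Illustrative value in DPI. Leaving the default of 0 with ORM_Manual causes an error.
ResolutionToOverwrite = 300
```

Because the new resolution is applied before all other stages of image preparation, later steps such as binarization and skew correction work with the overwritten value.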
Binarization and color
DiscardColorImage
| Key | Type | Default |
|---|---|---|
| DiscardColorImage | Boolean | false |
When set to true, the Pdftools OCR Service leaves only the black-and-white plane in the prepared image. In this case, image binarization is performed during image preprocessing.
UseFastBinarization
| Key | Type | Default |
|---|---|---|
| UseFastBinarization | Boolean | false |
If this property is set to true, the Pdftools OCR Service uses algorithms for fast image binarization. This speeds up binarization, but quality may deteriorate.
Binarization is performed either during preprocessing (if DiscardColorImage is true), or later when a black-and-white image is necessary.
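To show how the two binarization properties interact, the following fragment sketches a setup that keeps only the black-and-white plane and favors speed over quality. The lowercase `true` spelling mirrors the values used in this section. The `Key = Value` layout and `;` comments are assumptions.

```ini
; Sketch only: assumes conventional INI syntax and ';' comments.
[PagePreprocessingParams]
; Keep only the black-and-white plane, so binarization runs during preprocessing.
DiscardColorImage = true
; Use the fast binarization algorithms. Quality may deteriorate.
UseFastBinarization = true
```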
Page splitting
SplitType
| Key | Type | Default |
|---|---|---|
| SplitType | PageSplitTypeEnum | PST_None |
Specifies whether and how a page image is split into several pages. In the default mode, the page is not split.
PageSplitTypeEnum
- PST_None: Does not split the page.
- PST_DoublePageSplit: Splits a double-page spread into two separate pages.
- PST_BusinessCardSplit: Splits a page containing multiple business cards into individual cards.
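For scanned book spreads, page splitting might be enabled as in this minimal sketch. The scenario is hypothetical, and the `Key = Value` layout and `;` comment are assumed INI conventions rather than documented syntax.

```ini
; Sketch only: assumes conventional INI syntax and ';' comments.
[PagePreprocessingParams]
; Split each double-page spread into two separate pages.
SplitType = PST_DoublePageSplit
```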