Version: 1.1.0

[PrepareImageMode] INI file section

The [PrepareImageMode] configuration controls how images are prepared before OCR processing, including compression, document type classification, image source detection, and free text annotation handling.

Note: Many image preprocessing settings that were previously part of [PrepareImageMode] (such as rotation, skew correction, resolution overwriting, binarization, contrast enhancement, and color inversion) have moved to the [PagePreprocessingParams] section.
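For orientation, here is a minimal sketch of the section with every key on this page set to its documented default. It assumes the usual INI conventions (a `[PrepareImageMode]` header, `Key=Value` lines, `;` comments) and that enum values are written by the names listed on this page. Check the service's configuration reference for the exact syntax it accepts.

```ini
; Sketch: [PrepareImageMode] with each key at its documented default
[PrepareImageMode]
; Choose ZIP or JPEG compression per image automatically
CompressImageMode=CIM_Auto
; Let the document classifier determine the document type
DocumentType=DT_Auto
; Detect the image origin automatically
ImageSourceType=IST_Auto
; Rasterize free text annotations into the image content
RasterizeFreeText=true
```

Because these are the defaults, leaving the keys out should have the same effect.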


CompressImageMode

Key: CompressImageMode
Type: CompressImageModeEnum
Default: CIM_Auto

Specifies how color and gray images are compressed during conversion to the internal format. Black-and-white images are always compressed losslessly, regardless of this setting.

CompressImageModeEnum

  • CIM_Auto: Automatic mode. Born-digital images (for example, from PDF files) use lossless ZIP compression, while other images use JPEG compression.
  • CIM_Lossless: Use lossless ZIP compression for all images.
  • CIM_MaxCompression: Use JPEG compression for all images.
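For example, to keep lossless quality for all color and gray images (at the cost of typically larger internal images than with JPEG), a sketch under the same INI syntax assumptions:

```ini
[PrepareImageMode]
; Use lossless ZIP compression for every color and gray image
; (black-and-white images are always compressed losslessly anyway)
CompressImageMode=CIM_Lossless
```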

DocumentType

Key: DocumentType
Type: DocumentTypeEnum
Default: DT_Auto

Specifies the type of document in the image. If you know the document type in advance, set this key to bypass the document classifier and save processing time.

DocumentTypeEnum

  • DT_Auto: Pdftools OCR Service runs the document classifier to determine the document type automatically.
  • DT_BankCard: The image contains a bank card.
  • DT_Book: The image contains one or more book pages.
  • DT_BusinessCard: The image contains a business card.
  • DT_DiscountCard: The image contains a discount or loyalty card.
  • DT_Document: The image contains a single-page document.
  • DT_Id: The image contains an ID document.
  • DT_NotDocument: The image isn’t a document (for example, a photograph without text).
  • DT_Passport: The image contains a passport double-page spread.
  • DT_PassportPage: The image contains a single passport page.
  • DT_Receipt: The image contains a receipt.
  • DT_TechnicalDrawing: The image contains an engineering diagram or technical drawing.
  • DT_Unknown: Reserved for internal use.
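If every input is known to be, say, a receipt, the classifier can be bypassed. A sketch, again assuming plain `Key=Value` syntax with enum names as values:

```ini
[PrepareImageMode]
; All inputs are receipts: skip the document classifier
DocumentType=DT_Receipt
```

A fixed type only makes sense when it holds for every image processed with this configuration; with mixed input, DT_Auto is the safer choice.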

ImageSourceType

Key: ImageSourceType
Type: ImageSourceTypeEnum
Default: IST_Auto

Specifies the image origin. Different image sources (for example, a photo versus a screenshot) require different preprocessing techniques.

ImageSourceTypeEnum

  • IST_Auto: Pdftools OCR Service detects the image origin automatically.
  • IST_Photo: The image is a photograph.
  • IST_Scan: The image is a scanned document.
  • IST_Screenshot: The image is a screenshot.
  • IST_SyntheticImage: The image contains text produced by rasterizing digital fonts, for example, a born-digital, image-only PDF document.
  • IST_SyntheticText: The image contains a text layer, for example, a born-digital PDF document with an embedded text layer.
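For a pipeline that only receives scanner output, the origin can be fixed instead of detected. A sketch under the same INI syntax assumptions:

```ini
[PrepareImageMode]
; Inputs always come from a document scanner
ImageSourceType=IST_Scan
```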

RasterizeFreeText

Key: RasterizeFreeText
Type: Boolean
Default: true

Specifies whether free text annotations in the input PDF document are retained. When set to true, Pdftools OCR Service rasterizes free text annotations so they become part of the image content.
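To keep free text annotations out of the rasterized image, set the key to false. A sketch, assuming booleans are written as `true`/`false` like the documented default:

```ini
[PrepareImageMode]
; Do not rasterize free text annotations into the image content
RasterizeFreeText=false
```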