Custom profiles
A custom profile is a configuration file in INI format. It consists of sections, each of which contains key-value entries. A custom profile overrides predefined profiles and lets you fine-tune image preprocessing.
The following code is an example of a custom profile INI configuration file:
[PrepareImageMode]
DiscardColorImage = false
[RecognizerParams]
BalancedMode = false
TextLanguage = English,German
The custom profile is provided using the `profile` parameter, which accepts the path to an INI configuration file. Ensure the configuration file resides at a location accessible to the OCR engine. For details on profile structure and how to specify the path, review Profiles in the Engine parameters reference.
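As an illustration, the example profile above could be generated with a script instead of being written by hand. The following is a minimal sketch using Python's standard `configparser` module, which is not part of the Pdftools OCR Service. The helper name `write_profile` and the file name `custom_profile.ini` are hypothetical and chosen only for this example.

```python
import configparser
import tempfile
from pathlib import Path


def write_profile(path, settings):
    """Write {section: {key: value}} settings to an INI file and return its full path."""
    parser = configparser.ConfigParser(interpolation=None)
    # configparser lowercases keys by default; keep them exactly as given.
    parser.optionxform = str
    for section, entries in settings.items():
        parser[section] = entries
    # The plain "utf-8" codec writes no byte order mark.
    with open(path, "w", encoding="utf-8") as handle:
        parser.write(handle)
    return Path(path).resolve()


# Hypothetical location; use a path that the OCR engine can access.
profile_path = write_profile(
    Path(tempfile.gettempdir()) / "custom_profile.ini",
    {
        "PrepareImageMode": {"DiscardColorImage": "false"},
        "RecognizerParams": {
            "BalancedMode": "false",
            "TextLanguage": "English,German",
        },
    },
)
print(profile_path)
```

The printed path is the value you would then supply as the profile parameter.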
Profile file format
Custom profiles must follow the INI format, with keys grouped into sections. Each section corresponds to a specific stage of the OCR pipeline. Profiles are passed to the engine using the `Profile` parameter, which should point to the full path of the file.
Profile file structure
A profile consists of multiple sections, such as:
[PrepareImageMode]
[PagePreprocessingParams]
[DocumentProcessingParams]
[RecognizerParams]
[SynthesisParamsForPage]
Each section contains key-value pairs that configure a particular aspect of the engine.
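To make the section and key-value structure concrete, the following sketch loads a profile into a nested dictionary with Python's standard `configparser` module. It only shows how the file is organized. How the engine itself parses a profile is not covered here, and the helper name `profile_to_dict` is hypothetical.

```python
import configparser

# Sample profile text reusing the keys from the example earlier in this page.
PROFILE_TEXT = """\
[PrepareImageMode]
DiscardColorImage = false

[RecognizerParams]
BalancedMode = false
TextLanguage = English,German
"""


def profile_to_dict(text):
    """Return the profile as {section name: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case instead of lowercasing
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


structure = profile_to_dict(PROFILE_TEXT)
for section, entries in structure.items():
    print(section, entries)
```

All values are returned as strings; the sketch does not interpret `false` or comma-separated lists.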
How profiles are used
- Custom profiles are passed to the engine using the `Profile` key.
- The custom profile takes precedence if both `Profile` and `PredefinedProfile` are set.
- The OCR engine returns an error if a profile file is missing or contains syntax errors.
The profile file must exist on a path that the Pdftools OCR Service Manager can reach. If the file can’t be found there, or if it includes syntax errors, the Pdftools OCR Service returns an error.
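Because a missing or malformed profile only surfaces as an error at processing time, it can help to check the file beforehand. The sketch below is one possible pre-flight check, using Python's standard `configparser` module. It assumes the script runs where it sees the same path as the Pdftools OCR Service Manager. Python's INI rules may also differ in detail from the engine's, so passing this check does not guarantee that the engine accepts the file. The function name `check_profile` is hypothetical.

```python
import configparser
from pathlib import Path


def check_profile(path):
    """Return None if the profile exists and parses as INI, else a problem description."""
    profile = Path(path)
    if not profile.is_file():
        return f"profile not found: {profile}"
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case
    try:
        parser.read_string(profile.read_text(encoding="utf-8"), source=str(profile))
    except (configparser.Error, UnicodeDecodeError) as exc:
        # Covers entries outside a section, duplicate sections or keys,
        # unparsable lines, and bytes that are not valid UTF-8.
        return f"profile is not valid INI: {exc}"
    return None
```

A caller would treat any non-`None` result as a reason not to submit the job, for example `problem = check_profile(profile_path)`.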
Tips for writing effective profiles
- Always validate your INI file encoding (UTF-8 without BOM).
- Build incrementally: start simple, then add new sections one at a time.
- Avoid enabling modules if you don’t need them (for example, `DetectBarcodes` unless required).
- Use language combinations carefully. More languages increase processing time.
- Prefer 300 DPI images for optimal OCR accuracy.
If the profile file is invalid, missing, or misformatted, the Pdftools OCR Service will raise an error and abort processing.
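The encoding tip above (UTF-8 without BOM) can be checked mechanically. The following sketch inspects the raw bytes of a profile using only the Python standard library. The function name `encoding_problem` is hypothetical, and the sketch checks the encoding only, not the INI syntax.

```python
import codecs


def encoding_problem(data: bytes):
    """Return a description of the encoding problem, or None for BOM-free UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return "file starts with a UTF-8 byte order mark"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"file is not valid UTF-8: {exc.reason} at byte {exc.start}"
    return None


# Usage with in-memory bytes; for a real file, pass Path(...).read_bytes().
sample = "[RecognizerParams]\nTextLanguage = English,German\n".encode("utf-8")
print(encoding_problem(sample))
```

Some editors add a byte order mark when saving as "UTF-8", so checking the bytes is more reliable than trusting the editor's encoding label.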
Key INI file sections and their purpose
The following table lists the INI file sections and what each one configures:
INI file section | Description |
---|---|
[PrepareImageMode] | Configure image preprocessing: rotation, skew, resolution, contrast, binarization, compression. |
[ImageProcessingParams] | Adjust rotation, mirroring, and color inversion for image blocks. |
[DocumentProcessingParams] | Enable or disable document synthesis. |
[PageProcessingParams] | Configure preprocessing, layout analysis, and recognition. |
[PagePreprocessingParams] | Set options for correcting orientation, skew, shadows, geometry, and resolution. |
[PageAnalysisParams] | Control layout analysis: detect tables, images, barcodes, and structures. |
[TableAnalysisParams] | Configure how table structures are analyzed and interpreted. |
[BarcodeParams] | Detect, decode, and configure interpretation of barcode types. |
[ObjectsExtractionParams] | Manage object extraction, noise cleanup, and embedded text detection. |
[OrientationDetectionParams] | Set rules for detecting and restricting page rotation. |
[RecognizerParams] | Define OCR language, text types, recognition speed, and fine-tuning. |
[SynthesisParamsForPage] | Configure page-level layout synthesis and formatting detection. |
[SynthesisParamsForDocument] | Manage document-level synthesis, structure, formatting, and memory usage. |
[FontFormattingDetectionParams] | Detect bold, italic, font size, spacing, and other typographic features. |
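A misspelled section name is easy to overlook, so one option is to compare the section names in a profile against the table above. The sketch below does this with Python's standard `configparser` module. It assumes the table is a complete list, which this page does not state, so treat a reported name as something to double-check, not as a definite error. The names `KNOWN_SECTIONS` and `unknown_sections` are hypothetical.

```python
import configparser

# Section names copied from the table above.
KNOWN_SECTIONS = {
    "PrepareImageMode",
    "ImageProcessingParams",
    "DocumentProcessingParams",
    "PageProcessingParams",
    "PagePreprocessingParams",
    "PageAnalysisParams",
    "TableAnalysisParams",
    "BarcodeParams",
    "ObjectsExtractionParams",
    "OrientationDetectionParams",
    "RecognizerParams",
    "SynthesisParamsForPage",
    "SynthesisParamsForDocument",
    "FontFormattingDetectionParams",
}


def unknown_sections(text):
    """Return section names in the profile text that are not in KNOWN_SECTIONS."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case
    parser.read_string(text)
    return [name for name in parser.sections() if name not in KNOWN_SECTIONS]


# "RecogniserParams" is a deliberate misspelling for demonstration.
sample = "[RecognizerParams]\nBalancedMode = false\n[RecogniserParams]\nBalancedMode = true\n"
print(unknown_sections(sample))
```

The comparison is case-sensitive; whether the engine treats section names case-sensitively is not stated here.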