Custom profiles
A custom profile is a configuration file in INI format: it consists of sections, each containing key-value entries. A custom profile overrides predefined profiles and lets you fine-tune image preprocessing and other stages of OCR processing.
The following example shows a custom profile INI configuration file:
```ini
[PrepareImageMode]
DiscardColorImage = false
[RecognizerParams]
BalancedMode = false
TextLanguage = English,German
```
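If you generate profiles from scripts, Python's standard `configparser` module can produce the same file. This is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes the engine accepts the `Key = value` spacing that `configparser` writes, which matches the example above.

```python
import configparser
import io


def build_example_profile() -> configparser.ConfigParser:
    """Build the example custom profile in memory (hypothetical helper)."""
    config = configparser.ConfigParser()
    # configparser lowercases keys by default; keep the CamelCase names as written.
    config.optionxform = str
    config["PrepareImageMode"] = {"DiscardColorImage": "false"}
    config["RecognizerParams"] = {
        "BalancedMode": "false",
        "TextLanguage": "English,German",
    }
    return config


def profile_to_text(config: configparser.ConfigParser) -> str:
    """Serialize a profile to INI text: [Section] headers followed by Key = value lines."""
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

To save the result, pass the text to `Path(...).write_text(..., encoding="utf-8")`; Python's `utf-8` codec writes no byte order mark.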
To configure the OCR service to use a custom profile, follow these steps:
- Create a custom profile INI file.
- In the Conversion Service Configurator, go to Workflows & Profiles.
- Click the pen icon next to the workflow profile you want to edit.
- Navigate to the OCR Settings section.
- Next to Engine, click the pen icon in the Pdftools OCR Service (3H Legacy Compatible) section.
- In the Parameters input field, enter the full path to your INI file, for example:
`Profile="C:\ocr\profiles\document_conversion_high_accuracy.ini"`
- After editing your configuration, click Apply.
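When you script the configuration, a small helper can format the Parameters entry and reject relative paths, since the `Profile` value should be the full path to the file. This is a hypothetical sketch: `profile_parameter` is not part of any Pdftools API, and it only produces the text you would type into the Parameters input field.

```python
from pathlib import PureWindowsPath


def profile_parameter(ini_path: str) -> str:
    """Format the Profile="..." entry for the Parameters field (hypothetical helper)."""
    path = PureWindowsPath(ini_path)
    # A full path includes a drive (or UNC share) and a root; reject anything else.
    if not path.is_absolute():
        raise ValueError(f"Profile path must be a full path: {ini_path}")
    # str() normalizes forward slashes to Windows backslashes.
    return f'Profile="{path}"'
```

For example, `profile_parameter("C:/ocr/profiles/a.ini")` yields `Profile="C:\ocr\profiles\a.ini"`.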
The INI file must be on the same machine as the Pdftools OCR Service Manager node. If both `Profile` and `PredefinedProfile` are set, the custom `Profile` overrides the predefined profile.
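The precedence rule can be expressed as a small function. This sketch only models the documented behavior; `effective_profile` is a hypothetical name, and because this page does not say what happens when neither parameter is set, the function reports that case as unset instead of guessing.

```python
from typing import Optional, Tuple


def effective_profile(
    profile: Optional[str], predefined_profile: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Model of the documented rule: a custom Profile overrides PredefinedProfile."""
    if profile:
        # A custom profile path wins whenever it is set.
        return ("custom", profile)
    if predefined_profile:
        return ("predefined", predefined_profile)
    # Not covered by this page; reported as unset rather than assuming an engine default.
    return ("unset", None)
```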
Profile file format
Custom profiles must follow the INI format, with keys grouped into sections. Each section corresponds to a specific stage of the OCR pipeline. Profiles are passed to the engine using the `Profile` parameter, which should contain the full path to the file.
Profile file structure
A profile consists of multiple sections, such as:
[PrepareImageMode]
[PagePreprocessingParams]
[DocumentProcessingParams]
[RecognizerParams]
[SynthesisParamsForPage]
Each section contains key-value pairs that configure a particular aspect of the engine.
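To inspect how an existing profile is organized, you can map each section to its key-value pairs with Python's `configparser`. This is a sketch that assumes `configparser` reads profiles the same way the engine does; interpolation is disabled and key case is preserved so values come back exactly as written. The function name is hypothetical.

```python
import configparser
from typing import Dict


def describe_profile(ini_text: str) -> Dict[str, Dict[str, str]]:
    """Map each section of a profile to its key-value pairs."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep CamelCase key names
    parser.read_string(ini_text)
    return {name: dict(parser[name]) for name in parser.sections()}
```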
How profiles are used
- Custom profiles are passed to the engine using the `Profile` key.
- The custom profile takes precedence if both `Profile` and `PredefinedProfile` are set.
- The OCR engine returns an error if a profile file is missing or contains syntax errors.
For the file to be found, it must exist on a path that the Pdftools OCR Service Manager can reach.
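To catch a missing or unparsable profile before a conversion job fails, you could run a pre-flight check on the Pdftools OCR Service Manager machine. This is a hypothetical sketch: `load_profile` and `ProfileError` are illustrative names, and because Python's `configparser` is not the engine's parser, passing this check does not guarantee that the service accepts the file.

```python
import configparser
from pathlib import Path


class ProfileError(Exception):
    """Raised when a custom profile is missing or cannot be parsed."""


def load_profile(path: str) -> configparser.ConfigParser:
    """Load a custom profile, failing early on a missing file or a syntax error."""
    ini_path = Path(path)
    if not ini_path.is_file():
        raise ProfileError(f"Profile file not found: {ini_path}")
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep CamelCase key names
    try:
        with ini_path.open(encoding="utf-8") as fh:
            parser.read_file(fh)
    except (configparser.Error, UnicodeDecodeError) as exc:
        # configparser.Error covers missing section headers and malformed lines.
        raise ProfileError(f"Cannot parse profile {ini_path}: {exc}") from exc
    return parser
```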
Tips for writing effective profiles
- Save your INI file as UTF-8 without a byte order mark (BOM).
- Build incrementally: start simple, then add new sections one at a time.
- Avoid enabling modules you don't need (for example, leave `DetectBarcodes` off unless required).
- Use language combinations carefully. More languages increase processing time.
- Prefer 300 DPI images for optimal OCR accuracy.
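Some of these tips can be checked automatically. The sketch below flags a UTF-8 BOM, invalid UTF-8, an enabled `DetectBarcodes` entry, and long `TextLanguage` lists. The function name, the assumption that `DetectBarcodes` is enabled with the value `true`, and the default threshold of two languages are illustrative choices, not product rules.

```python
import codecs
from typing import List


def lint_profile_bytes(raw: bytes, max_languages: int = 2) -> List[str]:
    """Return warnings for a profile file's raw bytes (hypothetical lint rules)."""
    warnings: List[str] = []
    if raw.startswith(codecs.BOM_UTF8):
        warnings.append("File starts with a UTF-8 BOM; save it as UTF-8 without BOM.")
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        warnings.append("File is not valid UTF-8.")
        return warnings
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # lines without "=": section headers, blank lines, most comments
        key, value = key.strip(), value.strip()
        if key == "DetectBarcodes" and value.lower() == "true":
            # Assumption: the module is enabled with the value "true".
            warnings.append("DetectBarcodes is enabled; disable it unless you need barcodes.")
        elif key == "TextLanguage":
            languages = [lang.strip() for lang in value.split(",") if lang.strip()]
            if len(languages) > max_languages:
                warnings.append(
                    f"TextLanguage lists {len(languages)} languages; "
                    "more languages increase processing time."
                )
    return warnings
```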
If the profile file is missing, invalid, or malformed, the Pdftools OCR Service raises an error and aborts processing.
Key INI file sections and their purpose
The following table summarizes the purpose of each INI file section:
INI file section | Description |
---|---|
[PrepareImageMode] | Configure image preprocessing: rotation, skew, resolution, contrast, binarization, compression. |
[ImageProcessingParams] | Adjust rotation, mirroring, and color inversion for image blocks. |
[DocumentProcessingParams] | Enable or disable document synthesis. |
[PageProcessingParams] | Configure preprocessing, layout analysis, and recognition. |
[PagePreprocessingParams] | Set options for correcting orientation, skew, shadows, geometry, and resolution. |
[PageAnalysisParams] | Control layout analysis: detect tables, images, barcodes, and structures. |
[TableAnalysisParams] | Configure how table structures are analyzed and interpreted. |
[BarcodeParams] | Detect, decode, and configure interpretation of barcode types. |
[ObjectsExtractionParams] | Manage object extraction, noise cleanup, and embedded text detection. |
[OrientationDetectionParams] | Set rules for detecting and restricting page rotation. |
[RecognizerParams] | Define OCR language, text types, recognition speed, and fine-tuning. |
[SynthesisParamsForPage] | Configure page-level layout synthesis and formatting detection. |
[SynthesisParamsForDocument] | Manage document-level synthesis, structure, formatting, and memory usage. |
[FontFormattingDetectionParams] | Detect bold, italic, font size, spacing, and other typographic features. |
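A misspelled section name is easy to miss. This sketch compares the sections in a profile against the names listed in the table above; it assumes that the table is complete and that section names are case-sensitive, neither of which this page states. The names `KNOWN_SECTIONS` and `unknown_sections` are hypothetical.

```python
import configparser
from typing import List

# Section names from the table above (assumed to be the complete set).
KNOWN_SECTIONS = frozenset({
    "PrepareImageMode",
    "ImageProcessingParams",
    "DocumentProcessingParams",
    "PageProcessingParams",
    "PagePreprocessingParams",
    "PageAnalysisParams",
    "TableAnalysisParams",
    "BarcodeParams",
    "ObjectsExtractionParams",
    "OrientationDetectionParams",
    "RecognizerParams",
    "SynthesisParamsForPage",
    "SynthesisParamsForDocument",
    "FontFormattingDetectionParams",
})


def unknown_sections(ini_text: str) -> List[str]:
    """Return section names not found in KNOWN_SECTIONS (case-sensitive match)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return [name for name in parser.sections() if name not in KNOWN_SECTIONS]
```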