[RecognizerParams] INI file section
The [RecognizerParams] INI file section includes settings for controlling language, text types, performance, and fine-tuning options used by the Pdftools OCR Service during text recognition.
Main settings
TextLanguage
| Key | Type | Default |
|---|---|---|
| TextLanguage | TextLanguage | English |
Specifies the languages used for text recognition. Separate multiple languages with commas. The supported languages are listed in the Supported languages page.
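A minimal sketch of a multi-language setting. The second language name, the `key = value` spacing, and the `;` comment line are assumptions for illustration; check the Supported languages page for the exact names.

```ini
[RecognizerParams]
; Assumed example: recognize English and German text.
TextLanguage = English,German
```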
LanguageDetectionMode
| Key | Type | Default |
|---|---|---|
| LanguageDetectionMode | ThreeStatePropertyValueEnum | TSPV_Auto |
Controls automatic language detection.
With autodetection enabled, Pdftools OCR Service identifies the language of each word from the languages specified in the TextLanguage property. This is useful for documents whose language you don’t know in advance. If you know that every specified language appears in the document, autodetection isn’t needed; set this property to TSPV_No to turn it off.
ThreeStatePropertyValueEnum
- TSPV_Auto: Automatically determine if this processing mode should be used, depending on the situation (image characteristics, etc.).
- TSPV_No: The processing mode in question will not be used.
- TSPV_Yes: The processing mode in question will be used.
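As an illustration, the fragment below turns autodetection off for a document known to contain every listed language. The language list and the comment syntax are assumptions; only the key names and enum values come from this page.

```ini
[RecognizerParams]
; Assumed example languages, both known to appear in the document.
TextLanguage = English,German
; Per-word language detection is not needed in that case.
LanguageDetectionMode = TSPV_No
```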
TextTypes
| Key | Type | Default |
|---|---|---|
| TextTypes | TextTypeEnum | TT_Normal |
Set this property to one or more TextTypeEnum values combined with the OR operator to specify which text types to recognize. For example, TT_Normal | TT_Index limits recognition to standard typographic text and ZIP-code-style digits.
- If this property is equal to any combination of TT_Matrix, TT_Typewriter, TT_OCR_A, and TT_OCR_B, italic fonts and superscript/subscript aren’t recognized, regardless of the values of the ProhibitItalic, ProhibitSubscript, and ProhibitSuperscript properties.
- If this property is TT_Handwritten, the image orientation cannot be corrected.
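The combination described above could be written as follows. This sketch copies the `|` notation from the example in the text; whether the INI file expects exactly this spelling (for instance, with or without surrounding spaces) is an assumption to verify against your installation.

```ini
[RecognizerParams]
; Standard typographic text plus ZIP-code-style digits.
; The " | " separator mirrors the OR example above and is assumed here.
TextTypes = TT_Normal | TT_Index
```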
TextTypeEnum
- TT_Gothic: Text printed in Gothic (Fraktur) typeface.
- TT_Handwritten: Handwritten and handprinted text.
  Note: Automatic analysis isn’t available for handwritten text. Set the coordinates of blocks containing handwritten text manually.
- TT_Index: A digit-only character set for text written in ZIP-code style.
- TT_Matrix: Text printed on a dot-matrix printer.
- TT_MICR_CMC7: A character set for digits and the letters A–E in the CMC-7 Magnetic Ink Character Recognition (MICR) barcode font. Only supported for Latin-based languages.
- TT_MICR_E13B: A character set for digits and the symbols A–D printed in magnetic ink (E13B font), commonly used on personal checks and banking documents. Only supported for Latin-based languages.
- TT_Normal: Standard typographic text. This is the default text type.
- TT_OCR_A: A monospaced font designed for optical character recognition, commonly used in banking and credit card processing.
- TT_OCR_B: A font optimized for optical character recognition.
- TT_Receipt: Optimized for sales receipts and invoices. Rather than targeting a specific font, this type signals that the input may contain low-quality text in monospaced or normal font.
- TT_Typewriter: Text typed on a typewriter.
DetectTextTypesIndependently
| Key | Type | Default |
|---|---|---|
| DetectTextTypesIndependently | Boolean | false |
When enabled, Pdftools OCR Service determines the text type for each block independently. This helps with documents containing small blocks of different text types, though it may slightly slow down processing.
Recognition speed
Mode
| Key | Type | Default |
|---|---|---|
| Mode | RecognitionModeEnum | RM_Normal |
Sets the recognition mode, which controls the balance between speed and accuracy during processing.
Built-in patterns are always used in the accurate mode. To disable the built-in patterns, switch to the normal (RM_Normal) or fast (RM_Fast) mode.
RecognitionModeEnum
- RM_Fast: Provides maximum recognition speed with a moderately increased error rate. On high-quality text, this typically results in 1–2 errors per page.
- RM_Normal: An intermediate mode between fast and accurate. Provides satisfactory recognition results on noisy images or documents with complex layouts.
- RM_Accurate: Provides maximum accuracy on poor-quality documents and images using advanced neural network technologies. Slower than the other modes.
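For example, a batch of clean, high-quality scans might trade a little accuracy for throughput. The scenario and the comment lines are assumptions; the key and value names are taken from the list above.

```ini
[RecognizerParams]
; Assumed scenario: high-quality input where a small number of
; errors per page is acceptable in exchange for speed.
Mode = RM_Fast
```

For poor-quality input, RM_Accurate would be the counterpart choice, at the cost of slower processing.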
Fine tuning
LowResolutionMode
| Key | Type | Default |
|---|---|---|
| LowResolutionMode | Boolean | false |
Enables recognition of text from low-resolution images. Use this for faxes, small print, or documents with poor print quality.
OneLinePerBlock
| Key | Type | Default |
|---|---|---|
| OneLinePerBlock | Boolean | false |
When set to true, Pdftools OCR Service treats each text block as containing a single line of text.
OneWordPerLine
| Key | Type | Default |
|---|---|---|
| OneWordPerLine | Boolean | false |
When set to true, each line of text is treated as a single word during recognition.
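The two layout hints can be combined when every block is known to hold a single value, such as a cropped form field. That use case is an assumption made for illustration, as is the `;` comment syntax.

```ini
[RecognizerParams]
; Assumed input: each text block contains exactly one line...
OneLinePerBlock = true
; ...and that line is a single word (for example, an ID number).
OneWordPerLine = true
```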
ProhibitItalic
| Key | Type | Default |
|---|---|---|
| ProhibitItalic | Boolean | false |
When set to true, Pdftools OCR Service does not recognize letters printed in an italic font. This is useful when the text is expected to contain no italic letters, in which case it may speed up recognition.
If the image does contain italic letters while this property is true, those letters are recognized incorrectly.
ProhibitSubscript
| Key | Type | Default |
|---|---|---|
| ProhibitSubscript | Boolean | false |
When set to true, Pdftools OCR Service does not recognize subscript letters. This is useful when the text is expected to contain no subscripts, in which case it may speed up recognition. If the image does contain subscript letters while this property is true, those letters are recognized incorrectly.
ProhibitSuperscript
| Key | Type | Default |
|---|---|---|
| ProhibitSuperscript | Boolean | false |
When set to true, Pdftools OCR Service does not recognize superscript letters. This is useful when the text is expected to contain no superscripts, in which case it may speed up recognition. If the image does contain superscript letters while this property is true, those letters are recognized incorrectly.
ProhibitSmallCaps
| Key | Type | Default |
|---|---|---|
| ProhibitSmallCaps | Boolean | false |
When set to true, Pdftools OCR Service does not recognize small capital letters.
ProhibitHyphenation
| Key | Type | Default |
|---|---|---|
| ProhibitHyphenation | Boolean | false |
When set to true, Pdftools OCR Service does not recognize words hyphenated across lines. This is useful when the text is expected to contain no hyphenation, in which case it may speed up recognition.
If the recognized block does contain hyphenated words while this property is true, those words are recognized incorrectly.
ProhibitInterblockHyphenation
| Key | Type | Default |
|---|---|---|
| ProhibitInterblockHyphenation | Boolean | false |
When set to true, Pdftools OCR Service presumes that text from one block can’t be carried over to the next block.
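As a sketch, the Prohibit* switches might be combined for plain business text that is known to contain none of these features. The scenario is an assumption; enabling a switch on text that does contain the feature leads to misrecognition, as described above.

```ini
[RecognizerParams]
; Assumed input: plain upright text without italics, subscripts,
; superscripts, or words hyphenated across lines or blocks.
ProhibitItalic = true
ProhibitSubscript = true
ProhibitSuperscript = true
ProhibitHyphenation = true
ProhibitInterblockHyphenation = true
```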
CaseRecognitionMode
| Key | Type | Default |
|---|---|---|
| CaseRecognitionMode | CaseRecognitionModeEnum | CRM_AutoCase |
Controls how letter case (uppercase/lowercase) is handled in the recognized output.
CaseRecognitionModeEnum
- CRM_AutoCase: Automatically detects the case of letters and keeps it in the output text.
- CRM_CapitalCase: Sets the recognized text to capitals.
- CRM_SmallCase: Sets the recognized text to lowercase.
Handprint recognition
FieldMarkingType
| Key | Type | Default |
|---|---|---|
| FieldMarkingType | FieldMarkingTypeEnum | FMT_SimpleText |
Defines the type of field marking around handprinted characters, such as underlines, frames, or boxes. This property applies only to handprint recognition.
For correct handprint recognition, use the CellsCount property to set the number of character cells in a recognized block.
FieldMarkingTypeEnum
- FMT_CharBoxSeries: Each character is in a separate box.
- FMT_CombInFrame: Characters are in a comb layout that also serves as the bottom line of a frame.
- FMT_GrayBoxes: Characters are in white fields on a gray background.
- FMT_PartitionedFrame: Characters are in a frame divided by vertical lines.
- FMT_SimpleComb: Characters are in a comb layout.
- FMT_SimpleText: Plain text with no marking.
- FMT_TextInFrame: The text is enclosed in a frame.
- FMT_UnderlinedText: The text is underlined.
CellsCount
| Key | Type | Default |
|---|---|---|
| CellsCount | Integer | 1 |
Sets the number of character cells in a handprint block. This property applies only to FieldMarkingType values that divide text into individual cells.
The default is 1, but you should set the correct value for your document to get accurate recognition results.
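A handprint setup might look like the sketch below. The form layout (eight boxed characters per field) is a made-up example; substitute the marking type and cell count that match your document.

```ini
[RecognizerParams]
; Assumed input: handprinted fields where each character sits in its own box.
TextTypes = TT_Handwritten
FieldMarkingType = FMT_CharBoxSeries
; Hypothetical value: eight character cells per block.
CellsCount = 8
```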
User patterns
UseBuiltInPatterns
| Key | Type | Default |
|---|---|---|
| UseBuiltInPatterns | Boolean | true |
When set to true, Pdftools OCR Service uses its own built-in patterns for recognition.
Patterns are files that establish a relationship between a character image and the character itself. Set this property to false when you want recognition to rely only on user patterns rather than the standard Pdftools OCR Service patterns. This can be useful for text set in decorative or nonstandard fonts, for which you supply your own user-defined patterns trained on those fonts.
A path to a user-defined pattern file is stored in the UserPatternsFile property. If the UserPatternsFile property is empty, the UseBuiltInPatterns property is ignored.
You can set this property to false when using the normal and fast recognition modes. You can’t prohibit using the built-in patterns for the accurate mode.
UserPatternsFile
| Key | Type | Default |
|---|---|---|
| UserPatternsFile | String | "" |
Contains the full path to a file with the user pattern used for recognition. If the value of this property isn’t empty, information from the user pattern file is used during recognition.
If the UseBuiltInPatterns property is false, meaning that standard Pdftools OCR Service patterns aren’t used during recognition, this property should contain a path to a user-defined pattern file, as only information stored in it is used.
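The interaction between the two pattern properties and the recognition mode can be sketched as follows. The file path (including its extension) is hypothetical, and whether the value needs quoting is an assumption to check against your installation.

```ini
[RecognizerParams]
; Built-in patterns can only be disabled in the normal and fast modes.
Mode = RM_Normal
; Rely solely on the user-defined patterns.
UseBuiltInPatterns = false
; Hypothetical path to a user pattern file trained on a decorative font.
UserPatternsFile = C:\ocr\patterns\decorative-fonts.ptn
```

If UserPatternsFile were left empty, the UseBuiltInPatterns setting would be ignored, as noted in the UseBuiltInPatterns description.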