Version: 1.1.0

[RecognizerParams] INI file section

The [RecognizerParams] INI file section includes settings for controlling language, text types, performance, and fine-tuning options used by the Pdftools OCR Service during text recognition.


Main settings

TextLanguage

Key | Type | Default
TextLanguage | TextLanguage | English

Specifies the languages used for text recognition. To specify several languages, separate them with commas. The supported languages are listed on the Supported languages page.
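
As a minimal sketch, the setting might look like this in the INI file. The key = value layout, the ; comment style, and the language name German are assumptions for illustration; take the exact language names from the Supported languages page.

```ini
[RecognizerParams]
; Assumed example: recognize English and German text.
; "German" is illustrative only; use the names from the Supported languages page.
TextLanguage = English,German
```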


LanguageDetectionMode

Key | Type | Default
LanguageDetectionMode | ThreeStatePropertyValueEnum | TSPV_Auto

Controls automatic language detection. With autodetection enabled, Pdftools OCR Service identifies the language of each word from the languages specified in the TextLanguage property. This is useful for documents whose language you don’t know in advance. If you know that all specified languages are present in the document, autodetection isn’t needed; set this property to TSPV_No to turn it off.

ThreeStatePropertyValueEnum

  • TSPV_Auto: Automatically determine if this processing mode should be used, depending on the situation (image characteristics, etc.).
  • TSPV_No: The processing mode in question will not be used.
  • TSPV_Yes: The processing mode in question will be used.
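
For example, when the document language is known in advance, detection can be switched off. This is a sketch that assumes enum constants are written as plain values in the INI file:

```ini
[RecognizerParams]
TextLanguage = English
; Turn off per-word language detection (assumed value syntax).
LanguageDetectionMode = TSPV_No
```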

TextTypes

Key | Type | Default
TextTypes | TextTypeEnum | TT_Normal

Set this property to one or more TextTypeEnum values combined with the OR operator to specify which text types to recognize. For example, TT_Normal | TT_Index limits recognition to standard typographic text and ZIP-code-style digits.

info
  • If this property is set to any combination of TT_Matrix, TT_Typewriter, TT_OCR_A, and TT_OCR_B, italic fonts and superscript/subscript aren’t recognized, regardless of the values of the ProhibitItalic, ProhibitSubscript, and ProhibitSuperscript properties.
  • If this property is TT_Handwritten, the image orientation cannot be corrected.
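
The TT_Normal | TT_Index combination from the description might be written as follows. Whether the INI value accepts the | notation exactly as it appears in the description is an assumption here, so verify the syntax against your installation:

```ini
[RecognizerParams]
; Assumed syntax: limit recognition to standard typographic text
; and ZIP-code-style digits.
TextTypes = TT_Normal | TT_Index
```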

TextTypeEnum

  • TT_Gothic: Text printed in Gothic (Fraktur) typeface.
  • TT_Handwritten: Handwritten and handprinted text.
    note

    Automatic analysis isn’t available for handwritten text. Set the coordinates of blocks containing handwritten text manually.

  • TT_Index: A digit-only character set for text written in ZIP-code style.
  • TT_Matrix: Text printed on a dot-matrix printer.
  • TT_MICR_CMC7: A character set for digits and the letters A–E in the CMC-7 Magnetic Ink Character Recognition (MICR) barcode font. Only supported for Latin-based languages.
  • TT_MICR_E13B: A character set for digits and the symbols A–D printed in magnetic ink (E13B font), commonly used on personal checks and banking documents. Only supported for Latin-based languages.
  • TT_Normal: Standard typographic text. This is the default text type.
  • TT_OCR_A: A monospaced font designed for optical character recognition, commonly used in banking and credit card processing.
  • TT_OCR_B: A font optimized for optical character recognition.
  • TT_Receipt: Optimized for sales receipts and invoices. Rather than targeting a specific font, this type signals that the input may contain low-quality text in monospaced or normal font.
  • TT_Typewriter: Text typed on a typewriter.

DetectTextTypesIndependently

Key | Type | Default
DetectTextTypesIndependently | Boolean | false

When enabled, Pdftools OCR Service determines the text type for each block independently. This helps with documents containing small blocks of different text types, though it may slightly slow down processing.
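
A sketch for a document that mixes small blocks of different text types, for instance typewritten and standard printed text. The value syntax for TextTypes is assumed, not confirmed by this page:

```ini
[RecognizerParams]
; Assumed syntax for combining text types.
TextTypes = TT_Normal | TT_Typewriter
; Determine the text type separately for each block.
; This may slow down processing slightly.
DetectTextTypesIndependently = true
```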


Recognition speed

Mode

Key | Type | Default
Mode | RecognitionModeEnum | RM_Normal

Sets the recognition mode, which controls the balance between speed and accuracy during processing.

info

Built-in patterns are always used in the accurate mode (RM_Accurate). To disable the built-in patterns, switch to the normal mode (RM_Normal) or the fast mode (RM_Fast).

RecognitionModeEnum

  • RM_Fast: Provides maximum recognition speed with a moderately increased error rate. On high-quality text, this typically results in 1–2 errors per page.
  • RM_Normal: An intermediate mode between fast and accurate. Provides satisfactory recognition results on noisy images or documents with complex layouts.
  • RM_Accurate: Provides maximum accuracy on poor-quality documents and images using advanced neural network technologies. Slower than the other modes.
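
To illustrate the trade-off, a configuration for clean, high-quality scans where throughput matters more than the last few errors could look like this (a sketch; the value syntax is assumed):

```ini
[RecognizerParams]
; Favor speed over accuracy. Switch to RM_Accurate for
; poor-quality input where accuracy matters more than speed.
Mode = RM_Fast
```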

Fine tuning

LowResolutionMode

Key | Type | Default
LowResolutionMode | Boolean | false

Enables recognition of text in low-resolution images. Use this for faxes, small print, or documents with poor print quality.
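
A possible setting for a batch of faxed documents, shown as a sketch with assumed Boolean syntax (true/false, as in the Default column):

```ini
[RecognizerParams]
; Input is low-resolution fax output.
LowResolutionMode = true
```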


OneLinePerBlock

Key | Type | Default
OneLinePerBlock | Boolean | false

When set to true, Pdftools OCR Service treats each text block as containing a single line of text.


OneWordPerLine

Key | Type | Default
OneWordPerLine | Boolean | false

When set to true, each line of text is treated as a single word during recognition.
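
These two layout hints can be combined. The sketch below assumes input in which every block holds exactly one line and every line holds exactly one word, such as blocks that each contain a single code or number; that scenario is an illustration, not something this page prescribes:

```ini
[RecognizerParams]
; Each text block contains a single line of text.
OneLinePerBlock = true
; Each line is treated as a single word.
OneWordPerLine = true
```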


ProhibitItalic

Key | Type | Default
ProhibitItalic | Boolean | false

When set to true, Pdftools OCR Service does not recognize letters printed in an italic font. Use this setting for text that is not expected to contain italic letters, where it may speed up recognition. If the image does contain italic letters while this property is true, those letters are recognized incorrectly.


ProhibitSubscript

Key | Type | Default
ProhibitSubscript | Boolean | false

When set to true, Pdftools OCR Service does not recognize subscript letters. Use this setting for text that is not expected to contain subscripts, where it may speed up recognition. If the image does contain subscript letters while this property is true, those letters are recognized incorrectly.


ProhibitSuperscript

Key | Type | Default
ProhibitSuperscript | Boolean | false

When set to true, Pdftools OCR Service does not recognize superscript letters. Use this setting for text that is not expected to contain superscripts, where it may speed up recognition. If the image does contain superscript letters while this property is true, those letters are recognized incorrectly.
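
For plain body text that is known to contain no italics, subscripts, or superscripts, the three properties above could be set together. This is a sketch with assumed syntax; any italic, subscript, or superscript letters that do appear would be recognized incorrectly:

```ini
[RecognizerParams]
; Only safe when the input contains none of these styles.
ProhibitItalic = true
ProhibitSubscript = true
ProhibitSuperscript = true
```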


ProhibitSmallCaps

Key | Type | Default
ProhibitSmallCaps | Boolean | false

When set to true, Pdftools OCR Service does not recognize small capital letters.


ProhibitHyphenation

Key | Type | Default
ProhibitHyphenation | Boolean | false

When set to true, Pdftools OCR Service does not recognize words hyphenated across lines. Use this setting for text that is not expected to contain hyphenation, where it may speed up recognition. If the recognized block does contain hyphenated words while this property is true, those words are recognized incorrectly.


ProhibitInterblockHyphenation

Key | Type | Default
ProhibitInterblockHyphenation | Boolean | false

When set to true, Pdftools OCR Service presumes that text from one block can’t be carried over to the next block.
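
As an illustration, input made of short, self-contained blocks with no words broken across lines or blocks might use both hyphenation settings. The scenario and the syntax are assumptions:

```ini
[RecognizerParams]
; No words are split across lines within a block.
ProhibitHyphenation = true
; Text never continues from one block into the next.
ProhibitInterblockHyphenation = true
```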


CaseRecognitionMode

Key | Type | Default
CaseRecognitionMode | CaseRecognitionModeEnum | CRM_AutoCase

Controls how letter case (uppercase/lowercase) is handled in the recognized output.

CaseRecognitionModeEnum

  • CRM_AutoCase: Automatically detects the case of letters and keeps it in the output text.
  • CRM_CapitalCase: Converts the recognized text to uppercase.
  • CRM_SmallCase: Converts the recognized text to lowercase.
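
A short sketch that forces uppercase output, for example when the source is known to be printed entirely in capitals (an assumed use case; the value syntax is also assumed):

```ini
[RecognizerParams]
; Output all recognized text in uppercase.
CaseRecognitionMode = CRM_CapitalCase
```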

Handprint recognition

FieldMarkingType

Key | Type | Default
FieldMarkingType | FieldMarkingTypeEnum | FMT_SimpleText

Defines the type of field marking around handprinted characters, such as underlines, frames, or boxes. This property applies only to handprint recognition.

info

For correct handprint recognition, also set the CellsCount property, which specifies the number of character cells in a recognized block.

FieldMarkingTypeEnum

  • FMT_CharBoxSeries: Each character is in a separate box.
  • FMT_CombInFrame: Characters are in a comb layout that also serves as the bottom line of a frame.
  • FMT_GrayBoxes: Characters are in white fields on a gray background.
  • FMT_PartitionedFrame: Characters are in a frame divided by vertical lines.
  • FMT_SimpleComb: Characters are in a comb layout.
  • FMT_SimpleText: Plain text with no marking.
  • FMT_TextInFrame: The text is enclosed in a frame.
  • FMT_UnderlinedText: The text is underlined.

CellsCount

Key | Type | Default
CellsCount | Integer | 1

Sets the number of character cells in a handprint block. This property applies only to FieldMarkingType values that divide text into individual cells. The default is 1, but you should set the correct value for your document to get accurate recognition results.
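
The handprint settings work together, so a combined sketch may help. It assumes a form field made of eight separate character boxes; the cell count and the value syntax are illustrative assumptions. Block coordinates for handwritten text must still be set manually, as noted under TT_Handwritten, and that is outside the scope of this fragment.

```ini
[RecognizerParams]
; Recognize handwritten and handprinted text.
TextTypes = TT_Handwritten
; Each character sits in its own box.
FieldMarkingType = FMT_CharBoxSeries
; Hypothetical field width: eight character cells.
CellsCount = 8
```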


User patterns

UseBuiltInPatterns

Key | Type | Default
UseBuiltInPatterns | Boolean | true

When set to true, Pdftools OCR Service uses its built-in patterns for recognition. Patterns are files that map character images to the characters they represent.

Set this property to false to recognize text with user patterns only. This can be useful for text set in decorative or nonstandard fonts, for which you supply your own patterns trained on those fonts instead of the built-in ones. The path to the user pattern file is stored in the UserPatternsFile property. If the UserPatternsFile property is empty, the UseBuiltInPatterns property is ignored.

info

You can set this property to false only in the normal and fast recognition modes. In the accurate mode, the built-in patterns are always used.


UserPatternsFile

Key | Type | Default
UserPatternsFile | String | ""

Contains the full path to the user pattern file used for recognition. If this property isn’t empty, information from the user pattern file is used during recognition. If the UseBuiltInPatterns property is false, the built-in Pdftools OCR Service patterns aren’t used, so this property should contain the path to a user pattern file: only the information stored in that file is used.
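
Putting the two pattern properties together, a configuration that relies on user patterns only might look like the sketch below. The file path is a hypothetical placeholder, and whether paths need quoting or escaping in the INI file is not covered on this page. The mode is set explicitly because the built-in patterns can’t be disabled in the accurate mode.

```ini
[RecognizerParams]
; Built-in patterns can only be disabled in normal or fast mode.
Mode = RM_Normal
; Use user patterns only.
UseBuiltInPatterns = false
; Hypothetical placeholder path; replace with the full path to your pattern file.
UserPatternsFile = C:\path\to\user-patterns-file
```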