OCR a PDF document

Apply OCR to a PDF document to make scanned content searchable and text extractable. The Pdftools SDK analyzes the document with an OCR engine and adds an invisible, selectable text layer while preserving the visual appearance.

Steps to OCR a document:

Create the OCR engine
Open the input document
Configure OCR options
Process the document
Full example

Before you begin

Initialize the Pdftools SDK license.
Install and start Pdftools OCR Service. The SDK connects to the OCR Service over HTTP for text recognition. The default endpoint is http://localhost:7982/.

Create the OCR engine

Create an Engine instance by passing the engine name and connection parameters. The only supported engine is service, which connects to a running Pdftools OCR Service instance over HTTP. Specify the engine name followed by @ and the service URL. For example, "service@http://localhost:7982/" connects to an OCR Service at that URL. To connect to multiple instances, separate URLs with a semicolon (for example, "service@http://host1:7982/;http://host2:7982/").
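The connection string format described above can be assembled programmatically. The following is a minimal sketch: `build_engine_name` is a hypothetical helper, not part of the Pdftools SDK. It only formats the string that you would then pass to the engine factory shown in the snippets below.

```python
from typing import Sequence


def build_engine_name(urls: Sequence[str], engine: str = "service") -> str:
    """Format '<engine>@<url>[;<url>...]' as described in this section.

    Hypothetical helper for illustration; not an SDK function.
    """
    # Drop empty entries and surrounding whitespace before joining.
    cleaned = [u.strip() for u in urls if u and u.strip()]
    if not cleaned:
        raise ValueError("at least one OCR Service URL is required")
    return f"{engine}@" + ";".join(cleaned)


# Single instance, then two instances separated by a semicolon.
print(build_engine_name(["http://localhost:7982/"]))
print(build_engine_name(["http://host1:7982/", "http://host2:7982/"]))
```

Keeping the URLs in a list makes it easier to load them from configuration and to reject an empty list before the SDK is called.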

After creating the engine, set the recognition languages as a comma-separated string (for example, "German,English"). You can also set engine-specific parameters using the Parameters property as semicolon-separated key-value pairs (for example, "PredefinedProfile=Default" or "Profile=/path/to/custom-profile.ini").
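As a rough illustration of the two string formats just described, the sketch below builds the comma-separated language list and the semicolon-separated parameter string from Python collections. Both function names are hypothetical and not part of the SDK. The name of the Python property that receives the parameter string is not shown in this guide, so it is left out here.

```python
from typing import Iterable, Mapping


def join_languages(languages: Iterable[str]) -> str:
    """Return a comma-separated language list such as 'German,English'."""
    return ",".join(lang.strip() for lang in languages if lang.strip())


def join_parameters(parameters: Mapping[str, str]) -> str:
    """Return semicolon-separated key=value pairs such as 'PredefinedProfile=Default'."""
    return ";".join(f"{key}={value}" for key, value in parameters.items())


# The resulting strings are what you would assign to the engine,
# for example: engine.languages = join_languages(["German", "English"])
print(join_languages(["German", "English"]))
print(join_parameters({"PredefinedProfile": "Default"}))
```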

The engine can be reused across multiple documents. However, each Engine instance must only be used by one thread at a time.
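One way to respect the one-thread-at-a-time rule in a multithreaded application is to give each worker thread its own engine. The sketch below shows that pattern with `threading.local`. `PerThread` is a hypothetical helper, and the factory used in the demonstration creates a plain placeholder object. In real code the factory would create and configure an SDK engine instead.

```python
import threading
from typing import Callable


class PerThread:
    """Lazily create one object per thread using the given factory.

    Hypothetical helper for illustration; not part of the Pdftools SDK.
    """

    def __init__(self, factory: Callable[[], object]) -> None:
        self._factory = factory
        self._local = threading.local()

    def get(self) -> object:
        # Each thread sees its own attributes on the thread-local object.
        if not hasattr(self._local, "value"):
            self._local.value = self._factory()
        return self._local.value


# Placeholder factory; in real code this might be
# lambda: Engine.create(ocr_engine_name)
engines = PerThread(lambda: object())
main_engine = engines.get()
print(engines.get() is main_engine)  # same thread, same object
```

This sketch does not release the per-thread objects. With real engines, each thread would also need to close its own instance when it finishes.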

.NET
Java
Python
C

// Create the OCR engine
using var engine = Engine.Create(ocrEngineName);

// Set the language(s) for OCR recognition (e.g. "German,English")
engine.Languages = language;

// Create the OCR engine
Engine engine = Engine.create(ocrEngineName);

// Set the language(s) for OCR recognition (e.g. "German,English")
engine.setLanguages(language);

# Create the OCR engine
engine = Engine.create(ocr_engine_name)

# Set the language(s) for OCR recognition (e.g. "German,English")
engine.languages = language

// Create the OCR engine
pEngine = PdfToolsOcr_Engine_Create(szOcrEngineName);

// Set the language(s) for OCR recognition (e.g. "German,English")
PdfToolsOcr_Engine_SetLanguages(pEngine, szLanguage);

Open the input document

Load the input PDF from the file system into a read-only Document.

.NET
Java
Python
C

// Open input document
using var inStr = File.OpenRead(inPath);
using var inDoc = Document.Open(inStr);

// Open input document
FileStream inStr = new FileStream(inPath, FileStream.Mode.READ_ONLY);
Document inDoc = Document.open(inStr);

# Open input document
in_stream = io.FileIO(input_path, 'rb')
input_document = Document.open(in_stream)

// Open input document
pInStream = _tfopen(szInPath, _T("rb"));
TPdfToolsSys_StreamDescriptor inDesc;
PdfToolsSysCreateFILEStreamDescriptor(&inDesc, pInStream, 0);
pInDoc = PdfToolsPdf_Document_Open(&inDesc, _T(""));

Configure OCR options

Create an OcrOptions object and configure its three sub-objects: image options, text options, and page options. Each sub-object controls a different aspect of OCR processing.

Image options

Image options control how scanned images within the PDF are processed. Set the Mode property to determine which images to OCR:

UpdateText: Process only images without existing OCR text. Recommended for most scanned documents.
ReplaceText: Re-OCR all images, replacing any existing text layer. Use this when the existing OCR results are poor.
RemoveText: Remove existing OCR text without re-processing. No OCR engine is required.
IfNoText: Process images only if the entire document contains no text at all.

Additional image options:

RotateScan: Automatically detect and correct page rotation.
DeskewScan: Straighten skewed scans.
RemoveOnlyInvisibleOcrText: When using ReplaceText or RemoveText, only affect invisible OCR text (text rendering mode 3). Visible text that was placed manually is preserved.
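The mode descriptions above amount to a small decision table. The sketch below writes that table down as a function that returns the mode name as a string. The function, its parameters, and the order in which the conditions are checked are assumptions made for illustration; the SDK does not provide such a helper.

```python
def choose_image_mode(has_ocr_text: bool,
                      ocr_quality_ok: bool = True,
                      strip_only: bool = False,
                      only_if_document_has_no_text: bool = False) -> str:
    """Suggest an image processing mode name based on the list above.

    Hypothetical helper; the returned strings mirror the mode names in this guide.
    """
    if strip_only:
        return "RemoveText"    # remove existing OCR text, no engine required
    if has_ocr_text and not ocr_quality_ok:
        return "ReplaceText"   # re-OCR all images, replacing the text layer
    if only_if_document_has_no_text:
        return "IfNoText"      # OCR only when the document has no text at all
    return "UpdateText"        # OCR only images without existing OCR text


print(choose_image_mode(has_ocr_text=False))
print(choose_image_mode(has_ocr_text=True, ocr_quality_ok=False))
```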

.NET
Java
Python
C

var options = new OcrOptions();

// Configure image OCR: recognize text from scanned images
options.ImageOptions.Mode = ImageProcessingMode.UpdateText;
options.ImageOptions.RemoveOnlyInvisibleOcrText = true;
options.ImageOptions.DeskewScan = true;
options.ImageOptions.RotateScan = true;

OcrOptions options = new OcrOptions();

// Configure image OCR: recognize text from scanned images
options.getImageOptions().setMode(ImageProcessingMode.UPDATE_TEXT);
options.getImageOptions().setRemoveOnlyInvisibleOcrText(true);
options.getImageOptions().setDeskewScan(true);
options.getImageOptions().setRotateScan(true);

options = OcrOptions()

# Configure image OCR: recognize text from scanned images
options.image_options.mode = ImageProcessingMode.UPDATE_TEXT
options.image_options.remove_only_invisible_ocr_text = True
options.image_options.deskew_scan = True
options.image_options.rotate_scan = True

pOptions = PdfToolsOcr_OcrOptions_New();

// Configure image OCR: recognize text from scanned images
pImageOptions = PdfToolsOcr_OcrOptions_GetImageOptions(pOptions);
PdfToolsOcr_ImageOptions_SetMode(pImageOptions, ePdfToolsOcr_ImageProcessingMode_UpdateText);
PdfToolsOcr_ImageOptions_SetRemoveOnlyInvisibleOcrText(pImageOptions, TRUE);
PdfToolsOcr_ImageOptions_SetDeskewScan(pImageOptions, TRUE);
PdfToolsOcr_ImageOptions_SetRotateScan(pImageOptions, TRUE);

Text options

Text options control how non-extractable text in the PDF is processed. Some fonts lack proper Unicode mappings, which prevents text from being copied or searched correctly. Set the Mode property to determine which text to process:

Update: Fix only text with missing or incorrect Unicode mappings. Recommended for most documents.
Replace: Reprocess all text, even text that already has valid Unicode mappings.

Additional text options:

SkipMode: Skip specific font types during text processing. Values can be combined. Available flags: KnownSymbolic (skip symbolic fonts such as ZapfDingbats and Wingdings) and PrivateUseArea (skip text with Unicode Private Use Area code points).
UnicodeSource: Specify additional sources for Unicode mapping. Values can be combined. Available flags: InstalledFont (look up Unicode values from system-installed fonts), KnownSymbolicPua (use Private Use Area values for known symbolic fonts), and FallbackAllPua (use Private Use Area values as a fallback for all characters).
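"Values can be combined" means these options behave like bit flags. The sketch below illustrates the idea with Python's standard `enum.Flag`. `SkipModeSketch` is a stand-in defined here for demonstration only; its numeric values are made up and are not the values used by the SDK.

```python
from enum import Flag


class SkipModeSketch(Flag):
    """Stand-in for the skip-mode flags; numeric values are illustrative."""
    KNOWN_SYMBOLIC = 1
    PRIVATE_USE_AREA = 2


# Combine flags with the bitwise OR operator.
combined = SkipModeSketch.KNOWN_SYMBOLIC | SkipModeSketch.PRIVATE_USE_AREA

# Test whether a single flag is part of a combined value.
print(SkipModeSketch.KNOWN_SYMBOLIC in combined)
print(SkipModeSketch.PRIVATE_USE_AREA in SkipModeSketch.KNOWN_SYMBOLIC)
```

Whether the Python binding accepts `|` on its own flag types is an assumption here. The Java snippets in this guide express the same idea with `EnumSet.of(...)`.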

.NET
Java
Python
C

// Configure text OCR: update non-extractable text with correct Unicode
options.TextOptions.Mode = TextProcessingMode.Update;
options.TextOptions.SkipMode = TextSkipMode.KnownSymbolic;
options.TextOptions.UnicodeSource = UnicodeSource.InstalledFont;

// Configure text OCR: update non-extractable text with correct Unicode
options.getTextOptions().setMode(TextProcessingMode.UPDATE);
options.getTextOptions().setSkipMode(EnumSet.of(TextSkipMode.KNOWN_SYMBOLIC));
options.getTextOptions().setUnicodeSource(EnumSet.of(UnicodeSource.INSTALLED_FONT));

# Configure text OCR: update non-extractable text with correct Unicode
options.text_options.mode = TextProcessingMode.UPDATE
options.text_options.skip_mode = TextSkipMode.KNOWN_SYMBOLIC
options.text_options.unicode_source = UnicodeSource.INSTALLED_FONT

// Configure text OCR: update non-extractable text with correct Unicode
pTextOptions = PdfToolsOcr_OcrOptions_GetTextOptions(pOptions);
PdfToolsOcr_TextOptions_SetMode(pTextOptions, ePdfToolsOcr_TextProcessingMode_Update);
PdfToolsOcr_TextOptions_SetSkipMode(pTextOptions, ePdfToolsOcr_TextSkipMode_KnownSymbolic);
PdfToolsOcr_TextOptions_SetUnicodeSource(pTextOptions, ePdfToolsOcr_UnicodeSource_InstalledFont);

Page options

Page options control page-level processing and accessibility tagging. Set the Mode property to determine which pages to process:

All: Process all non-empty pages.
IfNoText: Process only pages that have content but no text.
AddResults: Don’t trigger OCR independently, but add page-level results when OCR is triggered by image or text processing.

The Tagging property controls PDF tagging for accessibility:

Auto: Automatically add tagging for scanned or already-tagged documents. Recommended for most workflows.
Update: Always add tagging. A warning is emitted if tagging fails.
None: Don’t add any tagging.

.NET
Java
Python
C

// Configure page OCR: process all pages and add tagging for accessibility
options.PageOptions.Mode = PageProcessingMode.All;
options.PageOptions.Tagging = TaggingMode.Auto;

// Configure page OCR: process all pages and add tagging for accessibility
options.getPageOptions().setMode(PageProcessingMode.ALL);
options.getPageOptions().setTagging(TaggingMode.AUTO);

# Configure page OCR: process all pages and add tagging for accessibility
options.page_options.mode = PageProcessingMode.ALL
options.page_options.tagging = TaggingMode.AUTO

// Configure page OCR: process all pages and add tagging for accessibility
pPageOptions = PdfToolsOcr_OcrOptions_GetPageOptions(pOptions);
PdfToolsOcr_PageOptions_SetMode(pPageOptions, ePdfToolsOcr_PageProcessingMode_All);
PdfToolsOcr_PageOptions_SetTagging(pPageOptions, ePdfToolsOcr_TaggingMode_Auto);

Resolution settings

The OcrOptions object also controls the resolution for OCR processing. Each page’s optimal OCR resolution is determined automatically. If the optimal resolution falls within the configured range, the default resolution is used. A warning is generated if a page’s optimal resolution falls outside the range.

Dpi: Default resolution (default: 300).
MinDpi: Minimum allowed resolution (default: 200).
MaxDpi: Maximum allowed resolution (default: 400).
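To make the range rule concrete, the sketch below checks whether a given optimal resolution would fall outside the configured range and therefore be expected to produce a warning. The function is hypothetical, and treating the range bounds as inclusive is an assumption. The defaults are the MinDpi and MaxDpi defaults listed above.

```python
def resolution_warning_expected(optimal_dpi: float,
                                min_dpi: float = 200,
                                max_dpi: float = 400) -> bool:
    """Return True if optimal_dpi lies outside [min_dpi, max_dpi].

    Hypothetical helper; inclusive bounds are an assumption.
    """
    if min_dpi > max_dpi:
        raise ValueError("min_dpi must not exceed max_dpi")
    return not (min_dpi <= optimal_dpi <= max_dpi)


print(resolution_warning_expected(300))  # inside the default range
print(resolution_warning_expected(150))  # below the default minimum
```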

Embedded files

Set ProcessEmbeddedFiles to true on the OcrOptions object to recursively process PDF files embedded within the input document. By default, embedded files are copied as-is without OCR processing.

Process the document

Create a Processor instance and register a warning handler before calling Process. The processor applies the configured OCR options and writes the result to the output stream.

Warnings provide diagnostic information about each page, such as images with resolution outside the configured range or tagging issues.

Warnings are non-critical. The Pdftools SDK completes processing even when warnings occur. However, depending on your use case, you may need to treat certain warning categories as errors.

| Category | Description | When to treat as error |
| --- | --- | --- |
| `Ocr` | OCR-related issues such as resolution outside the optimal range | Rarely (usually informational) |
| `Tagging` | Issues adding tagging or structural information | When producing accessible PDFs or preparing for PDF/A level A |
| `Text` | Issues making text extractable | When text extraction is the primary goal |
| `SignedDocument` | Processing removed existing digital signatures | When preserving signatures is important |
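The guidance in the table can be captured as a per-workflow policy. The sketch below maps a few example workflows to the category names they would treat as errors. The workflow names, the grouping, and the helper are illustrative assumptions. The category strings follow the table, and the enum spelling in each language binding may differ.

```python
from typing import Dict, FrozenSet

# Which warning categories each (hypothetical) workflow treats as errors.
POLICIES: Dict[str, FrozenSet[str]] = {
    "accessible_pdf": frozenset({"Tagging"}),
    "text_extraction": frozenset({"Text"}),
    "keep_signatures": frozenset({"SignedDocument"}),
    "archive_strict": frozenset({"Tagging", "Text", "SignedDocument"}),
}


def is_error(workflow: str, category_name: str) -> bool:
    """Return True if the workflow treats this warning category as an error."""
    return category_name in POLICIES.get(workflow, frozenset())


print(is_error("accessible_pdf", "Tagging"))
print(is_error("accessible_pdf", "Ocr"))
```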

Signed documents

Processing a signed PDF modifies the document, which invalidates all existing digital signatures. The signatures are therefore removed during processing, and a SignedDocument warning is generated when this occurs.

.NET
Java
Python
C

// Create the OCR processor and add a warning handler
var processor = new Processor();
processor.Warning += (s, e) =>
{
    Console.WriteLine("- {0}: {1} ({2}{3})",
        e.Category, e.Message, e.Context, e.PageNo > 0 ? " page " + e.PageNo : "");
};

// Create stream for output file
using var outStr = File.Create(outPath);

// Process the document with OCR
using var outDoc = processor.Process(inDoc, engine, outStr, options);

// Create the OCR processor and add a warning handler
Processor processor = new Processor();
processor.addWarningListener(new Processor.WarningListener() {
    @Override
    public void warning(Processor.Warning event) {
        System.out.println(String.format("- %s: %s (%s%s)",
            event.getCategory(), event.getMessage(), event.getContext(),
            event.getPageNo() > 0 ? " page " + event.getPageNo() : ""));
    }
});

// Create stream for output file
FileStream outStr = new FileStream(outPath, FileStream.Mode.READ_WRITE_NEW);

// Process the document with OCR
Document outDoc = processor.process(inDoc, engine, outStr, options, null);

def warning_handler(message: str, category, page_no: int, context: str):
    if page_no > 0:
        print(f"- {category.name}: {message} ({context} page {page_no})")
    else:
        print(f"- {category.name}: {message} ({context})")

# Create the OCR processor and add a warning handler
processor = Processor()
processor.add_warning_handler(warning_handler)

# Create stream for output file
with io.FileIO(output_path, 'wb+') as output_stream:

    # Process the document with OCR
    processor.process(input_document, engine, output_stream, options)

// Create the OCR processor and add a warning handler
pProcessor = PdfToolsOcr_Processor_New();
PdfToolsOcr_Processor_AddWarningHandler(pProcessor, NULL, WarningHandler);

// Create stream for output file
pOutStream = _tfopen(szOutPath, _T("wb+"));
TPdfToolsSys_StreamDescriptor outDesc;
PdfToolsSysCreateFILEStreamDescriptor(&outDesc, pOutStream, 0);

// Process the document with OCR
pOutDoc = PdfToolsOcr_Processor_Process(pProcessor, pInDoc, pEngine, &outDesc, pOptions, NULL);

Handle warnings by category

For workflows where certain warnings are critical, filter warnings by category. This example treats tagging and text warnings as errors:

.NET
Java
Python
C

processor.Warning += (s, e) =>
{
    if (e.Category == WarningCategory.Tagging || e.Category == WarningCategory.Text)
        throw new Exception($"Critical OCR warning: {e.Message}");

    Console.WriteLine($"Warning: {e.Category}: {e.Message}");
};

processor.addWarningListener(new Processor.WarningListener() {
    @Override
    public void warning(Processor.Warning event) {
        if (event.getCategory() == WarningCategory.TAGGING ||
            event.getCategory() == WarningCategory.TEXT)
            throw new RuntimeException("Critical OCR warning: " + event.getMessage());

        System.out.println(String.format("Warning: %s: %s",
            event.getCategory(), event.getMessage()));
    }
});

def warning_handler(message: str, category, page_no: int, context: str):
    if category in (WarningCategory.TAGGING, WarningCategory.TEXT):
        raise Exception(f"Critical OCR warning: {message}")

    print(f"Warning: {category.name}: {message}")

void PDFTOOLS_CALL WarningHandler(void* pContext, const TCHAR* szMessage,
    TPdfToolsOcr_WarningCategory iCategory, int iPageNo, const TCHAR* szContext)
{
    if (iCategory == ePdfToolsOcr_WarningCategory_Tagging ||
        iCategory == ePdfToolsOcr_WarningCategory_Text)
    {
        _tprintf(_T("Critical OCR warning: %s\n"), szMessage);
        // Handle as error (e.g. set error flag)
        return;
    }
    _tprintf(_T("Warning: %d: %s\n"), iCategory, szMessage);
}
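An alternative to raising from inside the handler is to record every warning and decide after processing has finished. The sketch below shows that pattern. `WarningCollector` is a hypothetical class, not an SDK type. Its `handler` method uses the same parameter order as the Python handlers in this guide, and the category names are compared as strings.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Record = Tuple[str, str, int, str]  # (category name, message, page number, context)


@dataclass
class WarningCollector:
    """Collect warnings during processing and fail afterwards if needed."""
    critical: Set[str]
    records: List[Record] = field(default_factory=list)

    def handler(self, message, category, page_no, context) -> None:
        # Accept either an enum member (use its name) or a plain string.
        name = getattr(category, "name", str(category))
        self.records.append((name, message, page_no, context))

    def critical_records(self) -> List[Record]:
        return [r for r in self.records if r[0] in self.critical]

    def raise_if_critical(self) -> None:
        found = self.critical_records()
        if found:
            raise RuntimeError(
                f"{len(found)} critical OCR warning(s); first: {found[0][1]}")


collector = WarningCollector(critical={"TAGGING", "TEXT"})
collector.handler("resolution outside range", "OCR", 1, "image")
collector.raise_if_critical()  # no critical warnings recorded, so no exception
print(len(collector.records))
```

In real code you would register `collector.handler` as the warning handler and call `raise_if_critical()` after processing returns. This keeps the full list of warnings available for logging even when the run is rejected.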

Full example

.NET
Java
Python
C

// Create the OCR engine
using var engine = Engine.Create(ocrEngineName);

// Set the language(s) for OCR recognition (e.g. "German,English")
engine.Languages = language;

// Open input document
using var inStr = File.OpenRead(inPath);
using var inDoc = Document.Open(inStr);

// Configure OCR options
var options = new OcrOptions();

// Configure image OCR: recognize text from scanned images
options.ImageOptions.Mode = ImageProcessingMode.UpdateText;
options.ImageOptions.RemoveOnlyInvisibleOcrText = true;
options.ImageOptions.DeskewScan = true;
options.ImageOptions.RotateScan = true;

// Configure text OCR: update non-extractable text with correct Unicode
options.TextOptions.Mode = TextProcessingMode.Update;
options.TextOptions.SkipMode = TextSkipMode.KnownSymbolic;
options.TextOptions.UnicodeSource = UnicodeSource.InstalledFont;

// Configure page OCR: process all pages and add tagging for accessibility
options.PageOptions.Mode = PageProcessingMode.All;
options.PageOptions.Tagging = TaggingMode.Auto;

// Create the OCR processor and add a warning handler
var processor = new Processor();
processor.Warning += (s, e) =>
{
    Console.WriteLine("- {0}: {1} ({2}{3})",
        e.Category, e.Message, e.Context, e.PageNo > 0 ? " page " + e.PageNo : "");
};

// Create stream for output file
using var outStr = File.Create(outPath);

// Process the document with OCR
using var outDoc = processor.Process(inDoc, engine, outStr, options);

// Create the OCR engine
try (Engine engine = Engine.create(ocrEngineName)) {

    // Set the language(s) for OCR recognition (e.g. "German,English")
    engine.setLanguages(language);

    // Open input document
    try (
        FileStream inStr = new FileStream(inPath, FileStream.Mode.READ_ONLY);
        Document inDoc = Document.open(inStr)) {

        // Configure OCR options
        OcrOptions options = new OcrOptions();

        // Configure image OCR: recognize text from scanned images
        options.getImageOptions().setMode(ImageProcessingMode.UPDATE_TEXT);
        options.getImageOptions().setRemoveOnlyInvisibleOcrText(true);
        options.getImageOptions().setDeskewScan(true);
        options.getImageOptions().setRotateScan(true);

        // Configure text OCR: update non-extractable text with correct Unicode
        options.getTextOptions().setMode(TextProcessingMode.UPDATE);
        options.getTextOptions().setSkipMode(EnumSet.of(TextSkipMode.KNOWN_SYMBOLIC));
        options.getTextOptions().setUnicodeSource(EnumSet.of(UnicodeSource.INSTALLED_FONT));

        // Configure page OCR: process all pages and add tagging for accessibility
        options.getPageOptions().setMode(PageProcessingMode.ALL);
        options.getPageOptions().setTagging(TaggingMode.AUTO);

        // Create the OCR processor and add a warning handler
        Processor processor = new Processor();
        processor.addWarningListener(new Processor.WarningListener() {
            @Override
            public void warning(Processor.Warning event) {
                System.out.println(String.format("- %s: %s (%s%s)",
                    event.getCategory(), event.getMessage(), event.getContext(),
                    event.getPageNo() > 0 ? " page " + event.getPageNo() : ""));
            }
        });

        // Create stream for output file
        try (FileStream outStr = new FileStream(outPath, FileStream.Mode.READ_WRITE_NEW)) {

            // Process the document with OCR
            try (Document outDoc = processor.process(inDoc, engine, outStr, options, null)) {
            }
        }
    }
}

def warning_handler(message: str, category, page_no: int, context: str):
    if page_no > 0:
        print(f"- {category.name}: {message} ({context} page {page_no})")
    else:
        print(f"- {category.name}: {message} ({context})")

# Create the OCR engine
with Engine.create(ocr_engine_name) as engine:

    # Set the language(s) for OCR recognition (e.g. "German,English")
    engine.languages = language

    # Open input document
    with io.FileIO(input_path, 'rb') as in_stream:
        with Document.open(in_stream) as input_document:

            # Configure OCR options
            options = OcrOptions()

            # Configure image OCR: recognize text from scanned images
            options.image_options.mode = ImageProcessingMode.UPDATE_TEXT
            options.image_options.remove_only_invisible_ocr_text = True
            options.image_options.deskew_scan = True
            options.image_options.rotate_scan = True

            # Configure text OCR: update non-extractable text with correct Unicode
            options.text_options.mode = TextProcessingMode.UPDATE
            options.text_options.skip_mode = TextSkipMode.KNOWN_SYMBOLIC
            options.text_options.unicode_source = UnicodeSource.INSTALLED_FONT

            # Configure page OCR: process all pages and add tagging for accessibility
            options.page_options.mode = PageProcessingMode.ALL
            options.page_options.tagging = TaggingMode.AUTO

            # Create the OCR processor and add a warning handler
            processor = Processor()
            processor.add_warning_handler(warning_handler)

            # Create stream for output file
            with io.FileIO(output_path, 'wb+') as output_stream:

                # Process the document with OCR
                processor.process(input_document, engine, output_stream, options)

void PDFTOOLS_CALL WarningHandler(void* pContext, const TCHAR* szMessage,
    TPdfToolsOcr_WarningCategory iCategory, int iPageNo, const TCHAR* szContext)
{
    if (iPageNo > 0)
        _tprintf(_T("- %d: %s (%s page %d)\n"), iCategory, szMessage, szContext, iPageNo);
    else
        _tprintf(_T("- %d: %s (%s)\n"), iCategory, szMessage, szContext);
}

// Create the OCR engine
pEngine = PdfToolsOcr_Engine_Create(szOcrEngineName);
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pEngine, _T("Failed to create OCR engine. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
                                 PdfTools_GetLastError());

// Set the language(s) for OCR recognition (e.g. "German,English")
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(PdfToolsOcr_Engine_SetLanguages(pEngine, szLanguage),
                                  _T("Failed to set OCR languages. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
                                  PdfTools_GetLastError());

// Open input document
pInStream = _tfopen(szInPath, _T("rb"));
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pInStream, _T("Failed to open the input file \"%s\" for reading.\n"), szInPath);
TPdfToolsSys_StreamDescriptor inDesc;
PdfToolsSysCreateFILEStreamDescriptor(&inDesc, pInStream, 0);
pInDoc = PdfToolsPdf_Document_Open(&inDesc, _T(""));
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(
    pInDoc, _T("Failed to create a document from the input file \"%s\". %s (ErrorCode: 0x%08x).\n"), szInPath,
    szErrorBuff, PdfTools_GetLastError());

// Configure OCR options
pOptions = PdfToolsOcr_OcrOptions_New();
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pOptions, _T("Failed to create OCR options. %s (ErrorCode: 0x%08x).\n"),
                                 szErrorBuff, PdfTools_GetLastError());

// Configure image OCR: recognize text from scanned images
pImageOptions = PdfToolsOcr_OcrOptions_GetImageOptions(pOptions);
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pImageOptions, _T("Failed to get image options. %s (ErrorCode: 0x%08x).\n"),
                                 szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(
    PdfToolsOcr_ImageOptions_SetMode(pImageOptions, ePdfToolsOcr_ImageProcessingMode_UpdateText),
    _T("Failed to set image processing mode. %s (ErrorCode: 0x%08x).\n"), szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(PdfToolsOcr_ImageOptions_SetRemoveOnlyInvisibleOcrText(pImageOptions, TRUE),
                                  _T("Failed to set RemoveOnlyInvisibleOcrText. %s (ErrorCode: 0x%08x).\n"),
                                  szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(PdfToolsOcr_ImageOptions_SetDeskewScan(pImageOptions, TRUE),
                                  _T("Failed to set DeskewScan. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
                                  PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(PdfToolsOcr_ImageOptions_SetRotateScan(pImageOptions, TRUE),
                                  _T("Failed to set RotateScan. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
                                  PdfTools_GetLastError());

// Configure text OCR: update non-extractable text with correct Unicode
pTextOptions = PdfToolsOcr_OcrOptions_GetTextOptions(pOptions);
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pTextOptions, _T("Failed to get text options. %s (ErrorCode: 0x%08x).\n"),
                                 szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(
    PdfToolsOcr_TextOptions_SetMode(pTextOptions, ePdfToolsOcr_TextProcessingMode_Update),
    _T("Failed to set text processing mode. %s (ErrorCode: 0x%08x).\n"), szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(
    PdfToolsOcr_TextOptions_SetSkipMode(pTextOptions, ePdfToolsOcr_TextSkipMode_KnownSymbolic),
    _T("Failed to set text skip mode. %s (ErrorCode: 0x%08x).\n"), szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(
    PdfToolsOcr_TextOptions_SetUnicodeSource(pTextOptions, ePdfToolsOcr_UnicodeSource_InstalledFont),
    _T("Failed to set unicode source. %s (ErrorCode: 0x%08x).\n"), szErrorBuff, PdfTools_GetLastError());

// Configure page OCR: process all pages and add tagging for accessibility
pPageOptions = PdfToolsOcr_OcrOptions_GetPageOptions(pOptions);
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pPageOptions, _T("Failed to get page options. %s (ErrorCode: 0x%08x).\n"),
                                 szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(
    PdfToolsOcr_PageOptions_SetMode(pPageOptions, ePdfToolsOcr_PageProcessingMode_All),
    _T("Failed to set page processing mode. %s (ErrorCode: 0x%08x).\n"), szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(PdfToolsOcr_PageOptions_SetTagging(pPageOptions, ePdfToolsOcr_TaggingMode_Auto),
                                  _T("Failed to set tagging mode. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
                                  PdfTools_GetLastError());

// Create the OCR processor and add a warning handler
pProcessor = PdfToolsOcr_Processor_New();
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pProcessor, _T("Failed to create OCR processor. %s (ErrorCode: 0x%08x).\n"),
                                 szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(PdfToolsOcr_Processor_AddWarningHandler(pProcessor, NULL, WarningHandler),
                                  _T("Failed to add warning handler. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
                                  PdfTools_GetLastError());

// Create stream for output file
pOutStream = _tfopen(szOutPath, _T("wb+"));
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pOutStream, _T("Failed to create output file \"%s\" for writing.\n"), szOutPath);
TPdfToolsSys_StreamDescriptor outDesc;
PdfToolsSysCreateFILEStreamDescriptor(&outDesc, pOutStream, 0);

// Process the document with OCR
pOutDoc = PdfToolsOcr_Processor_Process(pProcessor, pInDoc, pEngine, &outDesc, pOptions, NULL);
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pOutDoc, _T("The processing has failed. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
                                 PdfTools_GetLastError());
