OCR a PDF document
Apply OCR to a PDF document to make scanned content searchable and text extractable. The Pdftools SDK analyzes the document with an OCR engine and adds an invisible, selectable text layer while preserving the visual appearance.
Steps to OCR a document:
- Create the OCR engine
- Open the input document
- Configure OCR options
- Process the document
- Full example
Before you begin:
- Initialize the Pdftools SDK license.
- Install and start Pdftools OCR Service. The SDK connects to the OCR Service over HTTP for text recognition. The default endpoint is http://localhost:7982/.
Create the OCR engine
Create an Engine instance by passing the engine name and connection parameters. The only supported engine is service, which connects to a running Pdftools OCR Service instance over HTTP. Specify the engine name followed by @ and the service URL. For example, "service@http://localhost:7982/" connects to an OCR Service at that URL. To connect to multiple instances, separate URLs with a semicolon (for example, "service@http://host1:7982/;http://host2:7982/").
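The engine string format described above can be assembled programmatically when the service URLs come from configuration. This is a minimal sketch: `build_engine_name` is a hypothetical helper, not part of the Pdftools SDK, and it only formats the string that would then be passed to `Engine.create`.

```python
def build_engine_name(urls, engine="service"):
    """Join one or more OCR Service URLs into an engine string.

    Hypothetical helper, not part of the Pdftools SDK. It only formats the
    string described above: engine name, "@", then URLs separated by ";".
    """
    cleaned = [url.strip() for url in urls if url and url.strip()]
    if not cleaned:
        raise ValueError("at least one OCR Service URL is required")
    return f"{engine}@" + ";".join(cleaned)


# One local instance, and two instances on different hosts
single = build_engine_name(["http://localhost:7982/"])
multi = build_engine_name(["http://host1:7982/", "http://host2:7982/"])
```

The helper rejects an empty URL list so that a misconfiguration surfaces before the engine is created.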
After creating the engine, set the recognition languages as a comma-separated string (for example, "German,English"). You can also set engine-specific parameters using the Parameters property as semicolon-separated key-value pairs (for example, "PredefinedProfile=Default" or "Profile=/path/to/custom-profile.ini").
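If languages and parameters are stored as lists or dictionaries, they need to be flattened into the string formats described above. The following sketch shows one way to do that; `join_languages` and `join_parameters` are hypothetical helpers, not SDK functions. The resulting language string would be assigned to `engine.languages`; the Python name of the Parameters property is not shown in this guide, so check the API reference before assigning the parameter string.

```python
def join_languages(languages):
    """Build the comma-separated language string, e.g. "German,English"."""
    return ",".join(name.strip() for name in languages if name.strip())


def join_parameters(parameters):
    """Build the semicolon-separated key-value string, e.g. "A=1;B=2"."""
    return ";".join(f"{key}={value}" for key, value in parameters.items())


languages = join_languages(["German", "English"])
parameters = join_parameters({"PredefinedProfile": "Default"})
```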
The engine can be reused across multiple documents. However, each Engine instance must only be used by one thread at a time.
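One way to respect the one-thread-at-a-time rule in a multi-threaded application is to give each worker thread its own engine. The sketch below uses Python's `threading.local` for that. `get_thread_engine` is a hypothetical helper, and `create_engine` stands in for a callable such as `lambda: Engine.create(ocr_engine_name)`.

```python
import threading

# Per-thread storage: each thread sees its own "engine" attribute
_thread_state = threading.local()


def get_thread_engine(create_engine):
    """Return the calling thread's engine, creating it on first use.

    Hypothetical helper. create_engine is any zero-argument callable that
    returns a new engine, so no engine is ever shared between threads.
    """
    engine = getattr(_thread_state, "engine", None)
    if engine is None:
        engine = create_engine()
        _thread_state.engine = engine
    return engine
```

Engines created this way are not closed automatically, so each worker should release its engine when it finishes.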
- .NET
- Java
- Python
- C
// Create the OCR engine
using var engine = Engine.Create(ocrEngineName);
// Set the language(s) for OCR recognition (e.g. "German,English")
engine.Languages = language;
// Create the OCR engine
Engine engine = Engine.create(ocrEngineName);
// Set the language(s) for OCR recognition (e.g. "German,English")
engine.setLanguages(language);
# Create the OCR engine
engine = Engine.create(ocr_engine_name)
# Set the language(s) for OCR recognition (e.g. "German,English")
engine.languages = language
// Create the OCR engine
pEngine = PdfToolsOcr_Engine_Create(szOcrEngineName);
// Set the language(s) for OCR recognition (e.g. "German,English")
PdfToolsOcr_Engine_SetLanguages(pEngine, szLanguage);
Open the input document
Load the input PDF from the file system into a read-only Document.
- .NET
- Java
- Python
- C
// Open input document
using var inStr = File.OpenRead(inPath);
using var inDoc = Document.Open(inStr);
// Open input document
FileStream inStr = new FileStream(inPath, FileStream.Mode.READ_ONLY);
Document inDoc = Document.open(inStr);
# Open input document
in_stream = io.FileIO(input_path, 'rb')
input_document = Document.open(in_stream)
// Open input document
pInStream = _tfopen(szInPath, _T("rb"));
TPdfToolsSys_StreamDescriptor inDesc;
PdfToolsSysCreateFILEStreamDescriptor(&inDesc, pInStream, 0);
pInDoc = PdfToolsPdf_Document_Open(&inDesc, _T(""));
Configure OCR options
Create an OcrOptions object and configure its three sub-objects: image options, text options, and page options. Each sub-object controls a different aspect of OCR processing.
Image options
Image options control how scanned images within the PDF are processed. Set the Mode property to determine which images to OCR:
- UpdateText: Process only images without existing OCR text. Recommended for most scanned documents.
- ReplaceText: Re-OCR all images, replacing any existing text layer. Use this when the existing OCR results are poor.
- RemoveText: Remove existing OCR text without re-processing. No OCR engine is required.
- IfNoText: Process images only if the entire document contains no text at all.
Additional image options:
- RotateScan: Automatically detect and correct page rotation.
- DeskewScan: Straighten skewed scans.
- RemoveOnlyInvisibleOcrText: When using ReplaceText or RemoveText, only affect invisible OCR text (text rendering mode 3). Visible text that was placed manually is preserved.
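The four image modes can be summarized as a small decision function. This is only a model of the descriptions above, not the SDK's internal logic; `image_needs_ocr` and its arguments are hypothetical names used for illustration.

```python
def image_needs_ocr(mode, image_has_ocr_text, document_has_text):
    """Model whether an image is sent to the OCR engine for a given mode.

    Illustrative only: mirrors the mode descriptions, not SDK internals.
    """
    if mode == "UpdateText":
        # Only images that have no OCR text yet
        return not image_has_ocr_text
    if mode == "ReplaceText":
        # Every image, regardless of existing OCR text
        return True
    if mode == "RemoveText":
        # Existing OCR text is removed; nothing is recognized
        return False
    if mode == "IfNoText":
        # Only when the whole document contains no text at all
        return not document_has_text
    raise ValueError(f"unknown image processing mode: {mode}")
```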
- .NET
- Java
- Python
- C
var options = new OcrOptions();
// Configure image OCR: recognize text from scanned images
options.ImageOptions.Mode = ImageProcessingMode.UpdateText;
options.ImageOptions.RemoveOnlyInvisibleOcrText = true;
options.ImageOptions.DeskewScan = true;
options.ImageOptions.RotateScan = true;
OcrOptions options = new OcrOptions();
// Configure image OCR: recognize text from scanned images
options.getImageOptions().setMode(ImageProcessingMode.UPDATE_TEXT);
options.getImageOptions().setRemoveOnlyInvisibleOcrText(true);
options.getImageOptions().setDeskewScan(true);
options.getImageOptions().setRotateScan(true);
options = OcrOptions()
# Configure image OCR: recognize text from scanned images
options.image_options.mode = ImageProcessingMode.UPDATE_TEXT
options.image_options.remove_only_invisible_ocr_text = True
options.image_options.deskew_scan = True
options.image_options.rotate_scan = True
pOptions = PdfToolsOcr_OcrOptions_New();
// Configure image OCR: recognize text from scanned images
pImageOptions = PdfToolsOcr_OcrOptions_GetImageOptions(pOptions);
PdfToolsOcr_ImageOptions_SetMode(pImageOptions, ePdfToolsOcr_ImageProcessingMode_UpdateText);
PdfToolsOcr_ImageOptions_SetRemoveOnlyInvisibleOcrText(pImageOptions, TRUE);
PdfToolsOcr_ImageOptions_SetDeskewScan(pImageOptions, TRUE);
PdfToolsOcr_ImageOptions_SetRotateScan(pImageOptions, TRUE);
Text options
Text options control how non-extractable text in the PDF is processed. Some fonts lack proper Unicode mappings, which prevents text from being copied or searched correctly. Set the Mode property to choose which text to process:
- Update: Fix only text with missing or incorrect Unicode mappings. Recommended for most documents.
- Replace: Reprocess all text, even text that already has valid Unicode mappings.
Additional text options:
- SkipMode: Skip specific font types during text processing. Values can be combined. Available flags: KnownSymbolic (skip symbolic fonts such as ZapfDingbats and Wingdings) and PrivateUseArea (skip text with Unicode Private Use Area code points).
- UnicodeSource: Specify additional sources for Unicode mapping. Values can be combined. Available flags: InstalledFont (look up Unicode values from system-installed fonts), KnownSymbolicPua (use Private Use Area values for known symbolic fonts), and FallbackAllPua (use Private Use Area values as a fallback for all characters).
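"Values can be combined" usually means the values behave as bit flags that are joined with a bitwise OR. The sketch below illustrates that idea with a stand-in `enum.Flag` class. It is not the SDK's `TextSkipMode`: the numeric values are invented, and the member name `PRIVATE_USE_AREA` is assumed from the naming pattern of `KNOWN_SYMBOLIC`.

```python
from enum import Flag


class TextSkipMode(Flag):
    """Stand-in for illustration only, not the SDK class."""
    KNOWN_SYMBOLIC = 1
    PRIVATE_USE_AREA = 2  # assumed name and value


# Combine both flags with bitwise OR
combined = TextSkipMode.KNOWN_SYMBOLIC | TextSkipMode.PRIVATE_USE_AREA

# Membership test: is a given flag part of the combination?
skips_symbolic = TextSkipMode.KNOWN_SYMBOLIC in combined
```

If the SDK enumeration follows the same pattern, the combined value would be assigned to `options.text_options.skip_mode`.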
- .NET
- Java
- Python
- C
// Configure text OCR: update non-extractable text with correct Unicode
options.TextOptions.Mode = TextProcessingMode.Update;
options.TextOptions.SkipMode = TextSkipMode.KnownSymbolic;
options.TextOptions.UnicodeSource = UnicodeSource.InstalledFont;
// Configure text OCR: update non-extractable text with correct Unicode
options.getTextOptions().setMode(TextProcessingMode.UPDATE);
options.getTextOptions().setSkipMode(EnumSet.of(TextSkipMode.KNOWN_SYMBOLIC));
options.getTextOptions().setUnicodeSource(EnumSet.of(UnicodeSource.INSTALLED_FONT));
# Configure text OCR: update non-extractable text with correct Unicode
options.text_options.mode = TextProcessingMode.UPDATE
options.text_options.skip_mode = TextSkipMode.KNOWN_SYMBOLIC
options.text_options.unicode_source = UnicodeSource.INSTALLED_FONT
// Configure text OCR: update non-extractable text with correct Unicode
pTextOptions = PdfToolsOcr_OcrOptions_GetTextOptions(pOptions);
PdfToolsOcr_TextOptions_SetMode(pTextOptions, ePdfToolsOcr_TextProcessingMode_Update);
PdfToolsOcr_TextOptions_SetSkipMode(pTextOptions, ePdfToolsOcr_TextSkipMode_KnownSymbolic);
PdfToolsOcr_TextOptions_SetUnicodeSource(pTextOptions, ePdfToolsOcr_UnicodeSource_InstalledFont);
Page options
Page options control page-level processing and accessibility tagging. The Mode property determines which pages to process:
- All: Process all non-empty pages.
- IfNoText: Process only pages that have content but no text.
- AddResults: Don’t trigger OCR independently, but add page-level results when OCR is triggered by image or text processing.
The Tagging property controls PDF tagging for accessibility:
- Auto: Automatically add tagging for scanned or already-tagged documents. Recommended for most workflows.
- Update: Always add tagging. A warning is emitted if tagging fails.
- None: Don’t add any tagging.
- .NET
- Java
- Python
- C
// Configure page OCR: process all pages and add tagging for accessibility
options.PageOptions.Mode = PageProcessingMode.All;
options.PageOptions.Tagging = TaggingMode.Auto;
// Configure page OCR: process all pages and add tagging for accessibility
options.getPageOptions().setMode(PageProcessingMode.ALL);
options.getPageOptions().setTagging(TaggingMode.AUTO);
# Configure page OCR: process all pages and add tagging for accessibility
options.page_options.mode = PageProcessingMode.ALL
options.page_options.tagging = TaggingMode.AUTO
// Configure page OCR: process all pages and add tagging for accessibility
pPageOptions = PdfToolsOcr_OcrOptions_GetPageOptions(pOptions);
PdfToolsOcr_PageOptions_SetMode(pPageOptions, ePdfToolsOcr_PageProcessingMode_All);
PdfToolsOcr_PageOptions_SetTagging(pPageOptions, ePdfToolsOcr_TaggingMode_Auto);
Resolution settings
The OcrOptions object also controls the resolution for OCR processing. Each page’s optimal OCR resolution is determined automatically. If the optimal resolution falls within the configured range, the default resolution is used. A warning is generated if a page’s optimal resolution falls outside the range.
- Dpi: Default resolution (default: 300).
- MinDpi: Minimum allowed resolution (default: 200).
- MaxDpi: Maximum allowed resolution (default: 400).
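To anticipate which pages might trigger a resolution warning, the range check can be modeled as a small function. This sketch assumes the range bounds are inclusive, which the text above does not state. `resolution_warning_expected` is a hypothetical helper, not an SDK function.

```python
def resolution_warning_expected(optimal_dpi, min_dpi=200, max_dpi=400):
    """Return True if the optimal resolution lies outside [min_dpi, max_dpi].

    Illustrative model using the default limits listed above; treating the
    bounds as inclusive is an assumption.
    """
    if min_dpi > max_dpi:
        raise ValueError("min_dpi must not exceed max_dpi")
    return not (min_dpi <= optimal_dpi <= max_dpi)
```

For example, a page whose optimal resolution is 150 DPI lies below the default minimum of 200, so a warning would be expected under this model.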
Embedded files
Set ProcessEmbeddedFiles to true on the OcrOptions object to recursively process PDF files embedded within the input document. By default, embedded files are copied as-is without OCR processing.
Process the document
Create a Processor instance and register a warning handler before calling Process. The processor applies the configured OCR options and writes the result to the output stream.
Warnings provide diagnostic information about each page, such as images with resolution outside the configured range or tagging issues.
Warnings are non-critical. The Pdftools SDK completes processing even when warnings occur. However, depending on your use case, you may need to treat certain warning categories as errors.
| Category | Description | When to treat as error |
|---|---|---|
| Ocr | OCR-related issues such as resolution outside the optimal range | Rarely (usually informational) |
| Tagging | Issues adding tagging or structural information | When producing accessible PDFs or preparing for PDF/A level A |
| Text | Issues making text extractable | When text extraction is the primary goal |
| SignedDocument | Processing removed existing digital signatures | When preserving signatures is important |
Processing a signed PDF invalidates all existing digital signatures, which are removed during processing. The SignedDocument warning is generated when this occurs.
- .NET
- Java
- Python
- C
// Create the OCR processor and add a warning handler
var processor = new Processor();
processor.Warning += (s, e) =>
{
    Console.WriteLine("- {0}: {1} ({2}{3})",
        e.Category, e.Message, e.Context, e.PageNo > 0 ? " page " + e.PageNo : "");
};
// Create stream for output file
using var outStr = File.Create(outPath);
// Process the document with OCR
using var outDoc = processor.Process(inDoc, engine, outStr, options);
// Create the OCR processor and add a warning handler
Processor processor = new Processor();
processor.addWarningListener(new Processor.WarningListener() {
    @Override
    public void warning(Processor.Warning event) {
        System.out.println(String.format("- %s: %s (%s%s)",
            event.getCategory(), event.getMessage(), event.getContext(),
            event.getPageNo() > 0 ? " page " + event.getPageNo() : ""));
    }
});
// Create stream for output file
FileStream outStr = new FileStream(outPath, FileStream.Mode.READ_WRITE_NEW);
// Process the document with OCR
Document outDoc = processor.process(inDoc, engine, outStr, options, null);
def warning_handler(message: str, category, page_no: int, context: str):
    if page_no > 0:
        print(f"- {category.name}: {message} ({context} page {page_no})")
    else:
        print(f"- {category.name}: {message} ({context})")
# Create the OCR processor and add a warning handler
processor = Processor()
processor.add_warning_handler(warning_handler)
# Create stream for output file
with io.FileIO(output_path, 'wb+') as output_stream:
    # Process the document with OCR
    processor.process(input_document, engine, output_stream, options)
// Create the OCR processor and add a warning handler
pProcessor = PdfToolsOcr_Processor_New();
PdfToolsOcr_Processor_AddWarningHandler(pProcessor, NULL, WarningHandler);
// Create stream for output file
pOutStream = _tfopen(szOutPath, _T("wb+"));
TPdfToolsSys_StreamDescriptor outDesc;
PdfToolsSysCreateFILEStreamDescriptor(&outDesc, pOutStream, 0);
// Process the document with OCR
pOutDoc = PdfToolsOcr_Processor_Process(pProcessor, pInDoc, pEngine, &outDesc, pOptions, NULL);
Handle warnings by category
For workflows where certain warnings are critical, filter warnings by category. This example treats tagging and text warnings as errors:
- .NET
- Java
- Python
- C
processor.Warning += (s, e) =>
{
    if (e.Category == WarningCategory.Tagging || e.Category == WarningCategory.Text)
        throw new Exception($"Critical OCR warning: {e.Message}");
    Console.WriteLine($"Warning: {e.Category}: {e.Message}");
};
processor.addWarningListener(new Processor.WarningListener() {
    @Override
    public void warning(Processor.Warning event) {
        if (event.getCategory() == WarningCategory.TAGGING ||
            event.getCategory() == WarningCategory.TEXT)
            throw new RuntimeException("Critical OCR warning: " + event.getMessage());
        System.out.println(String.format("Warning: %s: %s",
            event.getCategory(), event.getMessage()));
    }
});
def warning_handler(message: str, category, page_no: int, context: str):
    if category in (WarningCategory.TAGGING, WarningCategory.TEXT):
        raise Exception(f"Critical OCR warning: {message}")
    print(f"Warning: {category.name}: {message}")
void PDFTOOLS_CALL WarningHandler(void* pContext, const TCHAR* szMessage,
                                  TPdfToolsOcr_WarningCategory iCategory, int iPageNo, const TCHAR* szContext)
{
    if (iCategory == ePdfToolsOcr_WarningCategory_Tagging ||
        iCategory == ePdfToolsOcr_WarningCategory_Text)
    {
        _tprintf(_T("Critical OCR warning: %s\n"), szMessage);
        // Handle as error (e.g. set error flag)
        return;
    }
    _tprintf(_T("Warning: %d: %s\n"), iCategory, szMessage);
}
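An alternative to failing inside the handler is to collect all warnings and decide after processing completes, which keeps the full list available for logging. The sketch below shows a hypothetical `WarningCollector` class. It assumes that `add_warning_handler` accepts any callable with the handler signature shown above, and that categories expose a `name` attribute, as used in the Python handler examples.

```python
from collections import defaultdict


class WarningCollector:
    """Collect warnings per category and report the critical ones.

    Hypothetical helper, not part of the SDK. Categories are compared by
    name, so it works with any enum-like object that has a .name attribute.
    """

    def __init__(self, critical_categories):
        self.critical = set(critical_categories)
        self.by_category = defaultdict(list)

    def __call__(self, message, category, page_no, context):
        # Same parameter order as the warning handlers shown above
        name = getattr(category, "name", str(category))
        self.by_category[name].append((page_no, message, context))

    def critical_warnings(self):
        """Return (category name, warning entry) pairs for critical categories."""
        return [
            (name, entry)
            for name in sorted(self.critical)
            for entry in self.by_category.get(name, [])
        ]
```

Under these assumptions, an instance created with `WarningCollector({"TAGGING", "TEXT"})` would be registered in place of `warning_handler`, and the result of `critical_warnings()` checked once `process` returns.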
Full example
- .NET
- Java
- Python
- C
// Create the OCR engine
using var engine = Engine.Create(ocrEngineName);
// Set the language(s) for OCR recognition (e.g. "German,English")
engine.Languages = language;
// Open input document
using var inStr = File.OpenRead(inPath);
using var inDoc = Document.Open(inStr);
// Configure OCR options
var options = new OcrOptions();
// Configure image OCR: recognize text from scanned images
options.ImageOptions.Mode = ImageProcessingMode.UpdateText;
options.ImageOptions.RemoveOnlyInvisibleOcrText = true;
options.ImageOptions.DeskewScan = true;
options.ImageOptions.RotateScan = true;
// Configure text OCR: update non-extractable text with correct Unicode
options.TextOptions.Mode = TextProcessingMode.Update;
options.TextOptions.SkipMode = TextSkipMode.KnownSymbolic;
options.TextOptions.UnicodeSource = UnicodeSource.InstalledFont;
// Configure page OCR: process all pages and add tagging for accessibility
options.PageOptions.Mode = PageProcessingMode.All;
options.PageOptions.Tagging = TaggingMode.Auto;
// Create the OCR processor and add a warning handler
var processor = new Processor();
processor.Warning += (s, e) =>
{
    Console.WriteLine("- {0}: {1} ({2}{3})",
        e.Category, e.Message, e.Context, e.PageNo > 0 ? " page " + e.PageNo : "");
};
// Create stream for output file
using var outStr = File.Create(outPath);
// Process the document with OCR
using var outDoc = processor.Process(inDoc, engine, outStr, options);
// Create the OCR engine
try (Engine engine = Engine.create(ocrEngineName)) {
    // Set the language(s) for OCR recognition (e.g. "German,English")
    engine.setLanguages(language);
    // Open input document
    try (
        FileStream inStr = new FileStream(inPath, FileStream.Mode.READ_ONLY);
        Document inDoc = Document.open(inStr)) {
        // Configure OCR options
        OcrOptions options = new OcrOptions();
        // Configure image OCR: recognize text from scanned images
        options.getImageOptions().setMode(ImageProcessingMode.UPDATE_TEXT);
        options.getImageOptions().setRemoveOnlyInvisibleOcrText(true);
        options.getImageOptions().setDeskewScan(true);
        options.getImageOptions().setRotateScan(true);
        // Configure text OCR: update non-extractable text with correct Unicode
        options.getTextOptions().setMode(TextProcessingMode.UPDATE);
        options.getTextOptions().setSkipMode(EnumSet.of(TextSkipMode.KNOWN_SYMBOLIC));
        options.getTextOptions().setUnicodeSource(EnumSet.of(UnicodeSource.INSTALLED_FONT));
        // Configure page OCR: process all pages and add tagging for accessibility
        options.getPageOptions().setMode(PageProcessingMode.ALL);
        options.getPageOptions().setTagging(TaggingMode.AUTO);
        // Create the OCR processor and add a warning handler
        Processor processor = new Processor();
        processor.addWarningListener(new Processor.WarningListener() {
            @Override
            public void warning(Processor.Warning event) {
                System.out.println(String.format("- %s: %s (%s%s)",
                    event.getCategory(), event.getMessage(), event.getContext(),
                    event.getPageNo() > 0 ? " page " + event.getPageNo() : ""));
            }
        });
        // Create stream for output file
        try (FileStream outStr = new FileStream(outPath, FileStream.Mode.READ_WRITE_NEW)) {
            // Process the document with OCR
            try (Document outDoc = processor.process(inDoc, engine, outStr, options, null)) {
            }
        }
    }
}
def warning_handler(message: str, category, page_no: int, context: str):
    if page_no > 0:
        print(f"- {category.name}: {message} ({context} page {page_no})")
    else:
        print(f"- {category.name}: {message} ({context})")
# Create the OCR engine
with Engine.create(ocr_engine_name) as engine:
    # Set the language(s) for OCR recognition (e.g. "German,English")
    engine.languages = language
    # Open input document
    with io.FileIO(input_path, 'rb') as in_stream:
        with Document.open(in_stream) as input_document:
            # Configure OCR options
            options = OcrOptions()
            # Configure image OCR: recognize text from scanned images
            options.image_options.mode = ImageProcessingMode.UPDATE_TEXT
            options.image_options.remove_only_invisible_ocr_text = True
            options.image_options.deskew_scan = True
            options.image_options.rotate_scan = True
            # Configure text OCR: update non-extractable text with correct Unicode
            options.text_options.mode = TextProcessingMode.UPDATE
            options.text_options.skip_mode = TextSkipMode.KNOWN_SYMBOLIC
            options.text_options.unicode_source = UnicodeSource.INSTALLED_FONT
            # Configure page OCR: process all pages and add tagging for accessibility
            options.page_options.mode = PageProcessingMode.ALL
            options.page_options.tagging = TaggingMode.AUTO
            # Create the OCR processor and add a warning handler
            processor = Processor()
            processor.add_warning_handler(warning_handler)
            # Create stream for output file
            with io.FileIO(output_path, 'wb+') as output_stream:
                # Process the document with OCR
                processor.process(input_document, engine, output_stream, options)
void PDFTOOLS_CALL WarningHandler(void* pContext, const TCHAR* szMessage,
                                  TPdfToolsOcr_WarningCategory iCategory, int iPageNo, const TCHAR* szContext)
{
    if (iPageNo > 0)
        _tprintf(_T("- %d: %s (%s page %d)\n"), iCategory, szMessage, szContext, iPageNo);
    else
        _tprintf(_T("- %d: %s (%s)\n"), iCategory, szMessage, szContext);
}
// Create the OCR engine
pEngine = PdfToolsOcr_Engine_Create(szOcrEngineName);
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pEngine, _T("Failed to create OCR engine. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
PdfTools_GetLastError());
// Set the language(s) for OCR recognition (e.g. "German,English")
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(PdfToolsOcr_Engine_SetLanguages(pEngine, szLanguage),
_T("Failed to set OCR languages. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
PdfTools_GetLastError());
// Open input document
pInStream = _tfopen(szInPath, _T("rb"));
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pInStream, _T("Failed to open the input file \"%s\" for reading.\n"), szInPath);
TPdfToolsSys_StreamDescriptor inDesc;
PdfToolsSysCreateFILEStreamDescriptor(&inDesc, pInStream, 0);
pInDoc = PdfToolsPdf_Document_Open(&inDesc, _T(""));
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(
pInDoc, _T("Failed to create a document from the input file \"%s\". %s (ErrorCode: 0x%08x).\n"), szInPath,
szErrorBuff, PdfTools_GetLastError());
// Configure OCR options
pOptions = PdfToolsOcr_OcrOptions_New();
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pOptions, _T("Failed to create OCR options. %s (ErrorCode: 0x%08x).\n"),
szErrorBuff, PdfTools_GetLastError());
// Configure image OCR: recognize text from scanned images
pImageOptions = PdfToolsOcr_OcrOptions_GetImageOptions(pOptions);
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pImageOptions, _T("Failed to get image options. %s (ErrorCode: 0x%08x).\n"),
szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(
PdfToolsOcr_ImageOptions_SetMode(pImageOptions, ePdfToolsOcr_ImageProcessingMode_UpdateText),
_T("Failed to set image processing mode. %s (ErrorCode: 0x%08x).\n"), szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(PdfToolsOcr_ImageOptions_SetRemoveOnlyInvisibleOcrText(pImageOptions, TRUE),
_T("Failed to set RemoveOnlyInvisibleOcrText. %s (ErrorCode: 0x%08x).\n"),
szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(PdfToolsOcr_ImageOptions_SetDeskewScan(pImageOptions, TRUE),
_T("Failed to set DeskewScan. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(PdfToolsOcr_ImageOptions_SetRotateScan(pImageOptions, TRUE),
_T("Failed to set RotateScan. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
PdfTools_GetLastError());
// Configure text OCR: update non-extractable text with correct Unicode
pTextOptions = PdfToolsOcr_OcrOptions_GetTextOptions(pOptions);
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pTextOptions, _T("Failed to get text options. %s (ErrorCode: 0x%08x).\n"),
szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(
PdfToolsOcr_TextOptions_SetMode(pTextOptions, ePdfToolsOcr_TextProcessingMode_Update),
_T("Failed to set text processing mode. %s (ErrorCode: 0x%08x).\n"), szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(
PdfToolsOcr_TextOptions_SetSkipMode(pTextOptions, ePdfToolsOcr_TextSkipMode_KnownSymbolic),
_T("Failed to set text skip mode. %s (ErrorCode: 0x%08x).\n"), szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(
PdfToolsOcr_TextOptions_SetUnicodeSource(pTextOptions, ePdfToolsOcr_UnicodeSource_InstalledFont),
_T("Failed to set unicode source. %s (ErrorCode: 0x%08x).\n"), szErrorBuff, PdfTools_GetLastError());
// Configure page OCR: process all pages and add tagging for accessibility
pPageOptions = PdfToolsOcr_OcrOptions_GetPageOptions(pOptions);
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pPageOptions, _T("Failed to get page options. %s (ErrorCode: 0x%08x).\n"),
szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(
PdfToolsOcr_PageOptions_SetMode(pPageOptions, ePdfToolsOcr_PageProcessingMode_All),
_T("Failed to set page processing mode. %s (ErrorCode: 0x%08x).\n"), szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(PdfToolsOcr_PageOptions_SetTagging(pPageOptions, ePdfToolsOcr_TaggingMode_Auto),
_T("Failed to set tagging mode. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
PdfTools_GetLastError());
// Create the OCR processor and add a warning handler
pProcessor = PdfToolsOcr_Processor_New();
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pProcessor, _T("Failed to create OCR processor. %s (ErrorCode: 0x%08x).\n"),
szErrorBuff, PdfTools_GetLastError());
GOTO_CLEANUP_IF_FALSE_PRINT_ERROR(PdfToolsOcr_Processor_AddWarningHandler(pProcessor, NULL, WarningHandler),
_T("Failed to add warning handler. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
PdfTools_GetLastError());
// Create stream for output file
pOutStream = _tfopen(szOutPath, _T("wb+"));
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pOutStream, _T("Failed to create output file \"%s\" for writing.\n"), szOutPath);
TPdfToolsSys_StreamDescriptor outDesc;
PdfToolsSysCreateFILEStreamDescriptor(&outDesc, pOutStream, 0);
// Process the document with OCR
pOutDoc = PdfToolsOcr_Processor_Process(pProcessor, pInDoc, pEngine, &outDesc, pOptions, NULL);
GOTO_CLEANUP_IF_NULL_PRINT_ERROR(pOutDoc, _T("The processing has failed. %s (ErrorCode: 0x%08x).\n"), szErrorBuff,
PdfTools_GetLastError());