Make Documents Searchable with OCR

Transform scanned and born-digital documents with Optical Character Recognition (OCR) and get more value from PDFs that can be searched and edited

Integrate OCR into your document pipeline

PDF SDK

Programmatic OCR processing

Process PDFs programmatically in .NET, Java, Python, or C

  • Use Pdftools SDK to call the built-in OCR module

  • Recognize text in scanned images

  • Fix non-extractable text in born-digital PDFs

Get started with the Pdftools SDK

Conversion Service

Configure OCR for 50+ file types

Implement OCR as part of your document automation pipeline

  • Designed for automated, high-volume workflows

  • Configured as a processing step within a workflow

  • Option to output XML file for structured data

Get the OCR Service add-on for the Conversion Service

OCR Service features

Detect text

Detects text in scanned images and PDFs, making them searchable and editable

Detect tables

Detects tables, barcodes, engineering drawings, and other complex layout elements

Add text layer

Embeds invisible text layer in Unicode format without altering appearance

Automatic correction

Automatic skew correction, rotation, and resolution handling

No unnecessary processes

Detects which elements require OCR and only processes those

180+ languages

Supports over 180 natural and technical languages

Learn more about the OCR Service and its features

OCR in document workflows

The SDK takes PDFs as input and outputs PDFs with an invisible text layer. The Conversion Service accepts any of its 50+ supported file formats as input and outputs either a PDF or an XML file.

Recognize text

Recognize text in scanned images by running OCR on them

Fix non-extractable text

Fix non-extractable text in born-digital PDFs by adding Unicode mappings

Process entire pages

Process entire pages and add the results as OCR text

Add tagging

Add PDF tagging for accessibility compliance

Extract XML for OCR quality checks and audits

With the Conversion Service, you can check the accuracy of OCR results by extracting an XML file that gives insight into any OCR process that has previously been applied. The workflow extracts OCR-related information from PDF documents, outputs a structured XML file with detailed data, and supplies a confidence score for the OCR process.

This opens up quality control workflows: a low confidence score on a key field in a scanned document is a signal to route that document for human review rather than processing it further. It also has audit and legal value, as the XML creates a structured, timestamped record of the OCR interpretation, not just the final output.
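
The review routing described above can be sketched in a few lines of Python. This is a minimal illustration only: the XML layout (an `ocrReport` root with `field` elements carrying `name` and `confidence` attributes), the file name, and the 0.80 threshold are all assumptions made up for the example, not the actual Conversion Service output schema. Check the product documentation for the real element names before adapting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout, invented for this illustration; the real
# Conversion Service schema may use different element and attribute names.
SAMPLE_XML = """\
<ocrReport document="invoice-0042.pdf">
  <field name="invoice_number" confidence="0.98">INV-2291</field>
  <field name="total_amount" confidence="0.61">1,204.50</field>
  <field name="due_date" confidence="0.93">2024-07-01</field>
</ocrReport>
"""

REVIEW_THRESHOLD = 0.80  # assumed cut-off; tune it per document type


def low_confidence_fields(xml_text, threshold=REVIEW_THRESHOLD):
    """Return (name, confidence) pairs for fields scoring below threshold."""
    root = ET.fromstring(xml_text)
    flagged = []
    for field in root.iter("field"):
        # A field without a confidence attribute is treated as 0 and flagged.
        confidence = float(field.get("confidence", "0"))
        if confidence < threshold:
            flagged.append((field.get("name"), confidence))
    return flagged


def route(xml_text, threshold=REVIEW_THRESHOLD):
    """Send a document to review if any field falls below the threshold."""
    if low_confidence_fields(xml_text, threshold):
        return "human_review"
    return "auto_process"


if __name__ == "__main__":
    print(low_confidence_fields(SAMPLE_XML))
    print(route(SAMPLE_XML))
```

Flagging on the weakest field, rather than on an average, keeps one unreadable amount or date from being hidden by many well-recognized fields in the same document.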

Get more out of your documents with the OCR Service

Get in touch with our team for an OCR Service license

What customers are saying

Government

BRZ

PDF/A and searchability at the Federal Ministry of Justice with the Pdftools SDK

Banking

UBS

The world's first electronic website archiving system compliant with the ISO PDF/A standard, built with the Conversion Service

Insurance

SwissLife

Swiss Life archives documents from Microsoft SharePoint in PDF/A format with the Conversion Service

Healthcare

Storz Medical

The PDF Web Viewer brings new impetus to shock wave technology