3-Heights™ PDF OCR - Optical Character Recognition

The 3-Heights™ PDF OCR enhances PDF documents using information detected by an OCR engine.

Recognize Text

Embed all recognized text in the document, including text found in images and graphics

Tagging for Accessibility

Prepare documents for PDF/A level A conversion

Detect Barcodes & QR Codes

Extract embedded codes and embed them into metadata

PDF OCR - features

  • Make text extractable
    • Text contained in images
    • Text with fonts that have no Unicode information
    • Text written using vector graphics (e.g. in CAD drawings)
    • Any visible text, regardless of the type of graphics objects used
  • Scan improvements
    • Deskew scanned images
    • Rotate pages according to the recognized rotation of scan
  • Detect barcodes and QR codes
  • Process embedded files
  • Tagging of OCR text for accessibility
  • High performance
    • Asynchronous processing
    • Page analysis and result caching to minimize OCR operations
  • High quality
    • PDF/A compliant
    • High-fidelity conversion of existing page content
    • 3-Heights™ PDF Rendering Engine 2.0
    • Automatic detection of optimal OCR resolution


Supported standards

  • ISO 32000-1 (PDF 1.7)
  • ISO 32000-2 (PDF 2.0)
  • ISO 19005-1 (PDF/A-1)
  • ISO 19005-2 (PDF/A-2)
  • ISO 19005-3 (PDF/A-3)

Supported formats

Input and output formats

  • PDF 1.0 to PDF 1.7
  • PDF 2.0
  • PDF/A-1, PDF/A-2, PDF/A-3

Area of use - detect and recognize text in documents

Text recognition in the documentation process

PDF OCR supports document processes from receipt through to storage in a digital archive. Text in scanned images and in images embedded in digitally produced documents is made machine-readable, and missing Unicode information for embedded fonts is added so that this text becomes readable as well. All recognized text is embedded in the document, making it searchable. This text can also be extracted at any time using additional tools.
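
To illustrate why missing Unicode information matters, here is a minimal conceptual sketch in Java. It does not use the 3-Heights™ API and is not the product's implementation: the class `GlyphTextDemo`, its methods, and the assumption that the recognized text lines up one character per glyph are all hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch only: models a font whose glyph codes lack some Unicode mappings.
class GlyphTextDemo {

    // Extract text by looking up each glyph code; unmapped glyphs become U+FFFD.
    static String extract(int[] glyphCodes, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int code : glyphCodes) {
            sb.append(toUnicode.getOrDefault(code, "\uFFFD"));
        }
        return sb.toString();
    }

    // Fill in missing mappings from recognized text.
    // Assumption: exactly one recognized character per glyph, in the same order.
    static void fillMissing(int[] glyphCodes, String recognized,
                            Map<Integer, String> toUnicode) {
        if (recognized.length() != glyphCodes.length) {
            throw new IllegalArgumentException("recognized text does not align with glyphs");
        }
        for (int i = 0; i < glyphCodes.length; i++) {
            toUnicode.putIfAbsent(glyphCodes[i], String.valueOf(recognized.charAt(i)));
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> toUnicode = new HashMap<>();
        toUnicode.put(10, "H");
        toUnicode.put(12, "l");
        int[] glyphs = {10, 11, 12, 12, 13}; // drawn on the page as "Hello"

        System.out.println(extract(glyphs, toUnicode)); // two glyphs have no mapping yet
        fillMissing(glyphs, "Hello", toUnicode);        // "Hello" stands in for an OCR result
        System.out.println(extract(glyphs, toUnicode));
    }
}
```

The page looks identical before and after; only the mapping used for text extraction and search changes.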

PDF OCR optimizes the jobs it sends to the OCR engine in order to minimize the number of pages that have to be recognized.
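
One general way to reduce calls to an OCR engine is to cache results by content, so that an image that occurs more than once is recognized only once. The following Java sketch shows that idea under stated assumptions; it is not the 3-Heights™ implementation. The class `OcrResultCache` is hypothetical, and the engine is modeled as a plain `Function<byte[], String>`.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: cache OCR results keyed by a hash of the image bytes.
class OcrResultCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<byte[], String> engine; // stand-in for a real OCR engine
    private int engineCalls = 0;

    OcrResultCache(Function<byte[], String> engine) {
        this.engine = engine;
    }

    // Return the cached text for identical image content, or call the engine once.
    String recognize(byte[] imageBytes) {
        String key = sha256Hex(imageBytes);
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        engineCalls++;
        String text = engine.apply(imageBytes);
        cache.put(key, text);
        return text;
    }

    int engineCalls() {
        return engineCalls;
    }

    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Fake engine: returns a label instead of doing real recognition.
        OcrResultCache ocr = new OcrResultCache(bytes -> "text for " + bytes.length + " bytes");
        byte[] letterhead = {1, 2, 3};
        ocr.recognize(letterhead);          // engine is called
        ocr.recognize(letterhead.clone());  // same content: served from the cache
        ocr.recognize(new byte[] {4, 5});   // different content: engine is called again
        System.out.println("engine calls: " + ocr.engineCalls());
    }
}
```

A content hash is used as the key so that the same logo or letterhead repeated across many pages maps to a single engine call, regardless of where it appears in the document.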

The tool simplifies the work stages in document processing, such as classification, categorization, indexing and enriching documents with metadata.

Make all text in a document extractable

Recognize text in a PDF document using OCR and embed it into the document. Set the OCR engine and its parameters.

C# sample:
// Requires System.IO and the 3-Heights™ PDF OCR API namespace

// Open input document
using (Stream inStream = File.OpenRead(inPath))
using (Document inDoc = Document.Open(inStream, null))
// Open output document
using (Stream outStream = File.Create(outPath))
// Create OCR engine
using (Engine engine = Engine.Create(engineName))
{
    // Set process parameters
    OcrParams ocr = new OcrParams();
    ocr.Engine = engine;

    // Update the text recognized in images
    ImageOcrParams imageOcr = new ImageOcrParams();
    imageOcr.Mode = ImageOcrMode.UpdateText;

    // Update existing text
    TextOcrParams textOcr = new TextOcrParams();
    textOcr.Mode = TextOcrMode.Update;

    // Process document
    WarningList warnings = inDoc.Process(outStream, null, ocr, imageOcr, textOcr, null, null);
}
Java sample:
try (
    // Open input document
    FileStream inStream = new FileStream(inPath, "r");
    Document inDoc = Document.open(inStream, null);
    // Create output document
    FileStream outStream = new FileStream(outPath, "rw");
    // Create OCR engine
    Engine engine = Engine.create(engineName)) {

    // Set process parameters (assign the engine and the OCR modes here, as in the C# sample)
    OcrParams ocr = new OcrParams();

    ImageOcrParams imageOcr = new ImageOcrParams();

    TextOcrParams textOcr = new TextOcrParams();

    // Process document
    inDoc.process(outStream, null, ocr, imageOcr, textOcr, null, null);
}

What do you get with 3-Heights™ PDF OCR?

    • Optimized page volume for the OCR engine
    • Designed for individual processing and mass processing
    • Powerful component with high stability, quality and scalability

Quality assurance
Clean, lean and compliant PDF documents without loss of quality and information.

Cost saving
Efficient and cost-saving OCR processing at a high level

Efficient processing through the 3-Heights™ architecture. Fast document display, short download times and searchability.

Why is the extraction of text from a PDF document such a hassle?

When I use a text editing tool such as Microsoft Word, it is quite natural that I can select a portion of text, copy it to the clipboard and paste it into a window of any other tool. Not so with PDF. At least not with every kind of document. Why is that?