3-Heights® PDF OCR - Optical Character Recognition

The 3-Heights® PDF OCR enhances PDF documents using information detected by an OCR engine.

Recognize Text

Embed all recognized text in the document, including text found in images and graphics

Tagging for Accessibility

Prepare documents for PDF/A level A conversion

Detect Barcodes & QR Codes

Extract embedded codes and embed them into metadata


3-Heights™ tools to digitize extensive paper archive

Because of a relocation, the company decided to convert all of its paper dossiers into digital format. The software had to run stably and deliver reliable quality, and any error had to remain traceable.

PDF/A conversion with OCR recognition for Volkswagen Foundation’s document management

The Volkswagen Foundation had many different types of PDF and office documents, images and emails stored in its previous document management system (DMS). Going forward, all image and PDF documents were to be converted into the standardized long-term archiving format PDF/A.

PDF OCR - Features

  • Make text extractable
    • Text contained in images
    • Text with fonts that have no Unicode information
    • Text written using vector graphics (e.g. in CAD drawings)
    • Any visible text, regardless of the type of graphics objects used
  • Scan improvements
    • Deskew scanned images
    • Rotate pages according to the recognized rotation of scan
  • Detect barcodes and QR codes
  • Process embedded files
  • Tagging of OCR text for accessibility
  • High performance
    • Asynchronous processing
    • Page analysis and result caching to minimize OCR operations
  • High quality
    • PDF/A compliant
    • High-fidelity conversion of existing page content
    • 3-Heights™ PDF Rendering Engine 2.0
    • Automatic detection of optimal OCR resolution

Conformance

  • ISO 32000-1 (PDF 1.7)
  • ISO 32000-2 (PDF 2.0)
  • ISO 19005-1 (PDF/A-1)
  • ISO 19005-2 (PDF/A-2)
  • ISO 19005-3 (PDF/A-3)

Supported formats

Input and output formats

  • PDF 1.0 to PDF 1.7
  • PDF 2.0
  • PDF/A-1, PDF/A-2, PDF/A-3

Areas of use - detect and recognize text in documents

Text recognition in the documentation process

PDF OCR supports document processes from receipt through to storage in a digital archive. Scanned images and embedded images in digitally produced documents are made readable, and missing Unicode characters in embedded fonts are added so that this text is also readable. All recognized texts are embedded in the document, making it searchable. These texts can also be extracted at any time using additional tools.

PDF OCR optimizes the jobs it sends to the OCR engine to minimize the number of pages that have to be recognized.
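One way to picture how result caching can reduce OCR operations is the minimal sketch below. It is an illustration only and does not show how 3-Heights® PDF OCR works internally: the class OcrResultCache, its members and the recognize delegate are hypothetical names that are not part of the 3-Heights® API. The sketch assumes that images with identical bytes (for example a logo repeated on every page) yield identical OCR text, so the engine only needs to be called once per distinct image.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical illustration: none of these types belong to the 3-Heights API.
public sealed class OcrResultCache
{
    // Maps a content hash of the image bytes to the text recognized for it.
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    // Stand-in for a call into a real OCR engine (assumption for this sketch).
    private readonly Func<byte[], string> _recognize;

    // Number of times the underlying OCR delegate was actually invoked.
    public int EngineCalls { get; private set; }

    public OcrResultCache(Func<byte[], string> recognize)
    {
        _recognize = recognize ?? throw new ArgumentNullException(nameof(recognize));
    }

    // Returns the recognized text, calling the engine only for unseen content.
    public string Recognize(byte[] imageData)
    {
        if (imageData == null) throw new ArgumentNullException(nameof(imageData));

        string key = ComputeKey(imageData);
        if (_cache.TryGetValue(key, out string cached))
        {
            return cached; // identical image seen before: no OCR operation
        }

        EngineCalls++;
        string text = _recognize(imageData);
        _cache[key] = text;
        return text;
    }

    // SHA-256 over the raw bytes serves as the cache key.
    private static string ComputeKey(byte[] data)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(data));
        }
    }
}
```

With a cache constructed around any recognition function, passing the same image bytes twice and a different image once results in two engine calls instead of three. A real implementation would also have to include the engine settings in the cache key, since different parameters can produce different text for the same image.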

The tool simplifies the work stages in document processing, such as classification, categorization, indexing and enriching documents with metadata.


Make all text in a document extractable

Recognize text in a PDF document using OCR and embed it into the document. Set the OCR engine and its parameters.

C# sample:
// Open input document
using (Stream inStream = File.OpenRead(inPath))
using (Document inDoc = Document.Open(inStream, null))

// Open output document
using (Stream outStream = File.Create(outPath))
{
    // Create OCR engine
    using (Engine engine = Engine.Create(engineName))
    {
        // Set the OCR engine parameters
        engine.SetParameters(engineParams);

        OcrParams ocr = new OcrParams();
        ocr.Engine = engine;

        ImageOcrParams imageOcr = new ImageOcrParams();
        imageOcr.Mode = ImageOcrMode.UpdateText;

        TextOcrParams textOcr = new TextOcrParams();
        textOcr.Mode = TextOcrMode.Update;

        // Process document
        WarningList warnings = inDoc.Process(outStream, null, ocr, imageOcr, textOcr, null, null);
    }
}

What do you get with 3-Heights® PDF OCR?

  • Optimized page volume for the OCR engine
  • Designed for individual processing and mass processing
  • Powerful component with high stability, quality and scalability

Quality assurance

Clean, lean and compliant PDF documents without loss of quality and information.

Cost saving

Efficient, cost-saving OCR processing, because the number of pages sent to the OCR engine is kept to a minimum.

Time

Efficient processing through the 3-Heights® architecture. Fast document display, short download times and searchability.