3-Heights™ PDF OCR - optical character recognition
The 3-Heights™ PDF OCR enhances PDF documents using information detected by an OCR engine.
Embed all recognized text in documents, including text in images and graphics
Tagging for accessibility
Prepare documents for PDF/A level A conversion
Detect barcodes & QR codes
Extract the data encoded in barcodes and embed it in the document metadata
PDF OCR - features
- Make text extractable
  - Text contained in images
  - Text with fonts that have no Unicode information
  - Text drawn with vector graphics (e.g. in CAD drawings)
  - Any visible text, regardless of the type of graphics objects used
- Scan improvements
  - Deskew scanned images
  - Rotate pages according to the detected orientation of the scan
- Detect barcodes and QR codes
- Process embedded files
- Tagging of OCR text for accessibility
- High performance
  - Asynchronous processing
  - Page analysis and result caching to minimize OCR operations
- High quality
  - PDF/A compliant
  - High-fidelity conversion of existing page content
  - 3-Heights™ PDF Rendering Engine 2.0
  - Automatic detection of optimal OCR resolution
Area of use - detect and recognize text in documents
Text recognition in the documentation process
PDF OCR supports document processes from receipt through to storage in a digital archive. Scanned images, as well as images embedded in digitally produced documents, are made readable, and missing Unicode information is added to embedded fonts so that their text becomes readable too. All recognized text is embedded in the document, making it searchable. This text can also be extracted at any time using additional tools.
PDF OCR optimizes the jobs it sends to the OCR engine so that as few pages as possible need to be recognized.
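The idea of minimizing OCR volume can be illustrated with a minimal sketch, assuming two simple rules: pages whose text is already extractable are skipped, and pages whose images are byte-identical are recognized only once, with the cached result reused. The `Page` model and the functions `plan_ocr` and `apply_results` are hypothetical illustrations, not part of any 3-Heights™ API, and the product's actual page analysis is not described here.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page model (not a 3-Heights API type)."""
    number: int
    has_unicode_text: bool  # True if the page's text is already extractable
    image: bytes            # image data that would be sent to the OCR engine


def plan_ocr(pages):
    """Decide which images actually need OCR.

    Returns (jobs, assignment):
      jobs       -- image hash -> image bytes, one entry per distinct image
      assignment -- page number -> image hash, for every page that needs OCR
    """
    jobs = {}
    assignment = {}
    for page in pages:
        if page.has_unicode_text:
            continue  # text is already extractable: no OCR job needed
        key = hashlib.sha256(page.image).hexdigest()
        # Identical images (e.g. a repeated form page) are queued only once.
        jobs.setdefault(key, page.image)
        assignment[page.number] = key
    return jobs, assignment


def apply_results(assignment, ocr_results):
    """Reuse the cached result of each OCR job for every page sharing that image."""
    return {number: ocr_results[key] for number, key in assignment.items()}
```

For a four-page document in which page 2 already has extractable text and pages 1 and 3 carry the same scan, `plan_ocr` queues only two OCR jobs, and `apply_results` copies the result for the shared scan to both pages. Hashing the raw bytes only catches exact duplicates; a real engine would likely rely on richer page analysis.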
The tool simplifies later steps in document processing, such as classification, categorization, indexing and enriching documents with metadata.
What do you get with 3-Heights™ PDF OCR?
- Optimized page volume for the OCR engine
- Designed for individual processing and mass processing
- Powerful component with high stability, quality and scalability
Clean, lean and compliant PDF documents without loss of quality and information.
Efficient, cost-effective and high-quality OCR processing
The 3-Heights™ architecture provides efficient processing, fast document display, short download times and searchable documents.
Why is extracting text from a PDF document such a hassle?
When I use a text editor such as Microsoft Word, it is quite natural that I can select a portion of text, copy it to the clipboard and paste it into a window of any other application. Not so with PDF, at least not with every kind of document. Why is that?