3-Heights™ Scan to PDF Server – convert scanned documents into PDF/A
Scanning paper documents has become a daily ritual in the mail receiving room of many businesses. This task is often performed by a third-party scanning service provider. In most cases the scanned images are saved as black & white TIFF files, the format synonymous with faxes. In special cases, for example checks, identification papers with photos etc., the documents are scanned to color files.
One must be cautious, however, since color TIFF files can quickly become extremely large. The PDF/A standard has now also established itself in incoming mail applications, especially when dealing with color scans. However, individual processing steps such as text recognition, compression and digital signing are generally neither coordinated with one another nor integrated into a single solution. There are, for example, scanners that can create PDF/A files and also sign them. However, any subsequent compression changes the file and thereby invalidates the digital signature, making it worthless.
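The effect described above, where compressing a file after signing it invalidates the signature, can be illustrated with a minimal sketch. It uses a keyed SHA-256 digest (HMAC) from the Python standard library as a stand-in for a real certificate-based signature; the key, the sample data and the function names are assumptions made for illustration and are not part of the product.

```python
import hashlib
import hmac
import zlib

# Hypothetical demo key. Real document signatures use certificates and
# asymmetric keys; HMAC is only a stand-in to show the principle.
SECRET_KEY = b"demo-key"


def sign(document: bytes) -> bytes:
    """Return a keyed digest computed over the exact bytes of the document."""
    return hmac.new(SECRET_KEY, document, hashlib.sha256).digest()


def verify(document: bytes, signature: bytes) -> bool:
    """Check that the signature still matches the document bytes."""
    return hmac.compare_digest(sign(document), signature)


original = b"scanned page data " * 100

# Wrong order: sign first, compress afterwards.
signature = sign(original)
recompressed = zlib.compress(original)
print(verify(original, signature))      # True: bytes are unchanged
print(verify(recompressed, signature))  # False: compression changed the bytes

# Right order: compress first, sign the final bytes last.
final_signature = sign(recompressed)
print(verify(recompressed, final_signature))  # True
```

The sketch suggests why signing belongs at the end of the processing chain: any step that alters the bytes after signing breaks the match between file and signature.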
PDF Tools AG offers a solution for creating PDF/A files from scanned documents and fax images that fulfills all the vital requirements, such as small file size, searchability and embedded metadata. The following diagram illustrates the principle.
- Standardized format: PDF/A is suitable for storing both scanned and digitally created documents.
- High compression rate: The PDF/A standard supports more modern and powerful compression methods, which result in small file sizes for color images.
- Text recognition: The created PDF/A documents can be made searchable by embedding text from an OCR engine.
- Embedded metadata: In order for the document and the associated metadata to form an inseparable whole, PDF/A embeds the metadata in the file itself. The metadata is stored in the Extensible Metadata Platform (XMP) format, which, like PDF/A, is defined in its own ISO standard.
- Digital signature: In order to ensure the integrity and authenticity of the created documents, a digital signature can be applied to the PDF/A document in accordance with the PAdES standard. The digital signature is a kind of electronic signature that can serve the same purpose as a handwritten signature, provided that the corresponding legal requirements (national signature laws) are met.
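To make the embedded-metadata point more concrete, the following is a minimal sketch of an XMP packet body built with Python's standard library. The namespaces for RDF, Dublin Core and the PDF/A identification schema are the publicly defined ones; the function name `build_xmp` and the sample values are assumptions for illustration. The sketch omits the `xpacket` wrapper of a complete packet and does not show how the product itself writes metadata.

```python
import xml.etree.ElementTree as ET

# Publicly defined namespaces used in XMP metadata.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdfaid": "http://www.aiim.org/pdfa/ns/id/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(prefix: str, name: str) -> str:
    """Build a namespace-qualified tag name for ElementTree."""
    return f"{{{NS[prefix]}}}{name}"


def build_xmp(title: str, part: int, conformance: str) -> str:
    """Return a simplified XMP document (hypothetical helper)."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative: an rdf:Alt with one default entry.
    title_el = ET.SubElement(desc, q("dc", "title"))
    alt = ET.SubElement(title_el, q("rdf", "Alt"))
    li = ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"})
    li.text = title

    # PDF/A identification: part 2 with conformance B means PDF/A-2b.
    ET.SubElement(desc, q("pdfaid", "part")).text = str(part)
    ET.SubElement(desc, q("pdfaid", "conformance")).text = conformance
    return ET.tostring(xmpmeta, encoding="unicode")


print(build_xmp("Invoice 4711", 2, "B"))
```

Because the metadata travels inside the file as XML, an archive system can read it without a separate index file.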
In principle, TIFF documents offer all these advantages, but only as proprietary extensions, since the TIFF standard itself does not offer solutions.
|Requirement|TIFF|PDF/A|
|---|---|---|
|Data consistency|Proprietary tags for metadata|+|
|Authenticity / Integrity|With detached signatures|+|
|Required storage space|Black / White: +| |
|Searchability|Proprietary tags for OCR text|+|
Usually, the individual processing stages, such as text recognition, compression, PDF/A generation and digital signing, cannot all be performed by the scanner alone, as metadata is often added retroactively by an index station. However, adding metadata to a file that has already been signed breaks the seal of the digital signature and makes it worthless. Here, too, separate software can offer a decisive advantage.
- Conversion of single page or multi-page raster images to PDF
- Processing of subfolders
- Flexible workflow configuration
- Set output format and conformance level (PDF, PDF/A-1, PDF/A-2 and PDF/A-3)
- Optical character recognition (OCR) including barcodes
- Digital PDF signature
- Parallel processing
- Set image compression individually for different classes of images
- Support for mixed raster content (MRC)
- CCITT Group3 (1D and 2D)
- CCITT Group4
- Deflate (ZIP)
- JBIG2 (lossless only)
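Setting compression individually per image class can be pictured as a small lookup from image type to compression method. The sketch below is an illustration only: the class names, the `POLICY` table and the function names are hypothetical and do not reflect the product's configuration. It relies on one general fact, namely that the CCITT and JBIG2 methods apply to bitonal (1-bit) images, while Deflate is a general-purpose lossless method.

```python
from enum import Enum


class ImageClass(Enum):
    BITONAL = "bitonal"
    GRAYSCALE = "grayscale"
    COLOR = "color"


# Hypothetical policy using only method names from the list above.
POLICY = {
    ImageClass.BITONAL: "CCITT Group4",
    ImageClass.GRAYSCALE: "Deflate (ZIP)",
    ImageClass.COLOR: "Deflate (ZIP)",
}


def classify(bits_per_sample: int, samples_per_pixel: int) -> ImageClass:
    """Classify an image from its bit depth and number of channels."""
    if samples_per_pixel == 1 and bits_per_sample == 1:
        return ImageClass.BITONAL
    if samples_per_pixel == 1:
        return ImageClass.GRAYSCALE
    return ImageClass.COLOR


def choose_compression(bits_per_sample: int, samples_per_pixel: int) -> str:
    """Pick the compression configured for the image's class."""
    return POLICY[classify(bits_per_sample, samples_per_pixel)]


print(choose_compression(1, 1))  # CCITT Group4
print(choose_compression(8, 1))  # Deflate (ZIP)
print(choose_compression(8, 3))  # Deflate (ZIP)
```

Keeping the policy in a table rather than in code mirrors the idea of a configurable workflow: changing the method for one class does not touch the others.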
Areas of Use
- Paper capture: Electronic archiving of paper documents received as incoming mail within a company.
- Facsimile capture: Electronic archiving of all fax transactions between the company and its business partners.
- Archive migration: Migration of paper archives to an electronic archive with the standardized PDF/A format.
- Web/mobile capture: Use of the central service in client/server applications via a web service.
- Enterprise application integration: Use of the central service for PDF/A document creation via a programming interface (API) from specialist applications that create TIFF or JPEG files.
- ISO 19005-1 (PDF/A-1)
- ISO 19005-2 (PDF/A-2)
- ISO 32000-1 (PDF 1.7)
Distributed architecture and scalability
The 3‑Heights™ Scan to PDF Server is a scalable and freely configurable service. The service calls a separate program for each work stage, such as compression, OCR or conversion into PDF/A. Each program receives the result of the previous work stage as its input and makes its output available to the next work stage. The work stages are linked by means of an XML configuration file. This architecture allows the work stages of the service to be structured in a highly flexible way, and the service can be extended almost without limit by adding further work stages. To increase the level of parallel processing, documents can be broken down into individual pages that are sent through the processing stages simultaneously and then merged back into a single document. This option can considerably improve the use of computer resources (processor cores, memory, input and output, OCR engine, etc.).
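The chained, page-parallel design described above can be sketched in a few lines. Everything specific in this example is an assumption: the XML element and attribute names are invented for illustration and are not the product's configuration schema, and the stages are in-process stand-in functions rather than separate programs. The sketch only shows the general pattern of reading the stage order from XML, feeding each stage's output into the next, processing pages concurrently and merging them back in order.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical configuration; element and attribute names are invented.
CONFIG = """
<workflow>
  <stage name="compress"/>
  <stage name="ocr"/>
  <stage name="convert"/>
</workflow>
"""


def make_stage(name):
    """Create a stand-in stage that only records that it ran."""
    def stage(page: str) -> str:
        return f"{page}>{name}"
    return stage


def load_stages(xml_text: str):
    """Read the ordered list of stages from the XML configuration."""
    root = ET.fromstring(xml_text)
    return [make_stage(el.get("name")) for el in root.findall("stage")]


def run_stages(page: str, stages) -> str:
    """Pass one page through all stages; each output feeds the next stage."""
    for stage in stages:
        page = stage(page)
    return page


def process_document(pages, stages, workers: int = 4):
    """Process pages in parallel and merge them back in original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the merged
        # document keeps its page sequence.
        return list(pool.map(lambda p: run_stages(p, stages), pages))


stages = load_stages(CONFIG)
result = process_document(["p1", "p2", "p3"], stages)
print(result)
# ['p1>compress>ocr>convert', 'p2>compress>ocr>convert', 'p3>compress>ocr>convert']
```

Adding a work stage in this sketch means adding one `<stage>` element, which is the kind of extensibility that linking stages through a configuration file makes possible.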
- Windows Vista, 7, 8, 8.1, 10 – 32 & 64 bit
- Windows Server 2008, 2008 R2, 2012, 2012 R2, 2016 – 32 & 64 bit
- API: C, Java, .NET, COM
- Command line for batch processing