3-Heights® Scan to PDF Server - convert scanned documents into PDF/A
Scanning paper documents has become a daily ritual in the mail receiving room of many businesses. This task is often performed by a third-party scanning service provider. In most cases the scanned images are saved as black & white TIFF files, the format synonymous with faxes. In special cases, for example checks, identification papers with photos etc., the documents are scanned to color files.
Caution is needed, however, since color TIFF files can quickly become extremely large. The PDF/A standard has now also established itself in incoming mail applications, especially when dealing with color scans. However, individual processing steps such as text recognition, compression and digital signing are generally not coordinated with one another or integrated into a single solution. There are, for example, scanners that can create PDF/A files and also sign them. Compressing the file afterwards, however, invalidates the digital signature and makes it worthless.
PDF Tools AG offers a solution for creating PDF/A files from scanned documents and fax images that fulfills all the vital requirements like small file size, searchable files and embedded metadata. The following diagram illustrates the principle.
Create files from scan and sign them
Make scanned documents searchable (OCR)
Central service for PDF/A document creation
Audit-compliant archiving of creditor invoices at KIBAG Dienstleistungen AG
Centralized document capturing with Scan2SAP solution
Scan to PDF Server – Features
- Conversion of single page or multi-page raster images to PDF
- Processing of subfolders
- Flexible workflow configuration
- Set output format and conformity level (PDF, PDF/A-1, PDF/A-2 and PDF/A-3)
- Optical character recognition (OCR) including barcodes
- Digital PDF signature
- Parallel processing
- Set image compression individually for different classes of images
- Support for mixed raster content (MRC)
- CCITT Group3 (1D and 2D)
- CCITT Group4
- Deflate (ZIP)
- JBIG2 (lossless only)
- ISO 32000-1 (PDF 1.7)
- ISO 32000-2 (PDF 2.0)
- ISO 19005-1 (PDF/A-1)
- ISO 19005-2 (PDF/A-2)
- ISO 19005-3 (PDF/A-3)
Input Image Formats
- scanned PDF
- PDF 1.0 to 1.7
- PDF 2.0
- PDF/A-1, PDF/A-2, PDF/A-3
Areas of use - create PDF/A files from scanned documents
Electronic archiving of paper documents received as incoming mail within a company.
Electronic archiving of all fax transactions between the company and its business partners.
Migration of paper archives to an electronic archive with the standardized PDF/A format.
Use of the central service in client/server applications via a web service.
Enterprise application integration
Use of the central service for PDF/A document creation via a programming interface (API) from specialist applications that create TIFF or JPEG files.
Distributed architecture and scalability
The 3-Heights® Scan to PDF Server is a scalable and freely configurable service. The service calls a separate program for each work stage, such as compression, OCR, conversion into PDF/A, etc. Each program receives the result of the previous work stage as its input and makes its output available to the next work stage. The work stages are linked by means of an XML configuration file. This architecture allows the work stages of the service to be arranged very flexibly, and the service can be extended simply by adding further work stages.
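The chaining of work stages described above can be illustrated with a minimal sketch. It is not the product's configuration format or API: the XML element and attribute names, the stage names, and the stand-in stage functions are all assumptions made up for this illustration. The sketch only shows the principle that an XML file fixes the order of the stages and that each stage consumes the output of the previous one.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical configuration: element and attribute names are invented
# for this sketch and are not the product's actual schema.
CONFIG = """
<pipeline>
  <stage name="compress"/>
  <stage name="ocr"/>
  <stage name="to_pdfa"/>
</pipeline>
"""

Stage = Callable[[str], str]

# Stand-in stages: each one only records that it ran on its input.
STAGES: Dict[str, Stage] = {
    "compress": lambda doc: doc + " -> compressed",
    "ocr": lambda doc: doc + " -> text layer added",
    "to_pdfa": lambda doc: doc + " -> PDF/A",
}


def load_stage_names(config_xml: str) -> List[str]:
    """Read the stage order from the XML configuration."""
    root = ET.fromstring(config_xml)
    return [stage.attrib["name"] for stage in root.findall("stage")]


def run_pipeline(document: str, stage_names: List[str]) -> str:
    """Feed the output of each stage into the next one."""
    result = document
    for name in stage_names:
        result = STAGES[name](result)
    return result


if __name__ == "__main__":
    print(run_pipeline("scan.tif", load_stage_names(CONFIG)))
```

In this sketch, adding a further work stage means registering one more function and adding one more `<stage>` element; the existing stages are left unchanged.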
To increase the level of parallel processing, documents can be split into individual pages that pass through the processing stages simultaneously and are then merged back into a single document. This option can considerably improve the utilization of computer resources (processor cores, memory, input and output, OCR engine, etc.).
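The split, process and merge pattern can be sketched as follows. This is an illustration under simplifying assumptions, not the server's implementation: a document is modelled as a list of page strings, `process_page` is a hypothetical placeholder for real per-page work such as compression or OCR, and a thread pool stands in for whatever worker model the service actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_pages(document: List[str]) -> List[str]:
    """A document is modelled here as a list of page strings."""
    return list(document)


def process_page(page: str) -> str:
    """Hypothetical placeholder for per-page work such as OCR."""
    return page.upper()


def merge_pages(pages: List[str]) -> List[str]:
    """Reassemble the processed pages into one document."""
    return list(pages)


def process_document(document: List[str], max_workers: int = 4) -> List[str]:
    pages = split_into_pages(document)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so pages stay in sequence
        # even if they finish processing out of order.
        processed = list(pool.map(process_page, pages))
    return merge_pages(processed)


if __name__ == "__main__":
    print(process_document(["page one", "page two", "page three"]))
```

Preserving page order during the merge is the important detail: the pages may finish in any order, but the merged document must match the original sequence.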
- Standardized format
PDF/A is suitable for storing both scanned and digitally created documents.
- High compression rate
The PDF/A standard supports more modern and more powerful compression methods, which keeps file sizes small even for color images.
- Text recognition
The created PDF/A documents can be made searchable by embedding text from an OCR engine.
- Embedded metadata
So that the document and its associated metadata form an inseparable whole, PDF/A embeds the metadata in the file itself. To store it, PDF/A uses the Extensible Metadata Platform (XMP) format, which, like PDF/A, is defined in its own ISO standard.
- Digital signature
In order to ensure the integrity and authenticity of the created documents, a digital signature can be applied to the PDF/A document in accordance with the PAdES standard. The digital signature is a kind of electronic signature that can serve the same purpose as a handwritten signature, provided that the corresponding legal requirements (national signature laws) are met.
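As an illustration of the embedded metadata mentioned above, the following sketch reads the PDF/A identification from a small XMP packet. The packet is a hand-written sample with made-up values, not output of the Scan to PDF Server, and it omits the `xpacket` wrapper that surrounds XMP inside a real PDF file. The helper name `read_pdfa_level` is an assumption for this example; the `pdfaid` namespace and its `part` and `conformance` properties are those used by PDF/A for self-identification.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used to look up elements in the XMP packet.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "pdfaid": "http://www.aiim.org/pdfa/ns/id/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample packet; the values are illustrative only.
XMP_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
      <dc:format>application/pdf</dc:format>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_pdfa_level(xmp: str) -> str:
    """Return the PDF/A level declared in an XMP packet, e.g. 'PDF/A-2b'."""
    root = ET.fromstring(xmp)
    description = root.find("rdf:RDF/rdf:Description", NS)
    part = description.findtext("pdfaid:part", namespaces=NS)
    conformance = description.findtext("pdfaid:conformance", namespaces=NS)
    return f"PDF/A-{part}{conformance.lower()}"


if __name__ == "__main__":
    print(read_pdfa_level(XMP_PACKET))
```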
Advantages of PDF/A over TIFF
In principle, TIFF documents can offer all these advantages, but only through proprietary extensions, since the TIFF standard itself does not provide for them.
|Criterion|TIFF|PDF/A|
|Data consistency|Proprietary tags for metadata|+|
|Authenticity / Integrity|With detached signatures|+|
|Required storage space|Black / White: +||
|Searchability|Proprietary tags for OCR text|+|
Usually, the individual processing stages, such as text recognition, compression, PDF/A generation and digital signing, cannot all be performed by the scanner alone, because metadata is often added afterwards at an index station. If the file has already been signed, this later work stage breaks the seal of the digital signature and makes it worthless. Here, too, separate software can offer a decisive advantage.
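Why a later change breaks the seal can be shown with a deliberately simplified sketch. It is not a PAdES implementation: a plain SHA-256 digest over the whole file stands in for the hash that a real signature protects, the file contents are made-up bytes, and the sketch assumes the index station rewrites the signed bytes rather than appending to them. The function names are assumptions for this example.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 digest, standing in for the hash a signature protects."""
    return hashlib.sha256(data).hexdigest()


def seal_intact(data: bytes, recorded_digest: str) -> bool:
    """Check whether the data still matches the digest taken at signing."""
    return digest(data) == recorded_digest


# Made-up file contents as they were at signing time.
signed_bytes = b"scanned invoice, metadata: none"
recorded = digest(signed_bytes)

# The index station rewrites the file to add metadata after signing.
modified_bytes = signed_bytes.replace(b"metadata: none", b"metadata: invoice")

if __name__ == "__main__":
    print(seal_intact(signed_bytes, recorded))    # unchanged file
    print(seal_intact(modified_bytes, recorded))  # file changed after signing
```

The same reasoning applies to compression performed after signing. A signature therefore belongs at the end of the processing chain, once OCR text, compression and metadata are final.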