3-Heights™ Scan to PDF Server – convert scanned documents into PDF/A
Scanning paper documents has become a daily ritual in the mail receiving room of many businesses. This task is often performed by a third-party scanning service provider. In most cases the scanned images are saved as black & white TIFF files, the format synonymous with faxes. In special cases, for example checks, identification papers with photos etc., the documents are scanned to color files.
One must be cautious, however, since color TIFF files can quickly become extremely large. The PDF/A standard has now also established itself in incoming mail applications, especially for color scans. However, individual processing steps such as text recognition, compression and digital signing are generally neither coordinated with one another nor integrated into a single solution. There are, for example, scanners that can create PDF/A files and also sign them. Subsequent compression of the file, however, invalidates the digital signature and renders it worthless.
PDF Tools AG offers a solution for creating PDF/A files from scanned documents and fax images that fulfills all the vital requirements like small file size, searchable files and embedded metadata. The following diagram illustrates the principle.
- Standardized format: PDF/A is suitable for storing both scanned and digitally created documents.
- High compression rate: The PDF/A standard supports more modern and powerful compression processes, and thus small file sizes for color images.
- Text recognition: The created PDF/A documents can be made searchable by embedding text from an OCR engine.
- Embedded metadata: So that the document and its associated metadata form an inseparable whole, PDF/A embeds the metadata in the file itself. The metadata is stored in the Extensible Metadata Platform (XMP) format, which, like PDF/A, is defined in its own ISO standard.
- Digital signature: In order to ensure the integrity and authenticity of the created documents, a digital signature can be applied to the PDF/A document in accordance with the PAdES standard. The digital signature is a kind of electronic signature that can serve the same purpose as a handwritten signature, provided that the corresponding legal requirements (national signature laws) are met.
In principle, TIFF documents offer all these advantages, but only as proprietary extensions, since the TIFF standard itself does not offer solutions.
| |TIFF|PDF/A|
|---|---|---|
|Data consistency|Proprietary tags for metadata|+|
|Authenticity / Integrity|With detached signatures|+|
|Required storage space|Black / White: +| |
|Searchability|Proprietary tags for OCR text|+|
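As an illustration of the embedded metadata described above, the following minimal sketch builds the XMP properties that identify a file as PDF/A. It uses only the Python standard library, not the 3-Heights API; the function names `q` and `build_xmp` are hypothetical, and the surrounding `xpacket` wrapper and any index metadata are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP and by the PDF/A identification schema.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "pdfaid": "http://www.aiim.org/pdfa/ns/id/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix: str, name: str) -> str:
    """Build a namespace-qualified tag name in ElementTree's {uri}name form."""
    return "{%s}%s" % (NS[prefix], name)


def build_xmp(part: str, conformance: str) -> str:
    """Return a minimal XMP body declaring a PDF/A part and conformance level."""
    root = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(root, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    ET.SubElement(desc, q("pdfaid", "part")).text = part
    ET.SubElement(desc, q("pdfaid", "conformance")).text = conformance
    return ET.tostring(root, encoding="unicode")


# Example: metadata for a PDF/A-2b file.
xmp = build_xmp("2", "B")
print(xmp)
```

Because the metadata travels inside the file, it cannot become separated from the document the way a sidecar file or database record can.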
Usually, the individual processing stages, such as text recognition, compression, PDF/A generation and digital signature, cannot all be performed by the scanner alone, as metadata is often added retroactively by an index station. If the scanner has already signed the file, this later work stage breaks the seal of the digital signature and makes it worthless. Here, too, separate software can offer a decisive advantage.
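Why the order of the work stages matters can be illustrated with a minimal sketch. It does not use the 3-Heights API or a real PAdES signature: an HMAC over the file bytes stands in for the signature, and the key, function names and sample bytes are all hypothetical. The point is only that a change to the signed bytes makes verification fail, so signing has to be the last stage.

```python
import hashlib
import hmac

# Hypothetical key; a real PAdES signature uses a certificate and private key.
KEY = b"demo-key"


def sign(document: bytes) -> bytes:
    """Return a stand-in 'signature': an HMAC-SHA256 over the document bytes."""
    return hmac.new(KEY, document, hashlib.sha256).digest()


def verify(document: bytes, signature: bytes) -> bool:
    """Check that the document bytes still match the signature."""
    return hmac.compare_digest(sign(document), signature)


def add_metadata(document: bytes, note: bytes) -> bytes:
    """Simulate an index station changing the file by adding metadata."""
    return document + b"\n% metadata: " + note


scanned = b"%PDF-1.7 (placeholder bytes for a scanned document)"

# Wrong order: sign first, then modify. The signature no longer matches.
early_signature = sign(scanned)
indexed = add_metadata(scanned, b"sample index entry")
signed_too_early_ok = verify(indexed, early_signature)

# Right order: finish all modifications, then sign as the final stage.
late_signature = sign(indexed)
signed_last_ok = verify(indexed, late_signature)

print(signed_too_early_ok, signed_last_ok)  # prints: False True
```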
- Conversion of single page or multi-page raster images to PDF
- Set output format and conformance level (PDF, PDF/A-1, PDF/A-2 and PDF/A-3)
- OCR (optional)
- Digital PDF signature
- Parallel processing
- Set image compression individually for different classes of images
- Support for mixed raster content (MRC)
- CCITT Group3 (1D and 2D)
- CCITT Group4
- Deflate (ZIP)
- JBIG2 (lossless only)
- Embedding of XML files: If TIFF files are created from specialist applications, it is often desirable to embed XML invoice data; for example, in accordance with the ZUGFeRD standard. The possibilities of PDF/A‑3 can be used for this purpose.
- PDF/A validation: For quality assurance purposes, validation software can be used to check that the created PDF/A conforms to the ISO standard.
- Document merging: Single-page images may need to be merged into multi-page files, and documents belonging to the same business case may need to be merged into a single file or, for example, into a collection of files that corresponds to a folder. For this function, the service can read text files that control the merge.
- Stamping: A stamp or watermark can be added to the created documents. The service processes an XML file containing the stamp data.
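The option of setting image compression individually for different classes of images, listed above, can be pictured as a simple lookup. The sketch below is not the product's configuration format: the class names, the mapping and the function names are assumptions, and only compression types named in the list above are used. Palette-color images are ignored for simplicity.

```python
from enum import Enum


class ImageClass(Enum):
    BITONAL = "bitonal"      # 1 bit per pixel, typical for fax and text scans
    GRAYSCALE = "grayscale"
    COLOR = "color"


# Hypothetical per-class settings. CCITT and JBIG2 apply to bitonal images
# only; Deflate is lossless and works for any class.
COMPRESSION_BY_CLASS = {
    ImageClass.BITONAL: "JBIG2",
    ImageClass.GRAYSCALE: "Deflate",
    ImageClass.COLOR: "Deflate",
}


def classify(bits_per_sample: int, samples_per_pixel: int) -> ImageClass:
    """Classify an image by its pixel layout, as recorded in the TIFF tags
    BitsPerSample and SamplesPerPixel."""
    if samples_per_pixel >= 3:
        return ImageClass.COLOR
    if bits_per_sample == 1:
        return ImageClass.BITONAL
    return ImageClass.GRAYSCALE


def choose_compression(bits_per_sample: int, samples_per_pixel: int) -> str:
    """Look up the configured compression for the image's class."""
    return COMPRESSION_BY_CLASS[classify(bits_per_sample, samples_per_pixel)]


print(choose_compression(1, 1))  # bitonal fax page
print(choose_compression(8, 3))  # 24-bit color scan
```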
Additional functions can be integrated into the service by means of the following extensions.
Extensibility with additional functions
- Automatic classification: The automatic classification of documents based on their content, for example on suppliers, customer addresses and invoice numbers found in the scan, can speed up large-volume document processing considerably. This process makes the index stations unnecessary for a large share of the scanned documents.
- Splitting and merging of page content: The content of a page can have several logical sections that may be divided by barcodes, for example. A desirable function may be to isolate these sections and distribute them as separate pages.
- Conversion of color into grayscale: If color is not required for a specific use, converting to grayscale saves additional storage space.
- Importing other file formats: Some scanners provide PDF files that can be imported and optimized directly by the 3‑Heights™ Scan to PDF Server.
- Automatic control of work stages: Based on the content and formats, the 3‑Heights™ Scan to PDF Server can control the type and sequence of the work stages.
Areas of Use
- Paper capture: Electronic archiving of paper documents received as incoming mail within a company.
- Facsimile capture: Electronic archiving of all fax transactions between the company and its business partners.
- Archive migration: Migration of paper archives to an electronic archive with the standardized PDF/A format.
- Web/mobile capture: Use of the central service in client/server applications via a web service.
- Enterprise application integration: Use of the central service for PDF/A document creation via a programming interface (API) from specialist applications that create TIFF or JPEG files.
- ISO 19005-1 (PDF/A-1)
- ISO 19005-2 (PDF/A-2)
- ISO 32000-1 (PDF 1.7)
Distributed architecture and scalability
The 3‑Heights™ Scan to PDF Server is a scalable and freely configurable service. The service calls a separate program for each work stage, such as compression, OCR or conversion into PDF/A. Each program receives the result of the previous work stage as its input and makes its output available to the next work stage. The work stages are linked by means of an XML configuration file. This architecture allows the work stages of the service to be structured in a highly flexible way, and it can be extended almost without limit by adding further work stages. To increase the level of parallel processing, documents can be broken down into individual pages that are sent through the processing stages simultaneously and then merged back into a single document. This option can improve the use of computer resources considerably (processor cores, memory, input and output, OCR engine, etc.).
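The chaining of work stages and the page-level parallelism described above can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: pages are plain strings, and the `ocr` and `compress` functions are hypothetical stand-ins for the separate programs that the service would call according to its XML configuration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A "page" is just a string here; each stage is a hypothetical stand-in
# for a separate program such as OCR or compression.
Stage = Callable[[str], str]


def ocr(page: str) -> str:
    return page + "+ocr"


def compress(page: str) -> str:
    return page + "+compressed"


def run_stages(page: str, stages: List[Stage]) -> str:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        page = stage(page)
    return page


def process_document(pages: List[str], stages: List[Stage],
                     workers: int = 4) -> List[str]:
    """Process the pages in parallel and merge them in original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so page order is kept.
        return list(pool.map(lambda p: run_stages(p, stages), pages))


result = process_document(["page1", "page2", "page3"], [ocr, compress])
print(result)
```

Because each page passes through the full chain independently, a slow stage on one page does not block the other pages, and the ordered merge at the end restores the original document.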
- Windows Vista, 7, 8, 8.1, 10 – 32 & 64 bit
- Windows Server 2008, 2008 R2, 2012, 2012 R2, 2016 – 32 & 64 bit
- API: C, Java, .NET, COM
- Command line for batch processing