Long-Term Archiving for the Future

PDF/A, an ISO standard, is designed to ensure that documents can still be read in 10, 50 or even 100 years. The format contributes significantly towards avoiding a "Digital Dark Age" and helps to preserve today's data.

PDF/A – archiving

PDF/A will undoubtedly establish itself as the standard long-term archiving solution for electronic documents. It was published as an ISO standard on October 1, 2005 and has since set out to conquer the world. As Swiss representative on the ISO committee for PDF/A, PDF Tools is your competent point of contact for all issues dealing with PDF/A. We will be more than happy to assist you if you cannot find the answers to your specific questions on this website.

Overview

Everything you need to know about PDF/A is available here.

Introduction to PDF/A-1

Here you can find out why PDF/A was created, what exactly this standard contains, what types of sub-standards there are, how PDF/A is used and whether it solves all the problems related to long-term archiving.

Download white paper as PDF

The 10 most important things you need to know about PDF/A

These 10 points explain the most important details about the applicability, benefits and limitations of PDF/A.

Download flyer as PDF

PDF/A from digital sources

Here you can review how PDF/A files are created and converted from digital sources, taking both technical and business aspects into account.

Watch webinar (video, WMF, 50 MB)
To view the webinar, you will need Windows Media Player and GoToMeeting Codec.

PDF/A processing

Here you will have the chance to familiarize yourself with the common processes of creating, displaying and printing PDF/A documents.

PDF/A-1 vs PDF/A-2

In July 2011, the ISO released the PDF/A-2 standard. In anticipation of introducing PDF/A-2, businesses are asking us two key questions:
1. What is different from PDF/A-1 to PDF/A-2?
2. Do I have to re-convert my PDF/A-1 documents to PDF/A-2?
This article will answer the two questions and offer insight to the PDF/A-2 standard.

Download flyer as PDF

PDF/A-2 vs PDF/A-3

The ISO Committee released the third part of the standard (ISO 19005-3) in October 2012.
Here you will find answers to the following questions:
1. What is the difference between PDF/A-2 and PDF/A-3?
2. Which PDF/A format is the right one?
The article also offers insight into the PDF/A-3 standard.

Download flyer as PDF

Introduction to PDF/A

Introduction

The PDF/A standard

Use of the PDF/A standard

Summary


Introduction

Background

On September 28, 2005, the International Organization for Standardization (ISO) approved a new standard governing the archiving of electronic documents:

ISO 19005-1 - Document management - Electronic document file format for long-term preservation - Part 1: Use of PDF 1.4 (PDF/A-1).

The standard was the result of more than 36 months of collaboration among companies and organizations around the world.

In May 2002, the Association for Information and Image Management (AIIM), the National Printing Equipment Association (NPES) and the administrative body for the US Courts launched an initiative to create standardized formats for electronically archived documents. The kick-off meeting took place in October 2002. Attendees included PDF software vendors such as Adobe Systems as well as the Library of Congress, Surety Inc., Quality Associates Inc., Appligent, Merck, EMC, PDF Sages, and NARA (National Archives and Records Administration). Xerox, Honeywell, EDS, and GlaxoSmithKline, among others, joined later.

The founders of the project put together a first version and submitted their recommendation to the ISO in order to have it registered as an international standard. The ISO assigned the project to the Technical Committee TC 171 (Document Management Applications). TC 171 consists of representatives of 13 member countries (one vote each) as well as observers from another 21 countries. After numerous reviews and improvements, the standard was accepted in September 2005.

Why the PDF/A initiative?

Archiving formats vary from country to country. Traditional archiving methods (paper, microfilm, microfiche) guarantee reproducibility but no longer meet current technical requirements. Large documents cannot be quickly mailed around the globe, and it is extremely difficult to search the archived documents for specific content. Many organizations set up TIFF archives as a first step towards electronic archiving. TIFF also guarantees long-term reproducibility and is a well-established format. TIFF files can be transmitted quickly and easily in globally connected organizations; however, searching is still difficult.

PDF began to be considered at this point. A number of reasons make PDF more attractive than TIFF:

  • PDF saves structured objects (such as text, vector graphics, raster images) that support efficient searching in the entire archive. TIFF, on the other hand, is a raster format and must be processed with an OCR engine to enable a full-text search.
  • PDF files are more compact and often require only a fraction of the storage space of a corresponding TIFF file, frequently with better quality. The small file size is especially beneficial for electronic data exchange (FTP, email attachments, etc.).
  • Metadata such as title, author, date of creation and modification, content, keywords, etc. can be embedded directly in the PDF document. Documents can thus be classified automatically without any human intervention.
  • The page contents in a PDF document are usually device-independent, i.e. independent from the raster resolution, color code, etc. The pages are not rasterized until they are reproduced (rendering process). PDF documents therefore benefit from the technological progress of output equipment, such as printers and monitors, even years later.

Adobe Systems, the creator of the de facto PDF standard, has published eight new versions of its 'PDF Reference Manual' in the last thirteen years. Each new version expanded the format with numerous new features and modified some of the old features. It was therefore necessary to develop a stable, internationally accepted standard for long-term archiving, built on Adobe's proprietary PDF specifications. The outcome: PDF/A.

The PDF/A standard

Purpose of PDF/A

The ISO standard 19005 defines a file format based on PDF called PDF/A. The format offers a mechanism that represents electronic documents such that the visual appearance remains preserved for an extended period, independent of tools and systems for producing, saving and reproducing it. This standard specifies neither the methods, the intention nor the purpose of preservation. The standard is thus intended to guarantee that electronic documents can be viewed in their original appearance, even in the future. For this reason, the document may not refer, either indirectly or directly, to any external source. An example would be an external image or a font that is not embedded in the document itself.

Comparison between PDF and PDF/A

The normal PDF format does not guarantee long-term reproducibility or complete independence from the software and the output device. In order to guarantee both principles, it was necessary to both limit and expand the existing PDF specification. It was clear from the outset that PDF/A-1 had to be based on an existing version of PDF in order to achieve the acceptance of a wide audience. The ISO committee TC 171 chose the Adobe PDF Reference 1.4 as a basis for the PDF/A-1 standard.

The PDF Reference 1.4 was implemented by Adobe in their Acrobat 5 product. PDF/A-1, as a standard, must fulfill all requirements of this document, and must also respect certain technical limitations of Acrobat 5. The original PDF Reference and ISO 19005-1 together comprise the current PDF/A-1 Standard. ISO 19005-1 only identifies the differences with respect to the PDF Reference. Accordingly, PDF Reference 1.4 is the central basis on which to comprehend the PDF/A-1 standard.

Several PDF 1.4 features, such as transparency or the reproduction of audio and video, are prohibited in the PDF/A-1 standard. Certain options of PDF 1.4 are mandatory in PDF/A-1: for example, all fonts used must be embedded in the document. Essentially, the PDF/A-1 standard does nothing other than identify individual characteristics of PDF Reference 1.4 and indicate whether each is absolutely necessary, recommended, limited, or not permitted.
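The "required / prohibited" classification described above can be pictured as a small rule table. The following Python sketch is illustrative only: the feature labels and the check_features function are hypothetical, and the table contains just the examples named in the text, not the full rule set of ISO 19005-1.

```python
# Illustrative rule table. Only the examples named in the text are listed;
# the feature labels are hypothetical, not identifiers from ISO 19005-1.
PDFA1_RULES = {
    "transparency": "prohibited",
    "audio": "prohibited",
    "video": "prohibited",
    "embedded_fonts": "required",
}


def check_features(features):
    """Return a list of problems for a document's feature summary.

    `features` maps a feature label to True (present) or False (absent).
    Labels missing from the mapping are treated as absent.
    """
    problems = []
    for name, rule in PDFA1_RULES.items():
        present = features.get(name, False)
        if rule == "prohibited" and present:
            problems.append(f"{name} is not permitted in PDF/A-1")
        elif rule == "required" and not present:
            problems.append(f"{name} is mandatory in PDF/A-1")
    return problems


# A document that uses transparency and lacks embedded fonts.
for problem in check_features({"transparency": True, "embedded_fonts": False}):
    print(problem)
```

A real validator works on the PDF object structure rather than on a prepared feature summary; the sketch only shows the shape of the decision.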

The PDF/A, A-1a, A-1b, A-2 "Babylon"

The PDF/A-1 Standard is divided into two levels of conformance: PDF/A-1a and PDF/A-1b.

PDF/A-1a (Level A Conformance) defines conformance with all requirements of the PDF/A-1 standard.

The minimum requirements for conformance with PDF/A-1 are contained in PDF/A-1b (Level B Conformance). The PDF/A-1b requirements are generally sufficient for unequivocal reproduction over an extended period.

PDF/A-1a differs from PDF/A-1b mainly with respect to accessibility requirements (Section 508 of the US Rehabilitation Act).

  • PDF/A-1a guarantees that the document text is extractable and that the logical structure of the document as well as the natural reading order of the text remain intact. Text extraction is mainly of interest if documents are to be displayed on mobile devices (e.g. PDA) or made accessible as required by Section 508 of the US Rehabilitation Act. This includes the requirement that the text can be restructured to fit on a smaller screen (re-flow). This functionality is also known as tagged PDF.
  • PDF/A-1b ensures that text and other content on pages is reproduced uniformly; it does not guarantee, however, that the embedded text is comprehensible and machine-readable. The creator of a PDF/A-1b compliant file is free to embed the text in a readable form, even if the more stringent requirements pursuant to the aforementioned Section 508 are not met.

For scanned documents, conformance with PDF/A-1b is completely sufficient, even if they have been processed using OCR to enable a full text search.

In July 2011 the Technical Committee released a new part of the standard: ISO 19005-2 (PDF/A-2). Where PDF/A-1 is based on PDF version 1.4, PDF/A-2 takes advantage of features that only became available in later versions of PDF, up to and including PDF version 1.7. But most importantly, PDF/A-2 is no longer based on a particular Adobe PDF version, but instead, is now based on the ISO standard 32000-1.

The ISO Committee released the third part of the standard (ISO 19005-3) in October 2012. PDF/A-3 contains just one change, which is necessary but controversial: PDF/A-2 already enabled the embedding of PDF/A-compliant documents as attachments. PDF/A-3, however, makes it possible for the first time to embed files in any format, such as Excel, Word, HTML, CAD or XML files.
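The differences between the three parts can be summarized as a small lookup table. The Python sketch below is a hedged illustration: the PdfaPart class and the lowest_part_for helper are hypothetical names, and the entry stating that PDF/A-1 allows no embedded files comes from general knowledge of the standard rather than from the text above. It ignores every other criterion that may influence the choice of part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfaPart:
    iso_number: str
    released: int
    based_on: str
    attachments: str  # "none", "pdfa_only" or "any"


PARTS = {
    1: PdfaPart("ISO 19005-1", 2005, "Adobe PDF Reference 1.4", "none"),
    2: PdfaPart("ISO 19005-2", 2011, "ISO 32000-1", "pdfa_only"),
    3: PdfaPart("ISO 19005-3", 2012, "ISO 32000-1", "any"),
}


def lowest_part_for(attachments_are_pdfa):
    """Pick the lowest part whose attachment rule fits.

    `attachments_are_pdfa` holds one boolean per attachment:
    True if that attachment is itself a PDF/A-compliant document.
    """
    if not attachments_are_pdfa:
        return 1
    if all(attachments_are_pdfa):
        return 2
    return 3


print(lowest_part_for([]))             # no attachments at all
print(lowest_part_for([True, True]))   # only PDF/A attachments
print(lowest_part_for([True, False]))  # e.g. a spreadsheet is attached
```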


Use of the PDF/A standard

How do I get a copy?

The PDF/A standard ISO 19005 can be purchased from the ISO website. Copies can be ordered on paper or electronically in PDF format and, like all other ISO standards, are protected by copyright. It is therefore illegal to offer free copies via the internet. The standard is currently available in English only.

Who should read the standard?

The purpose of the PDF/A standard is to support and improve archiving strategies. The standard itself is quite technical in nature and can only be understood by experts with extensive knowledge of page description languages, such as PostScript and PDF. The main document itself is small; the scope of the underlying documents, however, is very large. The PDF Reference 1.4 alone contains 1,000 pages, not including the referenced documents (font and compression formats, XML specifications, ICC color profiles, digital signatures, RFCs, etc.). Furthermore, the standard alone does not guarantee long-term preservation. It is advisable to consult an expert to fully understand the PDF/A requirements, to implement a company-wide archiving policy based on it and to achieve the long-term objectives of document archiving.

What tools are available?

Tools for creating, processing and validating PDF/A documents have been available on the market since mid-2006. Adobe itself integrated corresponding features in Version 8 of Adobe Acrobat, released in Fall 2006. Microsoft also provides a separately downloadable plug-in for Office 2007 that enables the creation of PDF/A compliant files directly from Office products. Given the number of products for creating PDF/A already on the market, it has become very important to test each created PDF/A document for proper PDF/A conformance.

PDF/A requires a comprehensive solution

The PDF/A standard is merely a component of a comprehensive solution. PDF/A alone does not guarantee long-term preservation, or that the display functions as intended. Neither does PDF/A claim to be the most appropriate solution in every scenario. On the other hand, PDF/A defines the specific requirements for electronic documents so that they can be preserved over the long-term. Other aspects must be considered if a PDF/A compliant archive is to be implemented. These include, among other things, in-house company standards and processes, quality management, reliable data sources and dedicated requirements tailored to the specific application purpose. In particular, the migration of existing paper or TIFF archives to a PDF/A compliant archive is not an insignificant task, and must therefore be planned carefully.


Summary

PDF/A as a new archiving standard

PDF/A is expected to become the new standard for archiving electronic documents. PDF is omnipresent in private and public sectors worldwide and is already accepted as a format for countless purposes. The PDF/A standard will help to ensure that users will be able to safely reproduce documents even after a long period of time.

The introduction of the PDF/A standard will (as it should) probably influence the future development of PDF itself. Independent of this, Adobe will continue with improvements and the introduction of new features. Examples include 3-D models or XFA for dynamic PDF forms. This will put further pressure on the standard, because the essence of a standard - especially an archiving standard - is that it is not frequently modified.

How will the market react?

We should not expect that PDF/A products will inundate the market. It takes considerable knowledge to understand the technology behind PDF/A. Moreover, users expect higher quality from standard-compliant software. The first tools appeared on the market in mid-2006. In demand are PDF/A compliant production, PDF/A validation as well as a simple conversion of existing PDF documents into conforming PDF/A files.

The appearance of the first professional PDF/A tools has already triggered processes for the implementation of PDF/A compliant archiving systems. Too much functionality should not be expected at this point. It is likely that initially only limited PDF/A-1b will be offered, and the complete PDF/A-1a not until later. As is so often the case when introducing a new standard, many products will be released to the market that advertise PDF/A conformity yet do not actually fulfill the requirements of the standard. Expertise for evaluation and reputable providers are particularly in demand during the launch phase.

Hot air or long-term strategy?

PDF/A will not be short-lived. The need for a standardized framework for archiving with PDF has existed for several years. Moreover, PDF is already being used for this purpose in many applications, with the help of company-specific policies. The fact that Microsoft is responding to customer demand by making it possible to create PDF/A documents directly from the most recent Office suite is a clear signal. Internationally accepted, PDF/A is here to stay.

The 10 most important things you should know about PDF/A

1) PDF/A is an ISO standard

An ISO standard represents an international consensus for best practice in conforming to a specification or standard. PDF/A has been an ISO standard since October 2005. It was established to meet the specific requirements for long-term archiving. As an open standard, PDF/A is platform and supplier independent. PDF/A builds on established experience in the area of PDF (since 1993). The format is widely used and established in all industries. The PDF/A standard, along with the availability of a large number of viewers (such as Adobe Acrobat/Reader), guarantees the readability of the PDF/A format into the future. The PDF/A standard provides all necessary information to build a PDF/A-compliant viewer, even many years from now. PDF/A is continuously being enhanced: the second part was published in 2011 and the third part in 2012.

2) Standardized metadata is embedded directly into the document

All information used to describe a document (so-called “metadata”) is directly embedded in the PDF/A document in a standardized, exchangeable format (XMP, an open XML-based standard). Metadata of other formats such as PNG, PostScript or TIFF can easily be transferred into the PDF/A document. Storing such metadata separately from the document in (often proprietary) systems is no longer necessary.
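To give an impression of what such embedded metadata looks like, the following Python sketch builds a minimal XMP fragment and reads it back using only the standard library. It is a simplified illustration under stated assumptions: the function names build_xmp and read_pdfa_claim are hypothetical, the surrounding xpacket wrapper of a real XMP packet is omitted, and only the title and the PDF/A identification properties (pdfaid:part and pdfaid:conformance) are written.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP, Dublin Core and the PDF/A identification schema.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdfaid": "http://www.aiim.org/pdfa/ns/id/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return f"{{{NS[prefix]}}}{tag}"


def build_xmp(title, part, conformance):
    """Return a minimal XMP fragment as a string (no xpacket wrapper)."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    ET.SubElement(desc, q("pdfaid", "part")).text = str(part)
    ET.SubElement(desc, q("pdfaid", "conformance")).text = conformance
    # dc:title is stored as a language alternative with an x-default entry.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title
    return ET.tostring(xmpmeta, encoding="unicode")


def read_pdfa_claim(xmp):
    """Return the (part, conformance) pair stored in an XMP fragment."""
    root = ET.fromstring(xmp)
    part = root.find(".//pdfaid:part", NS)
    conformance = root.find(".//pdfaid:conformance", NS)
    return part.text, conformance.text


xmp = build_xmp("Invoice 2012-001", 1, "B")
print(read_pdfa_claim(xmp))
```

Because the metadata is plain XML inside the file, any archive system can read it without a proprietary database.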

3) PDF/A documents are fully text searchable

The ability to perform full text searches on PDF/A documents is also part of the standard. The text of digitally generated content is preserved in the document. This is also true for scanned documents that have undergone optical character recognition (OCR). The PDF/A file saves both the recognized text (as Unicode) as well as the originally scanned image, which retains the visual appearance yet allows the search function.

4) Everything is included for authentic reproduction

PDF/A files are self-contained: all elements (fonts, color profiles etc.) necessary for a flawless, authentic reproduction are included in the PDF/A file. A PDF/A document may not contain any references to external content or resources. Simple informational references such as links to web pages are allowed, however.

5) PDF/A is space saving

Although PDF/A documents contain more information than images (such as TIFF), PDF/A files are usually smaller due to the use of efficient compression algorithms.

6) Optimal security with digital signatures

The combination of PDF/A and digital signatures is the perfect way to verify that the PDF document has not been tampered with and that it is also authentic. This provides optimal legal security for long-term archiving.

7) PDF/A can meet accessibility requirements (section 508)

The PDF/A standard contains sub-standards. The ‘Level A’ sub-standards (PDF/A-1a, PDF/A-2a and PDF/A-3a) are the most demanding variants; they require specific information to better assist persons with certain disabilities: for example, text must be stored as Unicode so that text-to-speech engines can be supported, and the document structure, the reading order and descriptions of pictures must be included. The other sub-standards are less demanding variants, which are sufficient in most cases. Typical application areas are scanned documents and conversions of born-digital documents into PDF/A.

8) PDF/A documents remain valid without time limits

The standards committee extends the existing standard every two to four years with meaningful amendments. That does not mean that existing PDF/A documents must be migrated to such new amendments. Existing PDF/A documents remain conforming indefinitely. Contrary to other standards, the ISO cannot withdraw the PDF/A standard.

9) PDF/A is widely accepted

In Europe and Asia, PDF/A is already recommended, required or mandated by law for long-term archiving in several governments, organizations and corporations. In North America, such recommendations also exist in the judiciary and in libraries, and the demand for this standard is growing continuously. The PDF Association is instrumental in supporting the PDF/A standard.

10) Reliable tools are available today

Thanks to the long experience with PDF and the fast response of the major PDF technology suppliers, there are already many PDF/A compliant software tools on the market. The PDF Association has developed a test suite that allows PDF/A validation software products to be validated reliably. A validator helps to check whether a document conforms to the PDF/A standard or not.

PDF/A for digitally-born documents – archiving MS-Office documents, emails and websites

1) Introduction

2) Development of digital documents as archive materials

3) Attributes of analog and digital sources

4) Converting digital sources to PDF/A

5) Summary

1) Introduction

When compared with the preservation of data in its original format, there are many advantages to archiving documents and data from digital sources as PDF/A. The source applications are rapidly being developed further. As a result, after only a few years, the readability and the authentic display of data can no longer be guaranteed. Furthermore, a company must maintain all of the applications that are used and all of the platforms on which they operate. This incurs considerable costs. Even for documents and files that are created digitally, PDF/A is an excellent choice for long-term archiving and comes with great advantages with regard to uniformity, searchability and cost-effectiveness.
 

2) Development of digital documents as archive materials

The ECM model from AIIM distinguishes between five major processes in the management of business information: Capture, manage, deliver, preserve and store the documents. These processes can be easily assigned to the following PDF/A functions:

 
The ECM model from AIIM and the associated PDF/A functions


Digital documents are created in all of the mentioned processes and PDF/A is also important in all of these processes, although in different ways, as explained in the following.

What are the typical sources of digital documents that are later archived, and in which processes do these emerge?

  • Inbox
    • Scans with or without OCR (optical character recognition)
    • Emails with or without attachments
  • Office, graphics and construction
    • MS Word, Excel, PowerPoint, Visio, etc.
    • Illustrator, InDesign, Photoshop, etc.
    • CAD: AutoCAD, 3D Studio Max, etc.
  • Electronic data interchange
    • SWIFT, EDIFACT, etc.
  • Outbox
    • Print data streams: PostScript, PCL, AFP, etc.
  • Archive migrations
    • Masses of TIFF and other files, including source data (metadata, object relationships, etc.)


3) Attributes of analog and digital sources

Digital documents can emerge from analog and digital sources. Some parameters are relevant for their subsequent long-term archiving:

Attribute | Analog | Digital
Sources | Scanner, raster images | Standard and proprietary formats from applications and data streams, in file storage, mailboxes and attachments
Quality of the source | Good | Large differences
Complexity of the source | Low | Can be very high
Product differentiation | Compression rate, performance | Quality
Biggest challenge | OCR recognition rate | Loss of information during the conversion


From these differences, it is clear that different sources require different strategies, both in general outline and in detail. These strategies matter to IT staff and records managers as well as to manufacturers of conversion products. The challenge lies not only in creating a document that conforms to the PDF/A standard, but in interpreting the source in such a way that the visual appearance corresponds to the original document. The following diagram shows the results of conversions to PDF/A whose form conforms to the standard, but whose visual appearance does not sufficiently correspond to that of the source:

Correct and incorrect conversions: In both cases, the result was a document that conforms to PDF/A, but, in the case of an incorrect conversion, does not correspond to the original document in any way.

4) Converting digital sources to PDF/A

4.1) Why convert?

Long-term archiving of digital data to PDF/A offers great advantages:

  • The user does not have to maintain the original “native” applications and the platforms on which the applications operate.
  • Users depend less on software manufacturers because all of the relevant information is saved in one ISO-standardized format, and this format is manufacturer-independent.
  • Simplified processing due to the fact that the archived data is standardized into one format.
  • Option to perform a full-text search in all of the stored data.

These advantages also involve an economic benefit that must not be underestimated.

Of course, when compared to the native formats, archiving in PDF/A also has a few disadvantages, for example the loss of interactivity or the built-in “functionality” of the native format. MS Excel can be used as an example here. MS Excel offers calculation formulas for content, which are lost during the conversion. Therefore, for these formats, it always makes sense to also archive the original document and to use the archiving in PDF/A as a fallback variant. With “interactive” files, the time for archiving can be chosen so that there is hardly any need for further changes (Document Lifecycle Management). In certain formats, for example emails, the original document may have to be saved due to compliance reasons.

4.2) Overview of creation and conversion processes

The easiest way to create PDF/A from proprietary formats such as Office documents, CAD drawings, etc. is to use an effective printer driver, also known as PDF Producer, PDF Creator or PDF Converter (for example, Adobe Distiller etc.). This “detour” via a printer driver is required because, so far, most native applications do not have a “Save to PDF” function. This function is now available for MS Office 2007 but it must be downloaded as a separate add-in.

The process of archiving emails, including attachments, to PDF/A (for example, from MS Outlook) is more complex. There are currently only a few providers with this type of functionality. PDF Tools AG has developed the 3-Heights™ Document Converter, which converts an email and its attachments into a single PDF/A document.

From databases, ERP systems, etc., PDF/A is usually created using an export function (“Save to PDF”). Often, these files must be post-processed because they do not completely conform to the standard. Another option here is the direct, programmatic creation of PDF and PDF/A files. In this process, the contents from any source can be merged, for example, for processing personalized printed materials. PDFlib GmbH is one of the leading providers of these tools.

Specific tools are usually used to convert images and, in this process, an OCR function is important for the creation of metadata and for the searchability of the texts. In spite of this, even in scanned documents, we cannot underestimate the complexity of such applications, particularly in the areas of multiple formats (for example, dozens of variants of TIFF), colors, fonts and compression and segmenting procedures, such as Mixed Raster Content (MRC).

 

Converting digital sources to PDF/A using various conversion procedures


All conversion software in all of the areas must take into account the specific obligations and prohibitions from PDF/A, for example, the embedding of fonts, color profiles and metadata (as XMP).

4.3) General challenges

From a general perspective, when creating PDF/A from digital sources, we are confronted with the following challenges:

  • Colors:
    • If the color profiles from the sources are missing, assumptions have to be made about the color space
  • Fonts:
    • If fonts (or glyphs) are missing, replacement fonts must be selected. To do this, the text must be a Unicode text
  • Transparency:
    • The flattening of transparency is complex and may lead to the loss of information (fonts, vectors, etc.)
  • Layers, interactive and multimedia elements:
    • Only the “Print Preview” is retained
  • Actions:
    • Functionality (JavaScripts etc.) is lost
  • Digital Signatures:
    • Must be checked, documented and signed again

4.4) Converting emails

An email can contain all types of documents, nested archives and much more (executable files etc.). In addition, the email can contain internal or external references (e.g., HTML mails), and different systems, interfaces, file systems and data streams are involved. The process of archiving emails, including attachments, is therefore effectively the “supreme discipline” of archiving in PDF/A, since all of the challenges in connection with converting sources that were originally analog or digital must be solved using one single product.

To solve this, a different conversion strategy must be selected for each individual element of an email: The email body and attachments are converted individually and only then merged into a single document. In this PDF/A document, each attachment can then be identified using a so-called bookmark entry. By doing this, the structure of the email can still be traced at a later point. In addition, information such as tables of contents from Word documents is not lost, because these are mapped as a second level of hierarchy in the bookmarks and are linked accordingly in the PDF/A.

The handling of digital signatures also poses a challenge when archiving emails.
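The bookmark structure described above can be sketched as a simple tree. The following Python example uses the standard library's email package to enumerate the body and attachments of a message and build a two-level outline. It is a sketch only: build_outline and the attachment_headings parameter are hypothetical names, the headings of each attachment are supplied by hand rather than extracted from the file, and no actual conversion to PDF/A takes place.

```python
from email.message import EmailMessage


def build_outline(msg, attachment_headings=None):
    """Return a bookmark outline as a list of (title, children) tuples.

    First level: the email body and one entry per attachment.
    Second level: headings of an attachment (for example the table of
    contents of a Word document), passed in via `attachment_headings`
    as a mapping from file name to a list of heading strings.
    """
    attachment_headings = attachment_headings or {}
    subject = msg["Subject"] or "(no subject)"
    outline = [(f"Email body: {subject}", [])]
    for part in msg.iter_attachments():
        name = part.get_filename() or "(unnamed attachment)"
        children = [(heading, []) for heading in attachment_headings.get(name, [])]
        outline.append((name, children))
    return outline


# Build a small test message with one attachment.
msg = EmailMessage()
msg["Subject"] = "Quarterly report"
msg.set_content("Please find the report attached.")
msg.add_attachment(
    b"placeholder bytes",
    maintype="application",
    subtype="octet-stream",
    filename="report.docx",
)

outline = build_outline(msg, {"report.docx": ["1 Summary", "2 Figures"]})
for title, children in outline:
    print(title)
    for child_title, _ in children:
        print("   ", child_title)
```

In a real converter each outline entry would point to the first page of the converted body or attachment inside the merged PDF/A document.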

4.5) Converting websites

The topic of archiving websites is relatively new. This basically involves retaining the contents and state of one’s own website in a way that is legally trustworthy so that the required evidence can be provided in legal or other procedures.

The difficulty when archiving websites is that the output using a print driver does not normally represent the authentic appearance of the website, because websites are usually specially prepared for printing. To be able to bring forward trustworthy evidence, this fidelity to the original is crucially important.

Therefore, from the website, a “Capture” function is used to create an image that is merged with the relevant text and other information (fonts, color spaces, etc.) to effectively produce a “vectorized, searchable screenshot”. Another complex issue is the handling of external links and the internal link structure of a website. In addition, it is necessary to decide on one browser and one browser version because different browsers and browser versions display websites differently.

4.6) Converting on the client or on the server

We must consider the following aspects with regard to the question of whether conversion software should be installed on individual clients or on a central server:

Attribute | Client | Server
Scaling workstations | Small amount | Large amount
Distribution | Complex | Simple
Robustness for the users | Depends on the creator applications | Independent
Performance for the users | Restricted by the client | Scalable
Supported source formats | Restricted by the installation | Scalable
Application support | Local | Central
 

4.7) Font handling in mass archiving

Single, individual PDF/A documents can be directly archived. When archiving large quantities of similar PDF/A documents (for example, telecom invoices etc.), the situation often arises in which the documents contain the same fonts, logos or other corporate identity elements that must also be archived for each individual document. The repeated saving of collective resources (fonts, images) is undesirable and reduces the acceptance of PDF/A. 

To solve this, the archive system can be upgraded using an add-in that separates the shared resources and saves them in only one instance for all documents when performing mass archiving of PDF/A documents. When a document is accessed, the shared resources are again merged with the document to produce a complete PDF/A document. This procedure can also be used for digitally signed documents, but, during the signing process, the document must already be prepared for the separation of resources.
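The separation and re-merging of shared resources can be illustrated with a small content-addressed store. This is only a minimal sketch of the idea, not the add-in described above: the class `SharedResourceStore`, the functions `archive` and `retrieve`, and the record layout are hypothetical names chosen for illustration, and real PDF/A files would first have to be parsed to extract their fonts and images.

```python
import hashlib


class SharedResourceStore:
    """Toy content-addressed store: each distinct resource is kept only once."""

    def __init__(self):
        self._blobs = {}  # SHA-256 hex digest -> resource bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # setdefault keeps the first copy and ignores identical later copies
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


def archive(store: SharedResourceStore, body: bytes, resources: dict) -> dict:
    """Store the document body with references to its shared resources."""
    refs = {name: store.put(data) for name, data in resources.items()}
    return {"body": body, "refs": refs}


def retrieve(store: SharedResourceStore, record: dict):
    """Merge the shared resources back in when the document is accessed."""
    resources = {name: store.get(d) for name, d in record["refs"].items()}
    return record["body"], resources
```

Archiving two invoices that use the same font and logo leaves only two entries in the store, while each retrieved invoice again carries its complete set of resources.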
 

 

Figure: Concept for preventing superfluous saving of resources (e.g., fonts) in mass archiving.

4.8) Legal security with digital signature

The process of digitally signing PDF/A files derived from digitally created documents brings greater legal security. Depending on the application, the user must be clear about what the signature really provides. In any case, with a qualified electronic signature, it is absolutely clear at what time the conversion and application of the digital signature occurred and whether the document has been changed since the conversion. It is also clear who performed the conversion process in a company.

However, the uncertainty that arises from the “dynamic” source (e.g., a database) of such a PDF/A document cannot be dispelled. Nor is it possible to verify whether the created PDF/A document actually corresponds to the appearance of the original document (e.g., a Word document) or whether all of the information that is contained in the document (e.g., contents and email attachments) actually exists in the PDF/A file. To increase the credibility of such documents, the entire process must be certified. This is therefore a topic that transcends the simple use of digital signatures. However, such certifications require a certain volume of data so that this is worthwhile for service providers, manufacturers of software and systems and large companies.

4.9) Quality assurance by validators

“Trust is good, control is better”: this also applies to PDF/A documents and to products that create PDF/A, or that claim to. Not all products labeled as PDF/A products actually produce valid PDF/A. In extreme cases, the archiving of company data can be crucial for the existence of a company.

This can occur in a lawsuit, for example, if exonerating records have not been prepared or have not been prepared correctly. It is therefore important to use tools that meet the highest quality standards. Validators exist to determine whether a tool fulfils this prerequisite. These validators also need to be checked. For this task, the PDF Association created a freely available test suite of files that systematically breach the standard, which makes it possible to check whether a validator identifies all of the breaches.

The use of a validator is not only important when evaluating a tool, but it is also important in the operational processes. A validator should therefore be used regularly to check the conformity of the created PDF/A documents - as a permanent quality check. This is because different sources, application versions, etc. may lead to different conversion results.
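A permanent quality check of this kind can be pictured as a gate that every created document has to pass. The following is a minimal sketch under stated assumptions: `quality_gate` and `placeholder_validate` are hypothetical names, and the placeholder only looks at the file header, which is in no way a PDF/A validation. In practice the `validate` argument would wrap a real, tested validator.

```python
from typing import Callable, Dict, List, Tuple


def quality_gate(
    documents: Dict[str, bytes],
    validate: Callable[[bytes], List[str]],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Run every document through the validator and split accept / reject."""
    accepted: List[str] = []
    rejected: Dict[str, List[str]] = {}
    for name, data in documents.items():
        problems = validate(data)  # an empty list means "no findings"
        if problems:
            rejected[name] = problems
        else:
            accepted.append(name)
    return accepted, rejected


def placeholder_validate(data: bytes) -> List[str]:
    """Stand-in only: checks the header, which is NOT PDF/A validation."""
    return [] if data.startswith(b"%PDF-") else ["missing %PDF- header"]
```

Keeping the validator as a parameter means the same gate can be reused at every checkpoint, and the validator itself can be replaced or re-tested without touching the surrounding process.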

5) Summary

PDF/A is beneficial as a format for archiving digital documents and can lead to considerable cost savings in comparison to archiving in the native format. However, the devil is in the details, and the complexity that arises depending on the source of the digital documents must not be underestimated. It is therefore essential to collaborate with specialists in this area. This collaboration can protect users from unnecessary costs caused by incorrect processes. For both day-to-day business and from a strategic point of view (e.g., in legal cases), it is very important that information can be accessed quickly and securely. Shortcomings in this area can damage a company's image or have substantial financial consequences. Processes for archiving directly from digital data should therefore be given top priority.

PDF/A processing

Detailed knowledge of PDF/A standards is necessary in order to create and accurately display PDF/A documents. Nevertheless, this knowledge alone is not sufficient in the attempt to optimally configure PDF/A-related processes.

In this article we will point out how to optimally configure some of the more typical processing steps. Additionally, we will point out which of our products can be used for this purpose.

Topics overview:

Creating PDF/A documents

Processing and converting PDF documents

Signing PDF/A documents

Validating PDF/A documents

Displaying PDF/A documents

Mass-archiving PDF/A documents

Outline: Overview of PDF/A processes


Creating PDF/A documents


 
Figure: PDF/A creation on a client

Figure: PDF/A creation on a server

PDF/A documents can stem from various sources. The software that creates them must follow the do's and don’ts of PDF/A (including the corrigendum):

  • The fonts used for texts must be embedded.
  • Color profiles must be defined for input images (scanned, converted).
  • Meaningful metadata must be available and embedded as XMP.

Creating PDF/A from a Windows application

PDF documents can be created with the help of a PDF Producer (other names: PDF Creator, PDF Converter, etc.) from any Windows application using the print function. MS Office documents are usually converted this way. The conversion of emails with attachments is, on the other hand, more complex. In this case our 3-Heights™ Document Converter is more suitable. Alternatively, PDF documents can be created directly using a “Save to PDF” function such as the one in Microsoft Office 2007 (add-in must be downloaded) or Microsoft Office 2010.

Our 3-Heights™ PDF Desktop Producer and our 3-Heights™ Document Converter are both suitable for creating PDF/A.

Dynamic creation and customization of PDF/A

Here, the PDF documents are created programmatically, directly from an application (e.g. a web server). This way, in addition to the static content, dynamic content from a database can also be integrated. In the very near future we will be able to offer a PDF Creator that can create PDF/A from such sources.

Image to PDF/A Converter

Converting image files to PDF/A files is in most cases a simple file conversion operation. For advanced tasks (such as color management) the conversion can be configured in a more complex way.

Scanning / OCR

Scanning, character recognition (OCR) and conversion to PDF/A-compliant documents is a special area that requires a higher level of expertise (mixed raster content, compression processes).

To convert images or scanned documents to PDF/A you can make use of our 3-Heights™ Image to PDF Converter.


Processing and converting PDF documents

The conversion to PDF/A takes place after all other processing steps are concluded, but before applying the signature.

Possible PDF processing functions:

  • Split: Splitting a document into separate pages.
  • Merge: Merging individual pages to form a document.
  • Stamp: Application of watermarks, stamps, headers and footers, page numbers. (PDF Batch Stamp Tool)
  • Fill Forms: Program-based filling out of form fields with dynamic content. (PDF Form Filling & Flattening Tool)
  • Annotate: Adding interactive elements such as comments, bookmarks or text annotations (coming soon in the 3-Heights™ PDF Viewer).

Most PDF processing functions will not guarantee PDF/A conformity of the target document even if all source documents are PDF/A-compliant.
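The ordering rule (processing first, then conversion to PDF/A, then signing) can be written down as a simple check. This is a minimal sketch: the step names and stage numbers are assumptions made for illustration, and the check deliberately ignores incremental changes made after signing.

```python
# Stage numbers are illustrative: 0 = processing, 1 = conversion, 2 = signing.
STAGES = {
    "split": 0,
    "merge": 0,
    "stamp": 0,
    "fill_forms": 0,
    "annotate": 0,
    "convert_to_pdfa": 1,
    "sign": 2,
}


def is_valid_order(steps):
    """True if no step runs before a step that belongs to an earlier stage."""
    stages = [STAGES[step] for step in steps]
    return stages == sorted(stages)
```

A pipeline such as merge, stamp, convert_to_pdfa, sign passes this check, whereas stamping after the conversion does not, because the stamp could break the conformity that the conversion had just established.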

PDF to PDF/A Converter

The main purposes for converting to PDF/A are:

  • Preparation for archiving
  • Preparation for applying a digital signature
  • Preparation for document exchange (internal / external)

Converting PDF to PDF/A is no trivial matter. The following tasks are involved:

  • Eliminating the dependency on the source medium
  • Embedding fonts
  • Creating the static appearance of interactive content
  • Eliminating transparency (transparency flattening)
  • Removing prohibited content such as JavaScript

The 3-Heights™ PDF to PDF/A Converter is ideal for converting PDF to PDF/A.


Signing PDF/A documents

Signing takes place after conversion to PDF/A.

Adding a digital signature to a PDF/A document equates to incrementally changing the document. However, the document must be PDF/A-compliant before it can be signed. The original content of the document remains unchanged and the data structure of the digital signature is added at the end of the file. The digital signature itself must also be PDF/A-compliant. It is also possible to add several digital signatures (e.g. author’s signature, tester’s signature, releaser’s signature).
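Because an incremental change only appends data, each saved revision of a PDF file normally ends with its own `%%EOF` marker, and the bytes of the earlier revision stay untouched in front of it. The following minimal sketch locates those markers in raw bytes; `revision_ends` is a hypothetical helper name. It is a rough heuristic only: linearized files and stream data that happens to contain the marker can produce additional hits, so a real tool would parse the cross-reference sections instead.

```python
def revision_ends(data: bytes):
    """Return the offset just past each %%EOF marker found in the file."""
    marker = b"%%EOF"
    ends = []
    pos = 0
    while True:
        index = data.find(marker, pos)
        if index < 0:
            break
        pos = index + len(marker)
        ends.append(pos)
    return ends
```

With two markers, the slice `data[:ends[0]]` is the original revision and everything after it is the appended part, which is where the data structure of a signature added later would be found.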

Modifying the document after addition of the digital signature

All modifications made after the document has been digitally signed must also be incremental and PDF/A-compliant. Typical modifications include editing (deletion, amendment and addition of text, annotations, etc.) as well as updating of content. There are currently no PDF/A-compliant processing tools that can work with already digitally signed documents.

Our following products can apply PDF/A-compliant digital signatures:

  • 3-Heights™ PDF to PDF/A Converter
  • 3-Heights™ PDF Security Tool

Validating PDF/A documents

PDF/A compliance must sometimes be verified in several places via validation.

Objectives of PDF/A Validation

The task of validation is to determine whether or not a PDF/A document complies with the ISO standard.

Areas of use

  • Incoming and outgoing inspection
  • Checks before and after certain processing steps
  • Control of processing (accept / reject)
  • Creation of a “Compliance Report”

Challenge

Validators must be tested and certified by an independent organization based on a generally accepted test suite.

For validation purposes, PDF Tools AG offers you the 3-Heights™ PDF Validator.


Displaying PDF/A documents

Displaying PDF/A-compliant documents is not the same as displaying documents in a manner that complies with PDF/A.

Conventional PDF viewers

Most display programs are not PDF/A compatible, which means they fail to take the standard's display requirements into consideration. A PDF/A display component should offer the following functions:

  • Warning in case a file contains elements that are not PDF/A-compliant.
  • Use of embedded fonts instead of the preinstalled fonts with the same name.
  • Use of embedded color profiles instead of the alternative color spaces.
  • Consistent display of the images of interactive elements contained in the file instead of recreating them dynamically.
  • Hyperlink shutoff option.

Display-oriented products

  • Adobe Acrobat 9 in PDF/A mode
  • PDF Tools AG: 3-Heights™ PDF Viewer and 3-Heights™ Java Document Viewer

Mass-archiving PDF/A documents

PDF/A archiving with an advanced archiving system that allows merging of shared resources.

Separate, stand-alone PDF/A documents can be archived directly. When archiving a larger number of similar PDF/A documents (such as utility bills), it is often the case that the same fonts, logos or other corporate identity elements need to be archived again for each individual document. The repeated saving of shared resources (fonts, images) is undesirable and reduces the acceptance of PDF/A.

The solution in this case is an advanced archiving system that separates the shared resources and saves them only once for all documents. When one of the documents is retrieved the shared resources and the document itself are merged again in a complete PDF/A document.

This process can also be applied to digitally signed documents. In this case, however, the document must be configured to accept the separation of shared resources when the initial signature is added.

The PDF Prep Tool Suite from PDF Tools AG was designed for this purpose.


Outline: Overview of PDF/A processes

The steps outlined above fit into one overall process:

Figure: Overview of PDF/A processes

We will be pleased to help you find the best solution for your specific application. We look forward to hearing from you.

PDF/A-1 vs PDF/A-2

Introduction

PDF/A was published by the ISO in 2005 to support long-term archiving of PDF documents. The first release, PDF/A-1, was based on PDF version 1.4 and introduced a set of criteria such as ensuring the visual reproducibility of PDF documents regardless of future changes to viewer and printing technologies, and making PDF documents accessible to people with visual impairments. A detailed description of the PDF/A-1 standard and its benefits can be found on our White Paper page.
In July 2011, the ISO released the PDF/A-2 standard. In anticipation of adopting PDF/A-2, businesses are asking us two key questions:

1. What is different from PDF/A-1 to PDF/A-2?

2. Do I have to re-convert my PDF/A-1 documents to PDF/A-2?


This article will answer the two questions and offer insight to the PDF/A-2 standard.

What are the drivers behind the PDF/A-2 standard?

Whereas PDF/A-1 is based on PDF version 1.4, PDF/A-2 takes advantage of features that only became available in later versions of PDF, up to and including PDF version 1.7. Most importantly, PDF/A-2 is no longer based on a particular Adobe PDF version but on the ISO standard 32000-1.

What is different from PDF/A-1 to PDF/A-2?

PDF/A-2 introduces a number of features:

  • JPEG2000 Compression: JPEG2000 compression was introduced with the PDF 1.5 specification and is therefore not available in PDF/A-1, which is based on PDF 1.4. JPEG2000 compression particularly benefits scanned documents such as maps and books, and documents with color content such as checks or passports.
     
  • Embedded PDF/A Files via Collections: Acrobat allows users to create collections (sometimes also referred to as “portfolios”) where multiple PDF/A documents are combined into one “container PDF” document. One possible use of a PDF/A collection is the archival of emails, where email attachments can be converted to PDF/A and stored as a collection inside the PDF/A conversion of the email body. PDF/A collections can also benefit security applications where a signature is applied to individual single pages. The PDF/A collection then combines the signed single pages. Individual pages can subsequently be removed without affecting the validity of the signatures of the remaining pages.
     
  • Transparency: Although transparency is part of PDF 1.4, at the time of the PDF/A-1 standard release it was not defined well enough to be included in the PDF/A-1 standard. The specification has substantially matured since then, and transparency has become a common characteristic of PDF documents. Transparency is often found in the form of drop shadows, cross fades and highlight mark-ups for example.
     
  • Optional Content (Layers): Optional content – sometimes also referred to as layers – is useful for mapping applications or engineering drawings where individual layers can be shown or hidden according to the information requirements of the viewing person. Another area of use is in user manuals of products that are sold internationally – where different languages can be implemented on different layers.
     
  • New Conformance Level PDF/A-2u (“u” for Unicode): PDF/A-1b and PDF/A-2b concentrate on visual integrity, where “b” stands for “basic”. PDF/A-1a and PDF/A-2a concentrate on accessibility, hence the “a” notation. New to PDF/A-2 is the conformance level PDF/A-2u (“u” for “Unicode”). It simplifies the text searching and copying of Unicode text for digitally created PDF documents and PDF documents that were scanned with subsequent optical character recognition (OCR).
     
  • Object Level XMP Metadata: PDF/A-2 specifies the requirements for custom XMP metadata.
     
  • Comment Types and Annotations: Some of the newer comment types were added to the list of prohibited annotation types, and at the same time some of the newer comment types such as text editing comments are now acceptable to the PDF/A-2 standard.
     
  • Digital Signatures: While PDF/A-1 already allows for digital signatures, PDF/A-2 defines the rules that need to be applied to guarantee interoperability.
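A file states which part and conformance level it claims in its XMP metadata, using the `pdfaid:part` and `pdfaid:conformance` properties of the PDF/A identification schema. The following minimal sketch reads the two values from an XMP packet that has already been extracted as text; `pdfa_flavour` is a hypothetical helper, and the regular expression is a simplification that covers the element and attribute spellings but is no substitute for a real XML parser. The result is only the claim made by the file, which still has to be confirmed by a validator.

```python
import re


def _pdfaid_value(xmp: str, name: str):
    """Find pdfaid:<name> written as an attribute or as an element."""
    pattern = r"pdfaid:" + name + r"""\s*(?:=\s*["']([^"']*)["']|>\s*([^<\s]*)\s*<)"""
    match = re.search(pattern, xmp)
    if match is None:
        return None
    return match.group(1) if match.group(1) is not None else match.group(2)


def pdfa_flavour(xmp: str):
    """Return e.g. 'PDF/A-2u', or None if no PDF/A part is declared."""
    part = _pdfaid_value(xmp, "part")
    if part is None:
        return None
    conformance = _pdfaid_value(xmp, "conformance") or ""
    return "PDF/A-" + part + conformance.lower()
```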

Do I have to re-convert my PDF/A-1 documents to PDF/A-2?

PDF/A-2 does not replace or supersede PDF/A-1 in any way. PDF/A-1 compliant documents that were already created will remain valid PDF/A files for long-term archiving. Archived PDF/A-1 documents can remain unchanged in the storage archives, so an “upgrade” to PDF/A-2 is not necessary.

For organizations that find the features introduced with PDF/A-2 useful, converting the original source documents to PDF/A-2 will have an advantage. Likewise, for organizations that do not see a benefit in the features introduced with PDF/A-2, converting source documents to PDF/A-1 will continue to work well. Both PDF/A-1 and PDF/A-2 fully support the long-term archiving of PDF documents.

PDF/A-2 vs PDF/A-3

Introduction

The first part of the standard, PDF/A-1 (ISO 19005-1), was based on PDF Version 1.4 and contained a range of rules such as safeguarding the unambiguous visual reproduction of PDF documents, independence from display and printing technology, and the availability of PDF documents to people with limited visual acuity.

It was followed in July 2011 by the ISO standard PDF/A-2 (ISO 19005-2), which was extended by various functions that were made possible by later PDF versions (up to and including PDF Version 1.7). This extended PDF standard is no longer based on a PDF version by Adobe, but on the ISO standard 32000-1 (PDF 1.7).

The ISO Committee released the third part of the standard (ISO 19005-3) in October 2012; its official title is: Document management – Electronic document file format for long-term preservation – Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3).

It is based on ISO 32000-1 (PDF 1.7), as was PDF/A-2. Like PDF/A-2, the PDF/A-3 standard defines three conformance levels:

  • PDF/A-3a is concerned with accessibility.
  • PDF/A-3b is concerned with visual integrity, i.e. the consistently identical display of the document.
  • PDF/A-3u facilitates the searchability of texts and the copying of Unicode text for digitally created PDF documents and those scanned using optical character recognition (OCR).

What is the difference between PDF/A-2 and PDF/A-3?

PDF/A-3 contains just one change that is necessary but controversial: PDF/A-2 already enabled the embedding of PDF/A-compliant documents as attachments. PDF/A-3, however, makes it possible to embed any document format such as Excel, Word, HTML, CAD or XML files for the first time ever.

The purists among the experts opine that this amendment is contradictory to the original idea behind the PDF/A standard. However, pragmatists in companies from various segments, such as the pharmaceutical industry or the banking and financial sector, have a concrete need to keep the original file format alongside the converted PDF/A file. Files that belong together are compiled to form a “collection”. This construct has been known since the days of PDF/A-2. Typical applications include archiving emails and their attachments that can consist of many different file formats.

The standard only assures the representation of PDF/A documents viewed via a compliant viewer. The presentation of non-compliant embedded documents is implemented via a separate action using the tools that support the document formats in question.

Which PDF/A format is the right one?

PDF/A-3 should only be used if you plan to embed documents that do not comply with the PDF/A standard. PDF/A-2 is the right choice in all other cases as it makes it quite clear that no other formats are embedded. PDF/A-1 is still good enough for anyone who does not need all the functionality offered by PDF/A-2. There is no need to migrate existing archives as a PDF/A-3 compliant viewer can display all PDF/A compliant files.
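The selection rule above can be condensed into a few lines. This is only a sketch of the reasoning in this section; the function name and its two parameters are hypothetical. For the last case the text treats PDF/A-1 as still good enough, and PDF/A-2 would remain an equally valid choice.

```python
def choose_pdfa_part(embeds_non_pdfa_files: bool, needs_pdfa2_features: bool) -> str:
    """Pick a PDF/A part following the rule of thumb described in the text."""
    if embeds_non_pdfa_files:
        # Only PDF/A-3 permits embedded files in arbitrary formats.
        return "PDF/A-3"
    if needs_pdfa2_features:
        # e.g. JPEG2000, transparency, layers, embedded PDF/A files
        return "PDF/A-2"
    # PDF/A-1 is still sufficient; PDF/A-2 would also be acceptable here.
    return "PDF/A-1"
```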

Conclusion

PDF/A-3 meets an important user requirement, namely an option to embed file formats that do not comply with the PDF/A standard. Because this amendment is desirable but controversial, it remains the only change to the PDF/A-2 standard. The user can therefore choose between a pure PDF/A collection and a mix of the various standards that is easily differentiated thanks to the “PDF/A-3” label.

 

PDF/A products by PDF Tools AG

  • Overview
  • Creating & converting PDF/A
  • Displaying & printing PDF/A
  • Signing & validating PDF/A
  • OCR for PDF/A

Overview

PDF Tools AG has the most comprehensive range of PDF/A products worldwide. The overview diagram shows all relevant processes related to PDF/A and the products of PDF Tools AG for...

  • Creating and Converting
  • Displaying and Printing
  • Signing and Validating
  • OCR

Creating and converting PDF/A

3-Heights™ Document Converter

The 3-Heights™ Document Converter is a company-wide solution for converting all popular file formats to PDF/A. Besides applying PDF/A compliant digital signatures, it provides support for embedding PDF/A compliant OCR information and metadata. It is especially suitable for processing incoming mail and for converting emails with attachments and Office documents to PDF/A for long-term archiving. More information about the Document Converter

 

3-Heights™ PDF to PDF/A Converter

The 3-Heights™ PDF to PDF/A Converter is a component for converting PDF documents into the PDF/A format for long-term archiving. The tool analyzes and converts the input document, applying a digital signature where required. This component is both robust and scalable, making it suitable for integration in various processes such as standardization, quality assurance and archive migration. More information about the PDF to PDF/A Converter

3-Heights™ Image to PDF Converter

The 3-Heights™ Image to PDF Converter component converts raster images such as TIFF and JPEG to PDF/A and PDF documents. Typical applications include the conversion of scanned documents to PDF/A documents, the migration of a TIFF archive to PDF/A and the conversion of fax files to PDF/A with optional OCR recognition. More information about the Image to PDF Converter

3-Heights™ PDF Producer

The 3-Heights™ PDF Producer creates PDF/A documents from any Windows application via the print function. PDF/A documents can also be created at the touch of a button from within Microsoft Office and the result displayed automatically thanks to a plug-in. The API version offers software development partners in particular a multitude of additional options. More information about the PDF Producer

Displaying and printing PDF/A

3-Heights™ PDF Viewer

The 3-Heights™ PDF Viewer is a compact and sophisticated component for viewing PDF documents. It displays PDF documents in a PDF/A compliant way and additionally supports displaying raster images such as TIFF and JPEG. It also provides an option to print the document on display. More information about the PDF Viewer

3-Heights™ PDF Printer

The 3-Heights™ PDF Printer is a compact component for printing PDF documents. PDF documents are used in many areas of business and need to fulfill different demands with regard to printing. These include a high throughput rate, high visual fidelity even with complex documents, PDF/A-compliant printing, printing on paper or in print data streams such as PS, PCL and XPS. More information about the PDF Printer

Signing and validating PDF/A

3-Heights™ PDF Security Tool

The 3-Heights™ PDF Security Tool is able to apply various types of electronic signature (simple, advanced and qualified) in a PDF/A compliant way. The component's benefits include PDF/A conformity, embedding information on the validity of certificates (OCSP, CRL), time stamps and compatibility with hardware signature modules (HSM) for mass signature applications. The component can verify existing signed documents by checking their integrity. More information about the PDF Security Tool

3-Heights™ PDF to PDF/A Converter

The 3-Heights™ PDF to PDF/A Converter is a component for converting PDF documents into the PDF/A format for long-term archiving. The tool analyzes and converts the input document, applying a digital signature where required. This component is both robust and scalable, making it suitable for integration in various processes such as standardization, quality assurance and archive migration. More information about the PDF to PDF/A Converter

3-Heights™ PDF Validator

The 3-Heights™ PDF Validator safeguards the quality of PDF documents and the processes that create them. Documents are checked for compliance with the ISO standards for PDF and PDF/A documents. PDF is a widespread format; it is therefore important that interoperability is ensured. Documents containing information of relevance to business or intended for archiving, in particular, need to be validated with regard to their correctness and long-term legibility. This is precisely what the Validator does, whether for a single document or an entire batch. More information about the PDF Validator

 

OCR for PDF/A

3-Heights™ OCR Enterprise Add-On

The 3-Heights™ OCR Enterprise Add-On complements several products of PDF Tools AG with a high-performance optical character recognition (OCR) function. This allows for converting images such as TIFF or JPEG to PDF or PDF/A, or converting PDF to PDF/A and applying OCR at the same time. More information about the OCR Enterprise Add-On

 

Copyright 2001-2014 PDF Tools AG
