The history and origin of the format PDF/A
PDF/A will undoubtedly establish itself as the standard long-term archiving solution for electronic documents. It was published as an ISO standard on October 1, 2005 and has since set out to conquer the world. As Swiss representative on the ISO committee for PDF/A, PDF Tools is your competent point of contact for all issues dealing with PDF/A. We will be more than happy to assist you if you cannot find the answers to your specific questions on this website.
Introduction to PDF/A
Background - what's behind PDF/A and where did it come from
On September 28, 2005 the International Organization for Standardization (ISO) approved a new standard governing the archiving of electronic documents:
ISO 19005-1 - Document management - Electronic document file format for long-term preservation - Part 1: Use of PDF 1.4 (PDF/A-1).
The standard was the result of more than 36 months of collaboration among companies and organizations around the world.
In May 2002, the Association for Information and Image Management (AIIM), the National Printing Equipment Association (NPES) and the Administrative Office of the US Courts launched an initiative to create a standardized format for electronically archived documents. The kick-off meeting took place in October 2002. Attendees included Adobe Systems, the Library of Congress, Surety Inc., Quality Associates Inc., Appligent, Merck, EMC, PDF Sages, and NARA (National Archives and Records Administration). Xerox, Honeywell, EDS, and GlaxoSmithKline, among others, joined later.
The founders of the project put together a first version and submitted their recommendation to the ISO in order to have it registered as an international standard. The ISO assigned the project to the Technical Committee TC 171 (Document Management Applications). TC 171 consists of representatives of 13 member countries (one vote each) as well as observers from another 21 countries. After numerous reviews and improvements, the standard was accepted in September 2005.
Why the PDF/A initiative?
Archiving formats vary from country to country. Traditional archiving media (paper, microfilm, microfiche) guarantee reproducibility but no longer meet the demands of modern technology: large documents cannot be sent around the globe quickly, and it is extremely difficult to search archived documents for specific content. Many organizations set up TIFF archives as a first step towards electronic archiving. TIFF is a well-established format that also guarantees long-term reproducibility, and TIFF files can be transmitted quickly and easily within globally connected organizations. Searching, however, remains difficult.
PDF began to be considered at this point. A number of reasons make PDF more attractive than TIFF:
- PDF stores structured objects (such as text, vector graphics and raster images), which allows efficient searching across the entire archive. TIFF, on the other hand, is a raster format and must first be processed with OCR software to enable a full-text search.
- PDF files are more compact and often require only a fraction of the storage space of a corresponding TIFF file, frequently at better quality. The small file size is especially beneficial for electronic data exchange (FTP, email attachments, etc.).
- Metadata such as title, author, date of creation and modification, content, keywords, etc. can be embedded directly in the PDF document. Documents can thus be classified automatically without any human intervention.
- The page contents in a PDF document are usually device-independent, i.e. independent of raster resolution, color space, etc. Pages are not rasterized until they are reproduced (the rendering process). PDF documents therefore benefit from technological progress in output equipment, such as printers and monitors, even years later.
Adobe Systems, the creator of the de facto PDF standard, has published eight new versions of its 'PDF Reference Manual' in the last thirteen years. Each new version expanded the format with numerous new features and modified some of the old ones. It was therefore necessary to develop a stable, internationally accepted standard for long-term archiving, built on Adobe's proprietary PDF specifications. The outcome: PDF/A.
The PDF/A standard
Purpose of PDF/A
The ISO standard 19005 defines a file format based on PDF called PDF/A. The format offers a mechanism that represents electronic documents such that the visual appearance remains preserved for an extended period, independent of tools and systems for producing, saving and reproducing it.
The standard specifies neither the methods, the intention nor the purpose of preservation. Its aim is solely to guarantee that electronic documents can still be viewed in their original appearance in the future. For this reason, the document may not refer, either directly or indirectly, to any external source. Examples of such a reference would be an external image or a font that is not embedded in the document itself.
Comparison between PDF and PDF/A
The normal PDF format does not guarantee long-term reproducibility or complete independence from the software and the output device. In order to guarantee both principles, it was necessary to both limit and expand the existing PDF specification. It was clear from the outset that PDF/A-1 had to be based on an existing version of PDF in order to achieve the acceptance of a wide audience. The ISO committee TC 171 chose the Adobe PDF Reference 1.4 as a basis for the PDF/A-1 standard.
The PDF Reference 1.4 was implemented by Adobe in their Acrobat 5 product. PDF/A-1, as a standard, must fulfill all requirements of this document, and must also respect certain technical limitations of Acrobat 5. The original PDF Reference and ISO 19005-1 together comprise the current PDF/A-1 Standard. ISO 19005-1 only identifies the differences with respect to the PDF Reference. Accordingly, PDF Reference 1.4 is the central basis on which to comprehend the PDF/A-1 standard.
Several PDF 1.4 features, such as transparency or the playback of audio and video, are prohibited in the PDF/A-1 standard. Certain options of PDF 1.4 are mandatory in PDF/A-1: for example, all fonts used must be embedded in the document. Essentially, the PDF/A-1 standard does nothing more than identify individual features of PDF Reference 1.4 and indicate whether each is required, recommended, restricted, or not permitted.
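The way the standard classifies PDF 1.4 features can be pictured as a simple rule table. The following is only a minimal sketch: the feature names, the `PDFA1_RULES` table and the `violations` function are hypothetical, and the table contains just the three examples mentioned above, not the actual rule set of ISO 19005-1.

```python
from enum import Enum


class Status(Enum):
    REQUIRED = "required"
    PROHIBITED = "prohibited"


# Illustrative subset only, taken from the examples in the text.
# The real standard defines many more rules (hypothetical feature names).
PDFA1_RULES = {
    "fonts_embedded": Status.REQUIRED,
    "transparency": Status.PROHIBITED,
    "audio_video": Status.PROHIBITED,
}


def violations(features: dict[str, bool]) -> list[str]:
    """Compare a document's feature flags against the rule table.

    `features` maps a feature name to True if the document uses it.
    Features that are not listed are treated as absent.
    """
    problems = []
    for name, status in PDFA1_RULES.items():
        present = features.get(name, False)
        if status is Status.REQUIRED and not present:
            problems.append(f"{name}: required but missing")
        elif status is Status.PROHIBITED and present:
            problems.append(f"{name}: prohibited but present")
    return problems


# A document with embedded fonts that also uses transparency
print(violations({"fonts_embedded": True, "transparency": True}))
# ['transparency: prohibited but present']
```

A real validator has to derive such flags by parsing the PDF itself, which is where most of the effort lies.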
The PDF/A, A-1a, A-1b, A-2 "Babylon"
The PDF/A-1 Standard is divided into two levels of conformance: PDF/A-1a and PDF/A-1b.
PDF/A-1a (Level A Conformance) defines conformance with all requirements of the PDF/A-1 standard.
The minimum requirements for conformance with PDF/A-1 are contained in PDF/A-1b (Level B Conformance). The PDF/A-1b requirements are generally sufficient for unequivocal reproduction over an extended period.
PDF/A-1a differs from PDF/A-1b mainly with respect to accessibility requirements (Section 508 of the US Rehabilitation Act).
- PDF/A-1a guarantees that the document text is extractable and that the logical structure of the document as well as the natural reading order of the text remain intact. Text extraction is mainly of interest if documents are to be displayed on mobile devices (e.g. a PDA) or presented accessibly as required by Section 508 of the US Rehabilitation Act. This includes the requirement that the text can be restructured (re-flowed) to fit on a smaller screen. This functionality is also known as tagged PDF.
- PDF/A-1b ensures that text and other content on pages is reproduced uniformly; it does not guarantee, however, that the embedded text is comprehensible and machine-readable. The creator of a PDF/A-1b conforming file is free to embed the text in a readable form, even if the more stringent requirements of the aforementioned Section 508 are not met.
For scanned documents, conformance with PDF/A-1b is completely sufficient, even if they have been processed using OCR to enable a full text search.
In July 2011 the Technical Committee released a new part of the standard: ISO 19005-2 (PDF/A-2). Where PDF/A-1 is based on PDF version 1.4, PDF/A-2 takes advantage of features that only became available in later versions of PDF, up to and including PDF version 1.7. But most importantly, PDF/A-2 is no longer based on a particular Adobe PDF version, but instead, is now based on the ISO standard 32000-1.
The ISO committee released the third part of the standard (ISO 19005-3) in October 2012. PDF/A-3 contains just one change, which is necessary but controversial. PDF/A-2 already allowed PDF/A-conforming documents to be embedded as attachments; PDF/A-3 makes it possible for the first time to embed files in any format, such as Excel, Word, HTML, CAD or XML files.
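A file declares which part and conformance level it claims in its XMP metadata, using the properties `pdfaid:part` and `pdfaid:conformance`. This mechanism is not described in the text above, so the following is only a minimal sketch. It assumes the XMP packet has already been extracted from the PDF as a string without its `xpacket` wrapper; the function name `claimed_pdfa_level` and the sample packet are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def _read_property(description, name):
    """Read a pdfaid property written as an attribute or as a child element."""
    value = description.get(f"{{{PDFAID_NS}}}{name}")
    if value is None:
        element = description.find(f"{{{PDFAID_NS}}}{name}")
        value = element.text if element is not None else None
    return value.strip() if value else None


def claimed_pdfa_level(xmp_packet: str):
    """Return the claimed level, e.g. 'PDF/A-1b', or None if there is no claim."""
    root = ET.fromstring(xmp_packet)
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        part = _read_property(description, "part")
        conformance = _read_property(description, "conformance")
        if part:
            return f"PDF/A-{part}{(conformance or '').lower()}"
    return None


# Hand-written sample packet for illustration (an assumption, not a real file)
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(claimed_pdfa_level(SAMPLE_XMP))  # PDF/A-1b
```

Note that this reads only what the file claims about itself. Whether the file actually meets the requirements of that level still has to be checked with a validator.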
Use of the PDF/A standard
How do I get a copy?
The PDF/A standard ISO 19005 can be purchased from the ISO website. Copies can be ordered on paper or electronically in PDF format and, like all other ISO standards, are protected by copyright. It is therefore illegal to offer free copies via the internet. The standard is currently available in English only.
Who should read the standard?
The purpose of the PDF/A standard is to support and improve archiving strategies. The standard itself is quite technical in nature and can only be understood by experts with extensive knowledge of page description languages, such as PostScript and PDF. The main document itself is short, but the scope of the documents it builds on is very large. The PDF Reference 1.4 alone contains 1,000 pages, not including the referenced documents (font and compression formats, XML specifications, ICC color profiles, digital signatures, RFCs, etc.).
Furthermore, the standard alone does not guarantee long-term preservation. It is advisable to consult an expert in order to fully understand the PDF/A requirements, to implement a company-wide archiving policy based on them and to achieve the long-term objectives of document archiving.
What tools are available?
Tools for creating, processing and validating PDF/A documents have been available on the market since mid-2006. Adobe itself integrated corresponding features in Version 8 of Adobe Acrobat, released in fall 2006. Microsoft also provides a separately downloadable plug-in for Office 2007 that enables the creation of PDF/A-conforming files directly from Office products. Given the number of products for creating PDF/A already on the market, it has become very important to test every PDF/A document that is created for proper PDF/A conformance.
PDF/A requires a comprehensive solution
The PDF/A standard is merely one component of a comprehensive solution. PDF/A alone does not guarantee long-term preservation, nor that documents display as intended. Neither does PDF/A claim to be the most appropriate solution in every scenario. What PDF/A does define are the specific requirements that electronic documents must meet so that they can be preserved over the long term.
Other aspects must be considered if a PDF/A-conforming archive is to be implemented. These include, among other things, in-house company standards and processes, quality management, reliable data sources and dedicated requirements tailored to the specific application. In particular, the migration of existing paper or TIFF archives to a PDF/A-conforming archive is a significant task and must therefore be planned carefully.
PDF/A as a new archiving standard
PDF/A is expected to become the new standard for archiving electronic documents. PDF is omnipresent in private and public sectors worldwide and is already accepted as a format for countless purposes. The PDF/A standard will help to ensure that users will be able to safely reproduce documents even after a long period of time.
The introduction of the PDF/A standard will (as it should) probably influence the future development of PDF itself. Independent of this, Adobe will continue with improvements and the introduction of new features. Examples include 3-D models or XFA for dynamic PDF forms. This will put further pressure on the standard, because the essence of a standard - especially an archiving standard - is that it is not frequently modified.
How will the market react?
We should not expect PDF/A products to inundate the market. It takes considerable knowledge to understand the technology behind PDF/A. Moreover, users place higher quality demands on software that claims conformance with a standard.
The first tools appeared on the market in mid-2006. The capabilities in demand are PDF/A-conforming production, PDF/A validation and simple conversion of existing PDF documents into conforming PDF/A files.
The appearance of the first professional PDF/A tools has already triggered processes for the implementation of PDF/A-conforming archiving systems. Too much functionality should not be expected at this point. It is likely that initially only the more limited PDF/A-1b will be offered, with full PDF/A-1a following later.
As is so often the case when introducing a new standard, many products will be released to the market that advertise PDF/A conformity yet do not actually fulfill the requirements of the standard. Expertise for evaluation and reputable providers are particularly in demand during the launch phase.
Hot air or long-term strategy?
PDF/A will not be short-lived. The need for a standardized framework for archiving with PDF has existed for several years. And: PDF is already being used for this purpose in many applications, with the help of company-specific policies.
The fact that Microsoft is responding to customer demand by making it possible to create PDF/A documents directly from the most recent Office suite is a clear signal. Internationally accepted, PDF/A is here to stay.