Author: Dr. Hans Bärfuss
PDF 2.0 – next generation
It is rare for industrial products to survive for more than 20 years – especially in the IT industry. Not even the inventors of the PDF could have imagined just how successful their file format would be when they launched the first version of Acrobat in June 1993. The members of the International Organization for Standardization (ISO) are now working on the next generation of this popular format.
Since ISO 32000-1, entitled “Document Management – Portable Document Format – Part 1: PDF 1.7”, was published in mid-2008, the content of the sixth edition of Adobe’s famous PDF Reference has not changed significantly; it was merely translated into ISO language. But this will change when the second part of the standard, “Part 2: PDF 2.0”, is published shortly. This new version has been created by the ISO members, or to be precise by Technical Committee 171, Sub-committee 2. To make it clear that this is a new standard, the main version number has been raised to ‘2’. The standard is currently at the DIS (Draft International Standard) stage, one of the last steps before publication, and will be put to the vote on 16 September 2015.
What will the new standard offer?
The list of changes contains more than 50 entries. The most important changes and improvements relate to the following areas:
- Encryption: unencrypted wrapper of encrypted documents, 256-bit AES encryption, Unicode passwords
- Digital signatures: signatures based on the CAdES standard, certificates based on elliptic curves, long-term signature validation (LTV)
- Annotations: projections, 3D, rich media
- Accessibility: pronunciation hints
- 3D: support for the new ISO standard ‘PRC’, 3D measurements
- Document parts
The committee has also been brave enough to scrap some outdated features; the main ones are:
- XFA forms: Adobe’s XML-based form technology has been a constant source of frustration for many providers
- Movie, sound: multimedia content is not compatible with the concept of a portable document format
- Superfluous, redundant, outdated or non-portable information, such as the document information dictionary (replaced by XMP), outdated digital signatures, OS-dependent file names and rarely used standards, such as OPI (Open Prepress Interface)
There have also been some major revisions to the new part of the standard, particularly in the following chapters:
- Digital signatures
- Tagged PDF and accessibility support
But the numerous changes have taken their toll. It has taken seven years to create the second part, much longer than was needed for previous versions. In fact, Adobe managed to release seven versions in just 15 years – and in outstanding quality. On the plus side, the second part of the standard has received extensive input from the ISO members, and many parts of the text are worded more clearly. This makes it easier for the industry to understand the specification, increase the implementation quality and thereby improve interoperability. It is hoped that this will result in far fewer ‘bad’ PDFs.
What effect will the new version have?
For the main uses of PDF – i.e. archiving (PDF/A), document exchange (PDF/X), engineering (PDF/E) and accessibility (PDF/UA) – ISO has defined special sub-standards, most of which are based on the first part of the PDF standard. It is likely that these standards will also be adapted to make them relevant to the second part. However, it should not be assumed that the master standard will now be ‘new’ and the sub-standards ‘old’.
Instead, the development of these standards should be seen as an interaction. For example, many changes in the second part of the PDF standard are based on findings that were derived from work on the sub-standards and fed back into its development. In addition – unlike the PDF master standard – there is no real urgency to change the PDF/X, PDF/E and PDF/UA sub-standards, as they have been optimized independently of Adobe for some time now. The situation for PDF/A is somewhat different.
The special case of archiving
As soon as the first Version 2.0 PDF files are created, the question will arise as to how they can be archived in accordance with the standard. PDF/A must have an answer to this question. Unlike the other PDF sub-standards, this application is under a certain degree of time pressure. But the sheer number of changes is making it difficult to find a quick solution. The PDF/A community also faces other problems, in particular those related to validation.
The validation process checks whether a PDF file conforms to a certain standard. This check is common for PDF/X and PDF/A files and is vital for archiving purposes, as violations of the standard can lead to archived files no longer being perfectly legible after 10 or more years.
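To make the idea of a conformance check more concrete, the following minimal Python sketch reads two pieces of information that any validator has to start from: the version in the file header (for example %PDF-1.7 or %PDF-2.0) and the PDF/A level the file claims in its XMP metadata (pdfaid:part and pdfaid:conformance). It is an illustration only, not a validator. The function name sniff_pdf is hypothetical, the sketch assumes that the XMP packet is stored uncompressed, it tolerates a header anywhere in the first 1024 bytes as a leniency assumption, and it ignores the fact that a Version entry in the document catalog can override the header.

```python
import re

# The file header has the form %PDF-M.m, e.g. %PDF-1.7 or %PDF-2.0.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")
# In XMP, pdfaid:part may be written as an XML element or as an attribute.
PART_RE = re.compile(rb"pdfaid:part(?:>|\s*=\s*[\"'])\s*(\d)")
CONF_RE = re.compile(rb"pdfaid:conformance(?:>|\s*=\s*[\"'])\s*([A-Za-z])")


def sniff_pdf(data: bytes) -> dict:
    """Return the header version and the claimed PDF/A level, if any.

    This only reports what the file *claims*; checking whether the
    claim is true is the actual (and much harder) job of a validator.
    """
    header = HEADER_RE.search(data[:1024])  # assumption: lenient header position
    part = PART_RE.search(data)             # assumption: XMP is uncompressed
    conf = CONF_RE.search(data)

    version = None
    if header:
        version = header.group(1).decode() + "." + header.group(2).decode()

    claimed = None
    if part:
        claimed = part.group(1).decode()
        if conf:
            claimed += conf.group(1).decode().lower()

    return {"header_version": version, "claimed_pdfa": claimed}


if __name__ == "__main__":
    sample = (b"%PDF-1.7\n<x:xmpmeta><pdfaid:part>2</pdfaid:part>"
              b"<pdfaid:conformance>B</pdfaid:conformance></x:xmpmeta>")
    print(sniff_pdf(sample))
    # {'header_version': '1.7', 'claimed_pdfa': '2b'}
```

A file whose header reads %PDF-2.0 would be reported with header_version '2.0'; how such a file can be archived in a standard-conforming way is exactly the open question described above.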
Differences in the validation of PDF/A files
Various commercial software programs (validators) can be used for checking conformance with different parts of the PDF/A standard. As all parts of the PDF/A standard are based on specific master standards, such as PDF 1.4 and PDF 1.7, the validators must also check conformance with these standards. In addition, the producers of these validators often interpret the text of the standards differently. Both these factors can lead to different validators producing different test results.
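How such differences show up in practice can be sketched with a small comparison routine. The Python example below is purely illustrative: the validator names and rule identifiers are invented placeholders, not the output of any real product, and the function compare_verdicts is a hypothetical helper. It takes the set of rule violations each validator reports for the same file and shows whether the overall verdicts agree and which rules are disputed.

```python
from typing import Dict, Set


def compare_verdicts(reports: Dict[str, Set[str]]) -> dict:
    """Compare the violations reported by several validators for one file.

    reports maps a validator name to the set of rule IDs it reported
    as violated (an empty set means "file conforms").
    """
    if not reports:
        return {"verdicts": {}, "unanimous": True, "disputed_rules": []}

    # A file passes a validator if that validator reported no violations.
    verdicts = {name: not rules for name, rules in reports.items()}

    # Rules reported by at least one, but not by all, validators.
    reported_by_any = set().union(*reports.values())
    reported_by_all = set.intersection(*reports.values())
    disputed = sorted(reported_by_any - reported_by_all)

    return {
        "verdicts": verdicts,
        "unanimous": len(set(verdicts.values())) == 1,
        "disputed_rules": disputed,
    }


if __name__ == "__main__":
    # Hypothetical data: names and rule IDs are placeholders only.
    result = compare_verdicts({
        "validator_a": {"rule-1", "rule-2"},
        "validator_b": {"rule-2"},
        "validator_c": set(),
    })
    print(result["unanimous"])       # False
    print(result["disputed_rules"])  # ['rule-1', 'rule-2']
```

A disputed rule in such a comparison does not say which validator is right; it only marks the places where the interpretation of the standard text needs to be clarified.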
Furthermore, individual users may have unrealistic expectations of what a validator can and should be able to do. The deficiencies of the validators and the unrealistic expectations of individual users can both lead to the PDF/A concept being questioned. Calls for a ‘definitive validator’ are therefore growing louder.
The VeraPDF project
VeraPDF is a project led by the Open Preservation Foundation. It was established as a consortium with the partners PDF Association, Dual Lab, The Digital Preservation Coalition and Keep Solutions. The aim of the project is to develop a PDF/A validator. The development work was put out to tender by PREFORMA (‘PREservation FORMAts for Culture Information/E-Archives’), a pre-commercial procurement (PCP) project co-funded by the EU's FP7-ICT program. PREFORMA's open source validators will cover three file formats – PDF, plus TIFF and a video format – and will support long-term archiving in memory institutions. VeraPDF was awarded the contract for the first two implementation phases of the PDF/A validator. The first phase, which was concerned mainly with determining the validator's specifications, is now complete. The second phase, the realization of a prototype, is close to completion.
So far, the project has shown that it is not so easy to develop a PDF/A validator from scratch; it requires a great deal of PDF experience. For political reasons, it was not possible to assign the task to one of the commercial validator producers. Involvement of the PDF Association offers the benefit of being able to draw on the producers' experience and ensures that the development of the validator receives support from a wide range of sources.
The disadvantage, however, is that the work becomes much more complex and lengthy. Furthermore, funding is not available to implement the project in full, so the developers are attempting to keep the concept usable by means of an extensible software architecture (plug-ins).
One advantage that the project will have over commercial validators is that the validator itself can be ‘validated’. In theory, this will be made possible by making the entire program code public. In reality, almost no one really looks at the code. That's why the producers are attempting to focus on suitable test files. However, it comes as no surprise that development of the test files takes as long as development of the validator itself. The existing test suites – such as Isartor, developed by the PDF Association – are nowhere near sufficient.
Although the new PDF version is about to be released, the VeraPDF project has not yet delivered any concrete results for the validation of PDF/A documents. The experiences with this project are starting to divide the community and disappointment is spreading. Many expectations are not being met, or not until very late. Thus, users will probably have to continue relying on commercial validators for the foreseeable future.