Importing images into a PDF file - a seemingly trivial task

A picture is worth a thousand words, which is why images are so often embedded in PDF files. One would expect embedding an image in a PDF file to be a simple task. Because it seems so easy, there are many tools for it, including free ones. But do these tools do what you expect them to do? A closer look reveals that embedding images is anything but trivial.

Let's start with embedding image data in the popular JPEG format. Many PDF creation programs simply take the JPEG data stream and embed it as is in the PDF file. Does that just work? The answer is: in most cases, but not always. The PDF standard requires that only so-called baseline JPEGs be embedded. But there are also so-called multi-scan, progressive and arithmetically encoded JPEGs. These are not allowed, and many PDF viewers cannot display them. The fact that Acrobat displays such PDF files without an error message does nothing to curb their spread. This is especially troublesome when the file claims to conform to PDF/A, as many PDF validation tools do not examine the image streams for conformance with the standard.
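
Whether a given JPEG is baseline, progressive or arithmetically coded can be read from its start-of-frame (SOF) marker before the stream is embedded. The following is a minimal sketch in Python, not the method of any particular tool. The function name classify_jpeg and the category labels are assumptions made for illustration, and the sketch does not count scans, so it cannot detect multi-scan files.

```python
import struct

# SOF marker codes from the JPEG marker table. 0xC4 (DHT), 0xC8 (reserved)
# and 0xCC (DAC) fall in the same numeric range but are not frame headers.
HUFFMAN_SOF = {0xC0: "baseline", 0xC1: "extended-sequential",
               0xC2: "progressive", 0xC3: "lossless"}
ARITHMETIC_SOF = {0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
OTHER_SOF = {0xC5, 0xC6, 0xC7}  # differential Huffman variants


def classify_jpeg(data):
    """Return a label for the first SOF marker found in a JPEG stream."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:                       # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                             # markers without a length field
            continue
        if marker in HUFFMAN_SOF:
            return HUFFMAN_SOF[marker]
        if marker in ARITHMETIC_SOF:
            return "arithmetic"
        if marker in OTHER_SOF:
            return "other"
        if marker in (0xD9, 0xDA):               # EOI or SOS before any SOF
            break
        # The length field counts itself but not the two marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    raise ValueError("no SOF marker found")
```

A tool could use such a check to decide whether the stream can be copied as is or has to be decoded and re-compressed first.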

It becomes similarly problematic when fax Group 3 (G3) image data streams are transferred from a TIFF container into a PDF file. The specifications of TIFF and PDF are, for whatever reason, slightly different, so this transfer can go quite wrong.
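
One place where the two specifications differ is in how the fax coding mode is described. TIFF uses the Compression and T4Options tags, while the CCITTFaxDecode filter in PDF uses the K parameter. The sketch below, in Python, shows only this part of the mapping. The function name ccitt_decode_parms and the plain dictionary of tag values are assumptions for illustration, and the positive value chosen for K is arbitrary. Other differences, such as the photometric interpretation and fill bits, are deliberately left out.

```python
# TIFF tag numbers used below (from the TIFF 6.0 tag list).
IMAGE_WIDTH, IMAGE_LENGTH, COMPRESSION = 256, 257, 259
FILL_ORDER, T4_OPTIONS = 266, 292


def ccitt_decode_parms(tags):
    """Derive a partial CCITTFaxDecode parameter set from TIFF tag values.

    `tags` is a hypothetical dict mapping TIFF tag numbers to integers.
    """
    compression = tags[COMPRESSION]
    if compression == 4:                     # Group 4: pure two-dimensional
        k = -1
    elif compression == 3:                   # Group 3: bit 0 selects 2-D coding
        k = 1 if tags.get(T4_OPTIONS, 0) & 1 else 0
    else:
        raise ValueError("not a Group 3 or Group 4 fax stream")
    if tags.get(FILL_ORDER, 1) != 1:
        # FillOrder 2 stores bits least significant first; the bytes would
        # have to be bit-reversed before the stream is copied.
        raise ValueError("FillOrder 2 needs bit reversal before embedding")
    return {"K": k, "Columns": tags[IMAGE_WIDTH], "Rows": tags[IMAGE_LENGTH]}
```

Raising an error for cases the sketch does not handle is a design choice: a stream that is copied with wrong parameters would produce a PDF file that looks valid but displays a broken image.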

When embedding image data, the format of the image source must be carefully analyzed and the data stream converted or even re-compressed to conform to the PDF specification.

But there are other reasons to edit the image stream. JPEG streams often contain many segments that are not needed in PDF or that must be stored elsewhere. For example, Exif data (camera settings, GPS location, etc.) should be extracted, converted to the XMP metadata format, and assigned to the image object as a separate property. Other segments, such as private Photoshop data, can be removed because they serve no purpose in the PDF file and only take up space.
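
The segment clean-up described above can be sketched as a single pass over the JPEG header. This Python example is an illustration under stated assumptions, not a complete implementation. The function names are hypothetical, and the choice of what to remove (Exif data in APP1, Photoshop data in APP13) is only one possible policy. Other application segments are kept, since some of them may carry color information the decoder needs.

```python
import struct


def is_removable(marker, payload):
    """Assumed policy: drop Exif (APP1) and Photoshop (APP13) segments."""
    if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
        return True
    return marker == 0xED


def strip_segments(data):
    """Return (cleaned_stream, removed), where removed is a list of
    (marker, payload) tuples that can be converted to XMP later."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    out = bytearray(data[:2])
    removed = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):
            break                  # scan data or end of image: copy the rest
        # The length field counts itself but not the two marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos:pos + 2 + length]
        if is_removable(marker, segment[4:]):
            removed.append((marker, segment[4:]))
        else:
            out += segment
        pos += 2 + length
    out += data[pos:]              # entropy-coded data is copied unchanged
    return bytes(out), removed
```

The compressed image data itself is never touched, so the operation is lossless with respect to the picture. The removed Exif payload is returned rather than discarded so that it can be converted to XMP afterwards.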

Apart from the image stream, source images carry other information that should be transferred to the PDF file but is often forgotten, typically color profiles and metadata. But it is not that simple. Since TIFF files and other image formats cannot be embedded directly in a PDF file, the containers must be unpacked and their contents converted into a form that can be transferred to the PDF file. For example, the information in a TIFF file is stored in so-called tags, which contain metadata in addition to the image data. This metadata must first be converted to XMP format before it can be embedded in the PDF file. The same goes for colors. Often the descriptions of the color spaces are not simply stored as color profiles and have to be converted into a PDF color space description.
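
To give an idea of what converting TIFF tags to XMP involves, here is a minimal Python sketch that turns a few textual tags into an XMP fragment. The function name, the input dictionary and the selection of tags are assumptions for illustration. A real converter would cover many more tags, handle multi-valued and language-dependent properties, and wrap the result in an xpacket envelope.

```python
from datetime import datetime
from xml.sax.saxutils import escape

# Assumed mapping of a few TIFF tag numbers to simple XMP text properties.
TIFF_TO_XMP = {271: "tiff:Make", 272: "tiff:Model",
               305: "tiff:Software", 315: "tiff:Artist"}
DATE_TIME = 306  # TIFF DateTime, stored as "YYYY:MM:DD HH:MM:SS"


def tiff_tags_to_xmp(tags):
    """Build an XMP fragment from a hypothetical dict of TIFF tag values."""
    props = []
    for tag_id, name in TIFF_TO_XMP.items():
        if tag_id in tags:
            # TIFF ASCII values end with a NUL byte; XML needs escaping.
            value = escape(str(tags[tag_id]).rstrip("\x00"))
            props.append("   <%s>%s</%s>" % (name, value, name))
    if DATE_TIME in tags:
        stamp = datetime.strptime(tags[DATE_TIME].rstrip("\x00"),
                                  "%Y:%m:%d %H:%M:%S")
        # XMP dates use ISO 8601; the TIFF tag carries no time zone.
        props.append("   <xmp:ModifyDate>%s</xmp:ModifyDate>"
                     % stamp.isoformat())
    return (
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        ' <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
        '  <rdf:Description rdf:about=""\n'
        '    xmlns:tiff="http://ns.adobe.com/tiff/1.0/"\n'
        '    xmlns:xmp="http://ns.adobe.com/xap/1.0/">\n'
        + "\n".join(props) + "\n"
        "  </rdf:Description>\n"
        " </rdf:RDF>\n"
        "</x:xmpmeta>"
    )
```

Even this small example has to deal with character escaping and a date format change, which illustrates why the conversion is easy to get wrong.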

So there are many small details to consider that many tools handle improperly or not at all. It is therefore worthwhile to use a professional tool for this seemingly trivial task.

A picture is worth a thousand words. However, it is certainly necessary to write more than a thousand words about how images should be embedded correctly in a PDF file.
