Handling embedded and non-embedded fonts in PDF & PDF/A documents
Although the first part of the PDF/A standard was published in 2005, there is still a need for clarification regarding fonts and embedding. What exactly does the standard require? How should PDF to PDF/A converters handle fonts? How do viewers actually deal with them, and how should they?
Let us start with the easiest case. If you create a PDF/A document, then in general you have to embed all fonts it uses. This is true for any flavor of PDF/A, such as PDF/A-1b or PDF/A-3u. There is only one exception: if the text is not visible (text rendering mode 3), embedding is not required. Invisible text is often used to overlay a scanned page with the text from an OCR engine, so that the scanned document can be searched as if it were a born-digital document.
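The embedding rule can be sketched as a small check. This is only an illustration: the TextRun record and the function name are hypothetical and do not belong to any real PDF library, and a real validator would read the rendering mode and font data from the page content streams.

```python
from dataclasses import dataclass

INVISIBLE = 3  # text rendering mode 3: text is neither filled nor stroked


@dataclass(frozen=True)
class TextRun:
    """Hypothetical record describing one piece of text on a page."""
    font_name: str
    render_mode: int
    font_embedded: bool


def fonts_missing_embedding(runs):
    """Return the names of fonts used for visible text but not embedded."""
    missing = set()
    for run in runs:
        # Invisible text (e.g. an OCR layer) is exempt from embedding.
        if run.render_mode != INVISIBLE and not run.font_embedded:
            missing.add(run.font_name)
    return sorted(missing)


runs = [
    TextRun("Arial", 0, True),      # visible and embedded: fine
    TextRun("Courier", 3, False),   # hidden OCR layer: exempt
    TextRun("Times", 0, False),     # visible but not embedded: violation
]
print(fonts_missing_embedding(runs))  # ['Times']
```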
If embedding is required, however, the font can be minimized so that it contains only those characters which are actually used by the document. For example, if a document shows the single text string "help" in Arial, then the embedded Arial font program can be reduced to contain only four characters. This process is called subsetting, and it is used extensively to reduce the size of the created file.
But creators must pay attention to characters which are composed from others, such as the German character "ä", which can be composed from the characters "a" and "¨". If the subset keeps the composed character but drops one of its components, the character can no longer be rendered. This is one of the sources of bad PDFs with incomplete font programs.
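A minimal sketch of how a creator might collect the characters for a subset, including the parts of composed characters. It rests on an assumption: Unicode canonical decomposition (NFD) is used here as a stand-in for the font's own composite glyph information, which a real subsetter would read from the font program. NFD yields the combining diaeresis U+0308 rather than the spacing "¨", so this only illustrates the idea.

```python
import unicodedata


def subset_characters(text):
    """Collect the characters a subset font would need to show `text`.

    Each character is kept, and so are the parts it decomposes into,
    so that a composed character never loses its components.
    """
    needed = set()
    for ch in text:
        needed.add(ch)
        # NFD splits e.g. U+00E4 into "a" followed by U+0308.
        needed.update(unicodedata.normalize("NFD", ch))
    return needed


print(sorted(subset_characters("help")))  # ['e', 'h', 'l', 'p']
print(len(subset_characters("\u00e4")))   # 3
```

The second call returns three entries: the composed character itself plus its two components.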
If an embedded font is used to fill out text in form fields, then the whole font must be embedded, since the creator doesn't know in advance which characters the user will eventually enter. From a technical point of view, the text remains editable if the associated font is not subsetted, and vice versa. But there are also legal constraints.
The embedding and also the subsetting of a font are subject to the licensing terms of the font manufacturer. The majority of licenses grant the right to freely use the font for reproduction, such as viewing and printing, but restrict the creation and editing of text to the license owner. In any case, you should carefully check the license conditions before using a font to avoid legal issues.
TrueType and OpenType fonts contain usage rights information which tells the creator software whether or not a font may be embedded. Some creators obey these flags, others don't. Whatever this information tells you, it can only be regarded as a hint. In the end, the written license text which comes with the purchased font is the only decisive source of information.
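In TrueType and OpenType fonts these usage rights are stored in the fsType field of the OS/2 table. The sketch below decodes such a value into readable labels. It assumes the integer has already been read from the font file, the function name is made up for this example, and it deliberately applies no policy, since the result is only a hint and the license text decides.

```python
# Bit values of the OS/2 table's fsType field (OpenType specification).
FS_TYPE_FLAGS = [
    (0x0002, "restricted license embedding"),
    (0x0004, "preview and print embedding"),
    (0x0008, "editable embedding"),
    (0x0100, "no subsetting"),
    (0x0200, "bitmap embedding only"),
]

# Bits 1 to 3 carry the basic embedding permission.
USAGE_PERMISSION_MASK = 0x000E


def describe_fs_type(fs_type):
    """Translate an fsType value into a list of human-readable labels."""
    labels = [label for bit, label in FS_TYPE_FLAGS if fs_type & bit]
    if fs_type & USAGE_PERMISSION_MASK == 0:
        # No permission bit set means installable embedding.
        labels.insert(0, "installable embedding")
    return labels


print(describe_fs_type(0x0000))  # ['installable embedding']
print(describe_fs_type(0x0104))  # ['preview and print embedding', 'no subsetting']
```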
PDF to PDF/A Conversion
PDF to PDF/A converter software has to embed any fonts that are not already embedded. For a well-formed PDF input document this is not a problem. If the font is found (by name) in the installed font collection of the operating system, then it is used. If it is not found, it is replaced by a font whose characteristics are similar to those of the requested font. Such substitutes are often synthesized from a generic font template for serif and sans-serif characters (Multiple Master fonts) rather than taken from the installed fonts.
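The lookup order described above might be sketched as follows. Everything here is hypothetical: the dictionary of installed fonts, the template names, and the single is_serif flag stand in for the much richer font descriptor data a real converter would compare.

```python
def resolve_font(name, installed_fonts, is_serif):
    """Pick a font program to embed for a non-embedded font.

    `installed_fonts` maps font names to font file paths (assumed input).
    Returns a (source, value) pair.
    """
    # First choice: an installed font with exactly the requested name.
    if name in installed_fonts:
        return ("installed", installed_fonts[name])
    # Fallback: a generic template with similar characteristics.
    template = "generic-serif" if is_serif else "generic-sans-serif"
    return ("substitute", template)


installed = {"Arial": "/fonts/arial.ttf"}  # made-up path for illustration
print(resolve_font("Arial", installed, is_serif=False))
print(resolve_font("Garamond", installed, is_serif=True))
```

The first call finds the font by name, while the second falls back to the serif template because no font named Garamond is installed.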
If the PDF input document is not well formed (e.g. if it contains non-embedded fonts which are symbolic, or CID fonts without a known CMap), then the converter must use heuristics similar to those a viewer would use in such a situation. But since these algorithms aren't bulletproof, the result might not look as expected, or the conversion may even fail.
A viewer (or, more generally, any software which reads PDF files) may behave differently depending on whether or not the document claims to conform to the PDF/A standard. If the document carries the PDF/A label, then the viewer is required to use the embedded fonts, whereas for a regular PDF document it may use the installed fonts instead. Using an installed font is usually faster than loading the embedded font from its compressed and possibly encrypted data stream. On the other hand, even if fonts have the same name, they may look and behave differently.
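The viewer's decision can be sketched as a simple function. The function and its parameters are hypothetical, and the preference for installed fonts in regular PDFs is just one possible policy, chosen here for the speed reason given above.

```python
def choose_font_source(claims_pdfa, has_embedded_font, installed_match):
    """Decide where a viewer takes the font program from.

    claims_pdfa:       the document carries the PDF/A label
    has_embedded_font: the document contains the font program
    installed_match:   a font with the same name is installed
    """
    if claims_pdfa and has_embedded_font:
        # PDF/A: the embedded font must be used.
        return "embedded"
    if installed_match:
        # Regular PDF: the installed font may be used and is usually faster.
        return "installed"
    if has_embedded_font:
        return "embedded"
    # Neither source is available: fall back to a similar-looking font.
    return "substitute"


print(choose_font_source(True, True, True))    # embedded
print(choose_font_source(False, True, True))   # installed
```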
PDF Tools AG
8050 Zürich, Switzerland