Glossary of all things PDF

Our beloved Portable Document Format (yep, that’s what PDF stands for) turned 30 in 2023, and it has grown more and more powerful over the years! Find explanations of the latest lingo below and join the movement.

AES - Advanced Encryption Standard

Symmetric encryption method published as standard by NIST.

Annotation

Associates an object (for example, a note, a piece of music or a film) with a position on the page, or provides a way for the user to interact with the page via mouse and keyboard.

Many PDF documents are designed so that users cannot change their content but can still interact with them through form fields and checkboxes.

Anti-aliasing

Distortion (aliasing) may occur at the edges of an object, depending on the image’s resolution.

Anti-aliasing methods minimize this effect by smoothing the edges: the color values of the edge pixels are adjusted through filtering.

Array object

A one-dimensional collection of sequential objects with implicit numbering starting at 0.

ASCII

The American Standard Code for Information Interchange, a widely used convention for the binary encoding of a specific set of 128 characters. Besides 33 control characters, the ASCII character set contains the space character (or blank) and the following 94 printable characters:

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

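To double-check these counts, the Python sketch below derives the printable range from the code points 0x21 to 0x7E and adds the space and the control characters. The constant names are illustrative only.

```python
import string

# Printable ASCII characters (excluding the space) occupy code points 0x21..0x7E.
PRINTABLE = "".join(chr(code) for code in range(0x21, 0x7F))

# Control characters: 0x00..0x1F plus DEL (0x7F).
CONTROL = [chr(code) for code in range(0x20)] + [chr(0x7F)]

print(len(PRINTABLE))                     # 94
print(len(PRINTABLE) + 1 + len(CONTROL))  # 128 (printable + space + control)

# The same 94 characters are the digits, letters and punctuation in Python's string module.
assert set(PRINTABLE) == set(string.digits + string.ascii_letters + string.punctuation)
```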
ASN.1 - Abstract Syntax Notation #1

Description language for the syntax of digital messages. Suitable standards for the binary encoding of these messages are BER and DER, both defined in X.690.

BER - Basic Encoding Rules

Flexible rules for the binary encoding of digital messages; BER often allows several different encodings of the same value.

Binary data

An ordered sequence of bytes. Images and fonts are examples of objects stored as binary data.

Boolean object

Either the keyword true or the keyword false.

Byte

A group of 8 binary digits (8 bits) that collectively can represent one of 256 different values. The byte is the basic unit of data storage in practically all of today's electronic devices.

CA - Certification Authority

Accredited issuer of certificates.

CAdES - CMS Advanced Electronic Signatures

An ETSI standard for CMS-based advanced electronic signatures.

Catalog

The root dictionary of a document’s object hierarchy, from which all other objects in the document are referenced directly or indirectly. The trailer is the exception: the catalog does not reference it; instead, the trailer points to the catalog through its /Root entry.

Certificate

A certificate is an electronic attestation of the identity of a natural or legal person, issued by a certification authority. The certificate also contains a public key whose corresponding private key is held only by that person. With this private key, the person can generate digital signatures, which anyone can verify with the help of the certificate.

Character

A byte whose value is usually interpreted as a symbol within a symbol set with 256 or fewer members. Character examples: 1, 2, a, b, A, &, etc.

Character set

A defined set of symbols in which each character is assigned a unique code value (a single byte in simple character sets). Examples:

  • ASCII

  • Unicode

CMS - Cryptographic Message Syntax

Message format for digital signatures based on PKCS#7 using the ASN.1 syntax.

Conforming product

Software application that is both a conforming reader and a conforming writer.

Conforming reader

Software application that can read and process a PDF file that conforms to a specification (e.g. [ISO 32000] or [ISO 19005-1]) and that complies with the requirements for a conforming reader.

Conforming writer

Software application that can write PDF files that conform to a specification such as [ISO 32000] or [ISO 19005-1].

Content stream

A stream object whose data consists of a sequence of instructions that describe the graphic elements of a page.

Corrupt PDF file

A PDF document that does not conform to the specification and may therefore be partly or completely unreadable. Possible causes include:

  • The document was not generated correctly

  • The document was damaged after its creation (e.g. incomplete copying process)

CRL - Certificate Revocation List

List of revoked certificates published by the issuer.

Cross-reference table

Data structure that records the byte offset at which each of the file’s indirect objects starts, so that objects can be accessed directly without reading the whole file.
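The Python sketch below shows how a writer might record these offsets while writing objects and then emit a classic cross-reference table and trailer. The function name `build_pdf` is hypothetical, object 1 is assumed to be the catalog, and the result is a bare skeleton rather than a complete, viewable document.

```python
def build_pdf(objects: list[bytes]) -> bytes:
    """Write numbered objects, then a cross-reference table and trailer.

    Hypothetical helper; assumes object 1 is the document catalog.
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "n 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    # Each entry is exactly 20 bytes: 10-digit offset, 5-digit generation, type, 2-byte EOL.
    out += b"0000000000 65535 f\r\n"  # entry 0 is always a free entry
    for offset in offsets:
        out += b"%010d 00000 n\r\n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset  # "%%%%" yields "%%"
    return bytes(out)


pdf = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [] /Count 0 >>",
])
print(pdf.decode("ascii"))
```

The fixed 20-byte entry width is what lets a reader jump straight to entry n without parsing the entries before it.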

DER - Distinguished Encoding Rules

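Rules for the binary encoding of digital messages based on BER, restricted so that every value has exactly one (unique) encoding. DER is used, for example, for X.509 certificates.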
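To illustrate the tag-length-value layout and the uniqueness requirement, here is a Python sketch that DER-encodes an ASN.1 INTEGER. DER removes the freedom BER leaves, for example by requiring the shortest possible length field; the content here also uses the minimal number of two's-complement bytes. The helper names are hypothetical, and only the INTEGER type (tag 0x02) is covered.

```python
def der_length(n: int) -> bytes:
    """Encode a length in DER: short form below 128, otherwise the minimal long form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_integer(value: int) -> bytes:
    """Encode an ASN.1 INTEGER as tag 0x02, length, minimal two's-complement content."""
    # ~value == -value - 1, so negative numbers get just enough bytes for the sign bit.
    size = ((value if value >= 0 else ~value).bit_length() + 8) // 8
    content = value.to_bytes(size, "big", signed=True)
    return b"\x02" + der_length(len(content)) + content


print(der_integer(127).hex())  # 02017f
print(der_integer(128).hex())  # 02020080 (a leading 00 keeps the value positive)
```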

Dictionary object

An associative table of object pairs: the first object of each pair is a name object and serves as the key; the second object is the value and can be any type of object, including another dictionary.

Direct object

Any object that has not been made an indirect object.

DSA - Digital Signature Algorithm

Algorithm for digital signatures, standardized by the NIST.

DSS (Cryptography) - Digital Signature Standard

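Standard for digital signatures, published by the NIST as FIPS 186; it specifies the DSA and other signature algorithms.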

DSS (PDF) - Document Security Store

Structure in a PDF document to embed signature validation information such as CRLs, OCSP responses, and certificates.

eIDAS - Electronic Identification, Authentication and Trust Services

An EU regulation (Regulation (EU) No 910/2014) that sets standards for electronic identification and trust services for electronic transactions.

Electronic document

An electronic representation of a page-oriented compilation of text, images, and graphic data, as well as metadata that helps to identify, understand, and display the data. Electronic documents can be reproduced on paper or displayed on screen without any significant loss of information.

Encryption

Data are encrypted so that outsiders cannot deduce their meaning. In public-key (asymmetric) encryption, the recipient generates a key pair consisting of a private and a public key. If the sender encrypts the data with the recipient’s public key, only the recipient can decrypt it, because the recipient is the sole owner of the private key. Algorithms such as RSA with key lengths of currently 2048 bits are used for this. The usual procedures for digital signatures are based on the same technology.
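The key-pair idea can be sketched with textbook RSA in Python. The numbers below are the classic tiny demo values, far too small to be secure, and no padding is applied; real systems use keys of 2048 bits or more with a padding scheme such as OAEP, via a vetted cryptographic library.

```python
# Textbook RSA with tiny, insecure demo numbers (no padding at all).
p, q = 61, 53
n = p * q                          # modulus, part of both keys: 3233
e = 17                             # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent: 2753 (needs Python 3.8+)


def encrypt(message: int, public_key: tuple[int, int]) -> int:
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, private_key: tuple[int, int]) -> int:
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


ciphertext = encrypt(65, (e, n))    # the sender uses the recipient's public key
print(ciphertext)                   # 2790
print(decrypt(ciphertext, (d, n)))  # 65: only the private key recovers the message
```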

End-of-line marker (EOL marker)

A sequence of one or two characters marking the end of a line and consisting of:

  • a CARRIAGE RETURN character (U+000D)

  • or a LINE FEED character (U+000A)

  • or a CARRIAGE RETURN followed directly by a LINE FEED.
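A reader that splits PDF data into lines has to treat CARRIAGE RETURN followed by LINE FEED as one marker, not two. A minimal Python sketch, assuming the data is already in memory as bytes (the function name is hypothetical):

```python
import re

# CR LF must be tried first so that it counts as a single marker.
EOL = re.compile(rb"\r\n|\r|\n")


def split_lines(data: bytes) -> list[bytes]:
    """Split raw PDF bytes at every end-of-line marker."""
    return EOL.split(data)


print(split_lines(b"%PDF-1.7\r\n1 0 obj\rendobj\n"))
# [b'%PDF-1.7', b'1 0 obj', b'endobj', b'']
```

A LINE FEED followed by a CARRIAGE RETURN, by contrast, is two markers and produces an empty line between them.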

ETSI - European Telecommunications Standards Institute

European standards organization for telecommunications and related fields, including the standardization of digital signatures.

Filter

An optional component of a stream specification that defines how the stream data must be decoded before it is used. Filter examples: FlateDecode, DCTDecode.
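FlateDecode uses the zlib/deflate compression format, so Python’s standard `zlib` module can illustrate both directions. This is a sketch for FlateDecode only; DCTDecode (JPEG) data would need an image decoder and is not shown. The function name is hypothetical.

```python
import zlib


def flate_decode(data: bytes) -> bytes:
    """Undo /FlateDecode: the stream data is in zlib format (a deflate stream with a header)."""
    return zlib.decompress(data)


content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
encoded = zlib.compress(content)          # what a writer would store in the stream
assert flate_decode(encoded) == content   # what a reader recovers before interpreting it
```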

Font

An identified collection of graphics that may be glyphs or other graphic elements [ISO 15930-4].

A font file defines how glyphs are displayed. If a font file is contained in a PDF file, then the associated font is embedded in the file.

If the embedded font does not contain a complete character set but, for example, only the glyphs of the characters used in the document, it is called a subset font (or subsetted font).

Function

A special type of object representing parameterized classes of functions, including mathematical formulas and sampled representations with arbitrary resolution.

Gaussian Filter

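A filter that reduces image noise by smoothing the image, producing a soft-focus (blur) effect: each pixel is replaced by a weighted average of its neighbors, with weights that follow a Gaussian (normal) distribution.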
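The smoothing step can be sketched in Python as a weighted average of neighboring pixels. Because the Gaussian is separable, a 2-D blur can be done by applying the 1-D kernel first to rows and then to columns; this sketch shows one row only, the function names are hypothetical, and clamping at the edges is just one of several possible choices.

```python
import math


def gaussian_kernel(radius: int, sigma: float) -> list[float]:
    """1-D Gaussian weights for offsets -radius..radius, normalized to sum to 1."""
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_row(row: list[float], kernel: list[float]) -> list[float]:
    """Smooth one row of pixel values; out-of-range neighbors are clamped to the edge."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), len(row) - 1)
            acc += w * row[j]
        out.append(acc)
    return out


kernel = gaussian_kernel(radius=2, sigma=1.0)
print([round(v, 1) for v in blur_row([0, 0, 255, 0, 0], kernel)])  # the spike is spread out
```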

Glyph

Recognizable abstract graphic symbol, independent of any specific design [ISO/IEC 9541-1]. For example, the character “A” can be displayed by different glyphs, such as a serif, a sans-serif or a script “A”.

Graphics state

The set of parameters that control how graphics are rendered, such as color, font, font size and the current transformation matrix. Graphics states are kept on a stack whose top element is the current graphics state; the content stream operators q and Q save and restore it.
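The save/restore behavior can be modeled as a stack in Python. The class and field names are hypothetical, and only three of the many graphics state parameters are included.

```python
from dataclasses import dataclass, replace


@dataclass
class GraphicsState:
    # A small, hypothetical subset of the real graphics state parameters.
    fill_color: tuple[float, float, float] = (0.0, 0.0, 0.0)
    font_size: float = 12.0
    ctm: tuple[float, ...] = (1, 0, 0, 1, 0, 0)  # current transformation matrix


class GraphicsStateStack:
    def __init__(self) -> None:
        self._stack = [GraphicsState()]

    @property
    def current(self) -> GraphicsState:
        return self._stack[-1]  # the top element is the active state

    def save(self) -> None:  # content stream operator q
        self._stack.append(replace(self.current))  # push a copy of the current state

    def restore(self) -> None:  # content stream operator Q
        if len(self._stack) == 1:
            raise RuntimeError("Q without matching q")
        self._stack.pop()


gs = GraphicsStateStack()
gs.save()
gs.current.fill_color = (1.0, 0.0, 0.0)  # e.g. "1 0 0 rg" between q and Q
gs.restore()
print(gs.current.fill_color)  # (0.0, 0.0, 0.0)
```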

Hash

A hash value (hash for short) is a number calculated from any quantity of data, such as documents, certificates or messages. It is usually much shorter than the original data: only a few dozen bytes (for example, 32 bytes for SHA-256). The hash value is always the same for the same data and is almost certainly different for different data. Moreover, the original data cannot be reconstructed from the hash value. Hash algorithms such as SHA-1 (now considered insecure) or SHA-2 are used for the calculation.
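The properties described above can be observed with SHA-256 from Python’s standard `hashlib` module:

```python
import hashlib

doc = b"example document content"

h1 = hashlib.sha256(doc).hexdigest()
h2 = hashlib.sha256(doc).hexdigest()
h3 = hashlib.sha256(doc + b" ").hexdigest()

print(len(hashlib.sha256(doc).digest()))  # 32: the size is fixed, whatever the input size
print(h1 == h2)  # True: same data, same hash
print(h1 == h3)  # False: a one-byte change gives a completely different hash
```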

Hinting

Hinting is a method that improves the display quality of fonts by optimizing the outlines when displaying the characters.

HSM - Hardware Security Module

Device that securely stores private keys and efficiently performs cryptographic operations, such as encryption, decryption and the creation of digital signatures, without the keys leaving the device.

ICC Profile

Color profile compliant with the ICC specification [ISO 15076-1:2005].

Indirect object

An object labeled with a positive integer object number and a non-negative integer generation number, and enclosed between the keywords obj and endobj (for example, 12 0 obj (Hello) endobj).

Integer object

A mathematical integer whose implementation-dependent range is approximately centered on 0 (for example, −2,147,483,648 to 2,147,483,647 in many implementations). It is written as one or more decimal digits with an optional sign.

Interpolation

A method for estimating pixel values at positions between the known pixels of a raster image, for example when the image is scaled or rotated. Bilinear interpolation is an extension of linear interpolation to two dimensions and is commonly used for scaling images and displaying textures in rendered images.
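As an illustration, bilinear interpolation inside one cell of four neighboring pixels consists of two linear interpolations along x followed by one along y. A minimal Python sketch, assuming grayscale values and coordinates normalized to the range 0 to 1 (the function name is hypothetical):

```python
def bilinear(q00: float, q10: float, q01: float, q11: float, x: float, y: float) -> float:
    """Interpolate inside a unit cell whose corners hold q00 at (0,0), q10 at (1,0),
    q01 at (0,1) and q11 at (1,1); x and y lie between 0 and 1."""
    row0 = q00 * (1 - x) + q10 * x     # linear interpolation along x at y = 0
    row1 = q01 * (1 - x) + q11 * x     # linear interpolation along x at y = 1
    return row0 * (1 - y) + row1 * y   # then linear interpolation along y


print(bilinear(0, 100, 100, 200, 0.5, 0.5))  # 100.0
```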

ISO - International Organization for Standardization

International standards body that publishes, among many other standards, PDF (ISO 32000) and PDF/A (ISO 19005). Switzerland is represented in the ISO by the Swiss Standards Body (SNV).

ISO 19005

See PDF/A.

ISO 32000

See PDF.

ISO/IEC 18014

ISO/IEC standard for time-stamping services.

ITU-T - ITU Telecommunication Standardization Sector

Coordinates standards for telecommunications and is one of three sectors of the ITU (International Telecommunication Union).

Key

Data used to encrypt or decrypt a message. In a public-key cryptosystem, each key pair consists of a private key and a public key.

LTV - Long-Term Validation

Enhancement of digital signatures with additional data so that long-term verifiability is possible without online services. The additional data consist of the trust chain of the certificates from the owner certificate up to the root certificate of the issuer and also information that certifies the validity of the certificates at the time of signature.

MDP - Modification Detection and Prevention Signature

Enables detection of changes that the author has not allowed. A document can contain only one MDP signature, which must be the first signature in the document. Other types of signatures may also be present.

Multiple Master Fonts

Variant of the PostScript Type 1 format in which a single font contains several master designs, from which intermediate variations can be generated along design axes such as weight (stroke thickness), width and optical size.

Name object

An atomic symbol uniquely defined by a sequence of characters beginning with a forward slash (/, U+002F), whereby the forward slash is not part of the name.

Name tree

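Similar to a dictionary that associates keys with values, except that the keys in a name tree are strings, are sorted, and can be spread over a tree of nodes, so that a large number of entries can be stored efficiently.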

NIST - National Institute of Standards and Technology

United States federal agency that develops standards, including cryptographic standards such as AES and DSS.

Null object

A single object of type null, designated by the keyword null, whose type and value are unequal to those of any other object.

Number tree

Similar to a dictionary that associates keys with values, except that the keys in a number tree are integers, are sorted, and can be spread over a tree of nodes.
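Name trees and number trees are looked up the same way: descend into the kid whose /Limits range contains the key, then scan the leaf’s flat key-value array. The Python sketch below models the tree nodes as plain dictionaries, which is an assumption for illustration; in a real file, the kids are indirect objects that must be resolved first. The function name is hypothetical.

```python
def tree_lookup(node: dict, key, leaf_key: str = "Names"):
    """Find `key` in a name tree (leaf_key="Names") or a number tree (leaf_key="Nums").

    Intermediate nodes have "Kids"; leaves hold a flat [key1, value1, key2, value2, ...]
    array; every non-root node has "Limits" = [smallest key, largest key].
    """
    if leaf_key in node:
        pairs = node[leaf_key]
        for i in range(0, len(pairs), 2):
            if pairs[i] == key:
                return pairs[i + 1]
        return None
    for kid in node.get("Kids", []):
        low, high = kid["Limits"]
        if low <= key <= high:  # kid ranges do not overlap, so only one kid can match
            return tree_lookup(kid, key, leaf_key)
    return None


# Hypothetical name tree; the values stand for object references (number, generation).
names = {"Kids": [
    {"Limits": ["Apple", "Banana"], "Names": ["Apple", (10, 0), "Banana", (11, 0)]},
    {"Limits": ["Cherry", "Date"], "Names": ["Cherry", (12, 0), "Date", (13, 0)]},
]}
print(tree_lookup(names, "Date"))  # (13, 0)
```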

Numeric object

Either an integer object or a real object.

OASIS/DSS - Organization for the Advancement of Structured Information Standards / Digital Signature Services

A standard of the OASIS organization for signing services based on the XML syntax.

Object

A basic data structure used to construct PDF files. An object can be of the following types: array, Boolean, dictionary, integer, name, null, real, stream or string.
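To make the object types concrete, this Python sketch serializes basic Python values into PDF syntax for direct objects. The `Name` marker class and the `to_pdf` function are hypothetical helpers; streams, indirect objects and #xx escaping of names are left out.

```python
class Name(str):
    """Marks a Python string as a PDF name object (hypothetical helper type)."""


def to_pdf(obj) -> str:
    """Serialize basic Python values as PDF direct objects (sketch, no streams)."""
    if obj is None:
        return "null"
    if isinstance(obj, bool):  # check before int: bool is a subclass of int
        return "true" if obj else "false"
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, float):  # real object: fixed-point notation, trailing zeros dropped
        return f"{obj:.5f}".rstrip("0").rstrip(".")
    if isinstance(obj, Name):  # check before str: Name is a subclass of str
        return "/" + obj  # assumes the name needs no #xx escaping
    if isinstance(obj, str):  # literal string: escape backslash and parentheses
        escaped = obj.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    if isinstance(obj, list):
        return "[" + " ".join(to_pdf(item) for item in obj) + "]"
    if isinstance(obj, dict):  # dictionary keys are written as names
        return "<< " + " ".join(f"/{k} {to_pdf(v)}" for k, v in obj.items()) + " >>"
    raise TypeError(f"unsupported type: {type(obj).__name__}")


print(to_pdf({"Type": Name("Annot"), "Subtype": Name("Text"),
              "Rect": [0, 0, 100.5, 50], "Open": False, "Contents": "Note (draft)"}))
```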

Object reference

An object value that allows one object to refer to an indirect object. It has the form “<n> <m> R”, where <n> is the object number of the indirect object, <m> is its generation number and R is the uppercase letter R.
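References can be spotted in raw PDF data with a regular expression, as in the Python sketch below. This is a simplification with a hypothetical function name: a real parser tokenizes the file, so it is not fooled by text such as 1 0 R inside a string.

```python
import re

# "<n> <m> R": object number, generation number, then the keyword R as a whole word.
REFERENCE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def find_references(data: bytes) -> list[tuple[int, int]]:
    """Return (object number, generation number) for each reference in `data`."""
    return [(int(n), int(m)) for n, m in REFERENCE.findall(data)]


print(find_references(b"<< /Type /Catalog /Pages 2 0 R /Outlines 7 0 R >>"))
# [(2, 0), (7, 0)]
```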

Object stream

A stream containing a sequence of PDF objects (available since PDF 1.5).

OCR

Optical character recognition (optical character reader, OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document or a photo of a document.

OCSP - Online Certificate Status Protocol

Protocol for the online query of the validity status of a specific certificate based on the ASN.1 syntax.

PAdES - PDF Advanced Electronic Signature Profiles

An ETSI Standard for the structure of CMS signatures and their embedding in PDF documents.

PDF

A file format standardized by ISO (ISO 32000) for document exchange. For common use cases, there are special sub-standards such as PDF/A (ISO 19005) for archiving digital documents.

PDF/A

Portable Document Format for long-term archiving, defined in [ISO 19005]. Part 1 of the standard describes the requirements that PDF documents must fulfill to comply with the conformance levels PDF/A-1a and PDF/A-1b. The basic requirements of PDF/A-1b are:

  • Conformity with PDF Version 1.4

  • Embedding of all fonts used for visible text

  • Embedding of color profiles if specified by the color space used

  • No encryption

  • No transparency

The following applies additionally to PDF/A-1a:

  • Unicode mapping of all text

  • Structural information must exist (tagging)

PIN - Personal Identification Number

Secret code needed for the access to a token.

PKCS - Public Key Cryptography Standards

A series of standards published by RSA Security Inc. The most common are: RSA encryption and signatures (PKCS#1), message format for signatures (PKCS#7), interface to tokens (PKCS#11), and file format for keys and certificates (PKCS#12).

PKI - Public Key Infrastructure

System of roles, policies and procedures for creating, distributing, managing and revoking digital certificates, which bind public keys to the identities of their owners.

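QES - Qualified Electronic Signature

An advanced electronic signature that is created by a qualified electronic signature creation device and based on a qualified certificate; under eIDAS, it has the same legal effect as a handwritten signature.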

Real object

Approximates mathematical real numbers, but with limited range and precision; written as one or more digits with an optional sign and an optional decimal point.
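The number syntax can be expressed as two regular expressions. The Python sketch below, with a hypothetical `classify_number` function, distinguishes integer and real tokens; since PDF does not allow exponential notation such as 6.02E23, such tokens are rejected.

```python
import re
from typing import Optional

INTEGER = re.compile(r"[+-]?\d+")
REAL = re.compile(r"[+-]?(?:\d+\.\d*|\.\d+)")  # e.g. 34.5, 4., -.002


def classify_number(token: str) -> Optional[str]:
    """Return "integer", "real" or None for a numeric token in PDF syntax."""
    if INTEGER.fullmatch(token):
        return "integer"
    if REAL.fullmatch(token):
        return "real"
    return None  # not a valid PDF number, e.g. exponential notation


for token in ["34", "-.002", "4.", "6.02E23"]:
    print(token, classify_number(token))
```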

Rectangle

A specific array object that defines the position or bounding box of various objects on a page. It is represented as an array of 4 numbers giving the coordinates of two diagonally opposite corners, usually in the form [lower-left x, lower-left y, upper-right x, upper-right y].
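Although rectangles are conventionally written with the lower-left corner first, any two diagonally opposite corners are permitted, so readers should normalize them. A minimal Python sketch (helper names hypothetical):

```python
def normalize_rect(rect: list[float]) -> list[float]:
    """Return [llx, lly, urx, ury] for a rectangle given by any two opposite corners."""
    x1, y1, x2, y2 = rect
    return [min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)]


def rect_size(rect: list[float]) -> tuple[float, float]:
    """Width and height in user space units (1/72 inch by default)."""
    llx, lly, urx, ury = normalize_rect(rect)
    return urx - llx, ury - lly


# A US Letter page box, and the same box written with its corners swapped:
print(rect_size([0, 0, 612, 792]))       # (612, 792)
print(normalize_rect([612, 0, 0, 792]))  # [0, 0, 612, 792]
```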

Resource dictionary

Dictionary that associates the resource names used by operators in content streams with the resource objects themselves, organized in categories (e.g. font, color space, pattern).

Signature, signing

Data with which the integrity and, optionally, the authenticity of a document can be ensured. With RSA, the signature is essentially created as follows: a hash value is computed from the data to be signed and encrypted with the signer’s private key. The signature is then packed into a CMS message together with certificates and other information.

Space character, white-space character

Text character used to represent an orthographic white space. Includes the following characters:

  • HORIZONTAL TABULATION (U+0009)

  • LINE FEED (U+000A)

  • VERTICAL TABULATION (U+000B)

  • FORM FEED (U+000C)

  • CARRIAGE RETURN (U+000D)

  • SPACE (U+0020)

  • NO-BREAK SPACE (U+00A0)

  • EN SPACE (U+2002)

  • EM SPACE (U+2003)

  • FIGURE SPACE (U+2007)

  • PUNCTUATION SPACE (U+2008)

  • THIN SPACE (U+2009)

  • HAIR SPACE (U+200A)

  • ZERO WIDTH SPACE (U+200B)

  • IDEOGRAPHIC SPACE (U+3000)
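PDF syntax, as opposed to text content, treats only six byte values as white space: NUL (0x00), HORIZONTAL TABULATION, LINE FEED, FORM FEED, CARRIAGE RETURN and SPACE. The Python sketch below splits raw bytes on exactly these; the function name is hypothetical.

```python
import re

# White space in PDF syntax (ISO 32000-1): NUL, HT, LF, FF, CR, SPACE.
PDF_WHITESPACE = re.compile(rb"[\x00\t\n\x0c\r ]+")


def split_on_whitespace(data: bytes) -> list[bytes]:
    """Split raw PDF bytes at runs of white space, dropping empty pieces."""
    return [piece for piece in PDF_WHITESPACE.split(data) if piece]


print(split_on_whitespace(b"1 0\tobj\r\n<<\x00>>"))
# [b'1', b'0', b'obj', b'<<', b'>>']
```

A byte such as 0xA0 (NO-BREAK SPACE in Latin-1) is not white space at this level and stays inside its token.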

Stream object

Consists of a dictionary followed by zero or more bytes enclosed between the keywords stream and endstream.
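The /Length entry in the stream dictionary lets a reader find the end of the data even if the data happens to contain the bytes endstream. The Python sketch below writes and reads a stream under simplifying assumptions: /Length is a direct integer, no filters are applied, and the helper names are hypothetical.

```python
import re


def make_stream(data: bytes) -> bytes:
    """Write a stream object: dictionary, the keyword stream, data, endstream."""
    # /Length counts only the data bytes, not the EOL markers around them.
    return b"<< /Length %d >>\nstream\n" % len(data) + data + b"\nendstream"


def read_stream(obj: bytes) -> bytes:
    """Extract the data using /Length instead of searching for 'endstream'."""
    length = int(re.search(rb"/Length (\d+)", obj).group(1))
    start = obj.index(b"stream") + len(b"stream")
    # The keyword stream is followed by CR LF or LF (never a lone CR).
    start += 2 if obj[start:start + 2] == b"\r\n" else 1
    return obj[start:start + length]


data = b"BT (contains endstream) Tj ET"
print(read_stream(make_stream(data)) == data)  # True
```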

String object

Consists of a series of bytes (unsigned integer values ranging from 0 to 255). The bytes are not integer objects but are stored in a more compact form.
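A string object can be written either as a literal string in parentheses or as a hexadecimal string in angle brackets. The Python sketch below handles the hexadecimal form; when reading, white space is ignored and a missing final digit is taken as 0. The function names are hypothetical.

```python
import re


def encode_hex_string(data: bytes) -> str:
    """Write bytes as a PDF hexadecimal string, e.g. <48656C6C6F>."""
    return "<" + data.hex().upper() + ">"


def decode_hex_string(token: str) -> bytes:
    """Read a hexadecimal string; white space is skipped, an odd digit count gets a trailing 0."""
    digits = re.sub(r"\s", "", token.strip()[1:-1])
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)


print(encode_hex_string(b"Hello"))   # <48656C6C6F>
print(decode_hex_string("<901FA>"))  # b'\x90\x1f\xa0'
```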

TLS - Transport Layer Security

Successor to the Secure Sockets Layer (SSL) protocol: a hybrid encryption protocol for secure data transmission over the internet.

Token

A “container” (part of the HSM, USB stick, smartcard, etc.) that contains private keys and protects against unauthorized access. For practical reasons, the token often also contains corresponding certificates and public keys, which do not need to be protected.

Transparency

In a PDF, graphic objects are applied to a page in sequence, and each object is composited with the background already present. Initially, this background is just the empty page; in later steps, it consists of all the objects composited so far. In addition to the objects themselves, a page defines a compositing mode for each object. Depending on this mode, the new object either blends transparently with the underlying background or covers it opaquely. In general, whether a PDF page contains transparency cannot easily be detected by eye. PDF/A-1 does not allow transparency, so converting a PDF with transparency to PDF/A-1 can cause visual differences. The PDF/A-2, PDF/A-3 and PDF/A-4 standards, on the other hand, allow transparency.

TSA - Time Stamp Authority

Accredited provider of time stamp services.

TSP - Time Stamp Protocol

Protocol for the online retrieval of cryptographic time stamps based on the ASN.1 syntax.

Unicode

International standard that assigns a unique number (code point) to every meaningful character or text element. The Universal Character Set [ISO 10646] is equivalent to it for all intents and purposes.
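The difference between a code point and its byte encoding can be seen in Python. Among other options, PDF text strings may be encoded as UTF-16BE preceded by the byte order mark FE FF; the helper `pdf_text_string` is a hypothetical illustration of that form.

```python
import codecs

text = "€"  # EURO SIGN
print(hex(ord(text)))        # 0x20ac: the Unicode code point
print(text.encode("utf-8"))  # b'\xe2\x82\xac': three bytes in UTF-8


def pdf_text_string(s: str) -> bytes:
    """Encode text as UTF-16BE with a leading byte order mark (FE FF)."""
    return codecs.BOM_UTF16_BE + s.encode("utf-16-be")


print(pdf_text_string("€").hex())  # feff20ac
```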

Verification, verifying

Validity check of a digital signature. With RSA, a signature is verified as follows: the signature is decrypted with the signer’s public key, and the hash value recovered from it is compared with the hash value calculated from the signed data. If the two hashes match, the signature is valid.
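The sign-then-verify round trip can be sketched with textbook RSA arithmetic in Python. The key values are tiny demo numbers, and reducing the hash modulo a 12-bit modulus is a toy shortcut that makes hash collisions easy to find; real signatures use large keys, a padding scheme and a CMS container. The function names are hypothetical.

```python
import hashlib

# Textbook RSA demo key (insecure): modulus N, public exponent E, private exponent D.
N, E, D = 3233, 17, 2753


def digest_as_int(data: bytes) -> int:
    # Toy shortcut: squeeze the SHA-256 hash into the tiny modulus.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    # Hash the data, then apply the private key to the hash value.
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    # Apply the public key to the signature and compare with a freshly computed hash.
    return pow(signature, E, N) == digest_as_int(data)


document = b"example document content"
signature = sign(document)
print(verify(document, signature))            # True
print(verify(document, (signature + 1) % N))  # False: the signature was altered
```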

Version

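Designates the version of the PDF specification used to generate the document. The processing PDF software must support this version to guarantee correct processing. Adobe’s PDF versions range from 1.0 to 1.7; PDF 1.7 was also published as ISO 32000-1 (2008), and ISO 32000-2 (2017) defines PDF 2.0. PDF 1.4 corresponds to Acrobat 5 and PDF 1.7 to Acrobat 8; Acrobat 9 uses PDF 1.7 with Adobe extensions (sometimes loosely called “PDF 1.8”).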
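The version in the file header can be read with a short Python sketch. Searching only the first 1024 bytes mimics the tolerance of some readers and is an assumption here; the sketch also ignores the catalog’s /Version entry, which (from PDF 1.4 on) can declare a later version than the header. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple

HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Read (major, minor) from a %PDF-x.y header near the start of the file."""
    match = HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


version = header_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n")
print(version)            # (1, 7)
print(version >= (1, 4))  # True: tuples compare the major number first, then the minor
```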

Web capture

Designates the process of generating PDF content by importing and possibly converting files from the Internet or local files. The files can be imported in any format such as HTML, GIF, JPEG, text, and PDF.

WebAssembly

WebAssembly (often abbreviated to "Wasm") is a portable binary instruction format for code that can be executed in a suitable runtime environment, for example in a web browser. Unlike JavaScript source code, it is delivered in a compact, low-level binary form that runtimes can compile to efficient machine code, which can provide a significant performance advantage. WebAssembly first shipped in major browsers in 2017, became a W3C (World Wide Web Consortium) Recommendation in 2019, and succeeds its predecessor technology, asm.js. Since WebAssembly is a compilation target, code written in many programming languages (for example C, C++ or Rust) can be compiled to it.

X.509

ITU-T standard for public key infrastructures that defines, among other things, the format of digital certificates, based on the ASN.1 syntax.

X.690

ITU-T Standard for encoding digital messages based on the ASN.1 syntax: Basic Encoding Rules (BER), Canonical Encoding Rules (CER), and Distinguished Encoding Rules (DER).

XAdES - XML Advanced Electronic Signatures

An ETSI Standard for the creation of signatures and their embedding in XML data.

XML - Extensible Markup Language

Format for the exchange of hierarchically structured data in text form between machines.

XMP packet

Structured wrapper for serialized XML metadata that can be embedded in various file formats.