Version: 4.4

About the PDF Toolbox SDK

The PDF Toolbox SDK is a native library with interfaces for .NET, Java, and C for creating, extracting, assembling, and modifying PDF documents.


Key features

Document generation

The PDF Toolbox SDK lets you create PDF files, defining structure at document level and content at page level.

Create PDF document definition

You can create a new document from scratch, determining the pages in the document tree, adding form fields, and adding outline items.

  • Create pages
  • Create form fields:
    • General text fields and comb text fields
    • Check boxes
    • Radio button groups
    • List boxes
    • Combo boxes
  • Create new outline items and insert them at any position in the tree
  • Destinations: Named and direct destinations in the same document
  • Configure viewer settings

Create content at page level

  • Create new PDF content from scratch
  • Apply content to existing pages

Determine colors

  • Device colors: RGB, CMYK, and grayscale
  • ICC color profiles
  • Transparency: alpha and blend mode

Add trace paths

  • Single and multi-segment lines
  • Rectangle, circle, Bezier curves, ellipse, arc, pie
  • Filling, stroking, clipping, and combinations thereof
  • Line width, cap, join, dash array, dash phase, and miter limit
  • Inside rule: nonzero winding rule, even/odd rule
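The two inside rules give different results only for paths that overlap themselves. The following self-contained Java sketch illustrates this without using the SDK. It tests a point against a closed polygon under both rules, and its class and method names are hypothetical. A five-pointed star drawn as a single self-intersecting path has a winding number of 2 at its center. The center is therefore inside under the nonzero winding rule but outside under the even/odd rule.

```java
// Illustration only, not SDK API: point-in-polygon tests for both inside rules.
final class FillRules {
    // Sign tells on which side of edge a->b the point lies (> 0 means left).
    static double cross(double[] a, double[] b, double px, double py) {
        return (b[0] - a[0]) * (py - a[1]) - (px - a[0]) * (b[1] - a[1]);
    }

    // Winding number: signed count of edges crossing a horizontal ray to the right of the point.
    static int windingNumber(double[][] poly, double px, double py) {
        int winding = 0;
        for (int i = 0; i < poly.length; i++) {
            double[] a = poly[i];
            double[] b = poly[(i + 1) % poly.length];
            if (a[1] <= py) {
                if (b[1] > py && cross(a, b, px, py) > 0) winding++;   // upward edge, point on its left
            } else {
                if (b[1] <= py && cross(a, b, px, py) < 0) winding--;  // downward edge, point on its right
            }
        }
        return winding;
    }

    // Nonzero winding rule: inside if the winding number is not zero.
    static boolean insideNonzero(double[][] poly, double px, double py) {
        return windingNumber(poly, px, py) != 0;
    }

    // Even/odd rule: inside if the ray crosses the boundary an odd number of times.
    static boolean insideEvenOdd(double[][] poly, double px, double py) {
        boolean inside = false;
        for (int i = 0, j = poly.length - 1; i < poly.length; j = i++) {
            double[] a = poly[i], b = poly[j];
            if ((a[1] > py) != (b[1] > py)
                    && px < (b[0] - a[0]) * (py - a[1]) / (b[1] - a[1]) + a[0]) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        // Five-pointed star as one path: each vertex connects to the one two points further on.
        double[][] star = new double[5][];
        for (int k = 0; k < 5; k++) {
            double angle = Math.PI / 2 + k * 4 * Math.PI / 5;
            star[k] = new double[] {100 + 50 * Math.cos(angle), 100 + 50 * Math.sin(angle)};
        }
        System.out.println("nonzero:  " + insideNonzero(star, 100, 100)); // nonzero:  true
        System.out.println("even-odd: " + insideEvenOdd(star, 100, 100)); // even-odd: false
    }
}
```

A simple, non-self-intersecting outline such as a square gives the same result under both rules.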

Add text

  • Font size, character spacing, word spacing
  • Horizontal scaling, leading, rise
  • Simple text layout
  • Standard PDF fonts, installed fonts
  • Font metrics: italic angle, ascent, descent, cap height, character width
  • Unicode characters
  • Text stroke: line width, line join, and dashes
  • Fill and stroke text, invisible text
  • Use text as clipping path

Add images

  • Bi-level: CCITT G3, G3 2D and G4, Flate, LZW, Packbits, uncompressed
  • 4-bit and 8-bit grayscale: Flate, LZW, Packbits, JPEG and JPEG-6 (8-bit only), uncompressed
  • RGB: Flate, JPEG and JPEG-6, LZW, Packbits, uncompressed

Add transformations

  • Translation
  • Scaling
  • Skewing (horizontal, vertical)
  • Rotation
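For background, the four transformations listed above are special cases of the six-number affine matrix [a b c d e f] defined in the PDF specification (ISO 32000). The block below shows that standard notation. It does not describe the SDK's own method signatures, which may expose these operations differently.

```latex
\begin{aligned}
x' &= a\,x + c\,y + e, \qquad y' = b\,x + d\,y + f \\[6pt]
\text{Translation by } (t_x, t_y) &: \quad [\,1 \;\; 0 \;\; 0 \;\; 1 \;\; t_x \;\; t_y\,] \\
\text{Scaling by } (s_x, s_y) &: \quad [\,s_x \;\; 0 \;\; 0 \;\; s_y \;\; 0 \;\; 0\,] \\
\text{Rotation by } \theta &: \quad [\,\cos\theta \;\; \sin\theta \;\; {-\sin\theta} \;\; \cos\theta \;\; 0 \;\; 0\,] \\
\text{Skew by } \alpha, \beta &: \quad [\,1 \;\; \tan\alpha \;\; \tan\beta \;\; 1 \;\; 0 \;\; 0\,]
\end{aligned}
```

In this notation, rotation is counterclockwise by θ, and the skew matrix skews the x axis by α and the y axis by β.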

Add annotations and links

You can add various annotation types, such as text, stamps, drawings, and text revision marks, to PDF pages. You can also add internal links, such as section references, and external links to web pages or embedded files.

  • Document-internal links
  • Web links
  • Links to embedded PDFs
  • File attachment annotations
  • Free text annotation
  • Sticky note annotation
  • Text stamp annotation
  • Custom stamp annotation
  • Circle annotation
  • Square annotation
  • Line annotation
  • Polyline annotation
  • Polygon annotation
  • Ink annotation (pen drawing)
  • Highlight annotation
  • Underline and squiggly underline annotation
  • Strike-through annotation

Document modification

The PDF Toolbox SDK lets you edit PDF files: delete content objects, add markup annotations, and delete form fields or change their values.

Edit page content

  • Selectively delete content elements (without tagging and layers)
  • Transform content elements geometrically (without tagging and layers)

Edit annotations

  • Web link annotation target URIs
  • Markup annotation location, creation and modification date, subject, author, content

Edit form fields

  • Delete fields and modify field values for:
    • General text fields and comb text fields
    • Check boxes
    • Radio button groups
    • List boxes
    • Combo boxes

Document extraction

The PDF Toolbox SDK lets you extract data and text from PDF files.

Extract data from document and page level

You can extract specific data from PDF documents, either at document or page level. Information you can extract includes metadata, page content, encryption settings, and information about embedded files.

  • Document information entries: title, author, subject, keywords, creator, producer, creation date, modification date
  • Document XMP metadata
  • Document encryption settings
  • Embedded files
  • Page bounding boxes: media box, crop box, bleed box, trim box, art box
  • Page XMP metadata
  • Outline item tree: Tree structure, item title, expanded/collapsed
  • Destinations: Named and direct destinations in the same document
  • Viewer settings

Extract content

  • Page and group content elements, including:
    • Bounding box
    • Affine transformation
  • Each content element is one of the following:
    • Group element
    • Image element
      • Width and height in pixels
      • Bits per component
      • Color space
    • Image mask element
      • Width and height in pixels
      • Paint for filling the mask
    • Path element
      • Alignment box
      • Subpaths and subpath segments
      • Fill parameters, including paint and fill rule
      • Stroke parameters, including line paint and line style
    • Shading element
    • Text element
      • Text fragments
        • Bounding box
        • Affine transformation
        • Unicode string
        • Fill parameters, including paint and fill rule
        • Stroke parameters, including line paint and line style
        • Font size, character spacing, word spacing, horizontal scaling, and text rise

Extract annotations

  • Annotations: location
  • Markup annotation: type, location, creation/modification date, subject, author, content
  • Custom stamp annotations: appearance
  • Text markup annotations: markup area
  • Link annotations: location, target destination or URI, active link area
  • Signature fields: name, location, reason, contact info, date, visibility

Extract AcroForm form fields

  • Form field identifiers, export names, and user names, including form field hierarchy
  • Form field export and display content of:
    • Push buttons
    • Check boxes
    • Radio button groups
    • General text fields and comb text fields
    • List boxes
    • Combo boxes

Document assembly

The PDF Toolbox SDK lets you assemble PDF files from existing PDF files.

Assemble PDF files

  • Copy pages from existing PDFs
  • Copy annotations, form fields, links, logical structure, destinations, outlines, and layers
  • Flatten annotations, form fields, and signatures
  • Optimize resources
  • Crop and rotate pages
  • Compose content: overlays, underlays, stamps, transformations
  • Add encryption: user password, owner password, permissions
  • Copy and modify document metadata
  • Copy and modify page metadata
  • Add embedded files and associated files
  • Get and set OpenAction destination
  • Merge a PDF and an FDF
  • Separate markup annotations into an FDF

Document model

The document model of the PDF Toolbox SDK consists of two different types of objects:
  • Structure objects: define the structure of the document. These objects include Document, Page, and Content.
  • Graphics resources: used to draw content with a ContentGenerator. Examples are Image, Font, and ColorSpace.

All objects in the document model are bound to a specific document. They can only be used in the context of the document for which they were created. The objects of the document model are all stateless. Where a stateful interface is required, it is provided by an external generator or extractor, which is not considered part of the document model.

The PDF Toolbox SDK does not allow in-place modification of documents. Instead, the content is copied into a new document, and the necessary changes are applied during copying.

To copy objects from a source document into a target document, you call the object’s static Copy method with the target document as the first argument. Because the input document is read only on demand and modifications can be written directly to the output file, you can process very large files without consuming much memory.

To provide a uniform interface, operations are divided into two steps:

  1. Create (or copy) the object
  2. Use the object

This separation allows multiple variants of each step without a “combinatorial explosion” of methods: with n ways to create an object and m ways to use it, the API needs n + m methods instead of n × m.

Create the object

The object is created in the target document or copied from the source document to the target document. After creation, the object is associated with the document but not yet used. Creating or copying an object may therefore already increase the size of the target file, even though, logically, the PDF is still unchanged.

For example, these methods can be used to create an object:

  • Page.Create
  • Font.Create
  • Page.Copy
  • PageList.Copy
  • ColorSpace.Copy
  • Metadata.Copy
  • ContentElement.Copy

Use the object

The associated object can then be used in the target document. This second step is often more lightweight than the first step, since all the necessary copying of objects is already done.

For example, the following methods use an object:

  • ContentGenerator.PaintImage
  • ContentGenerator.PaintGroup
  • ContentGenerator.AppendContentElement
  • PageList.Add
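The following Java sketch is a toy model of the two-step pattern and of the rule that every object is bound to one document. All names in it (ToyDocument, ToyImage, ToyContentGenerator) are hypothetical and only loosely mirror the SDK. An image created in a source document cannot be painted in the target. The static copy method takes the target document as its first argument and stores the object once. Painting it afterwards only records a reference. Closing the generator is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model with hypothetical names; NOT the PDF Toolbox SDK API.
final class ToyDocument {
    final String name;
    final List<String> pageContent = new ArrayList<>();
    int storedObjects = 0; // grows when objects are created or copied into this document
    ToyDocument(String name) { this.name = name; }
}

final class ToyImage {
    final ToyDocument owner; // every object is bound to exactly one document
    final String data;
    private ToyImage(ToyDocument owner, String data) { this.owner = owner; this.data = data; }

    // Step 1a: create the object in the target document.
    static ToyImage create(ToyDocument target, String data) {
        target.storedObjects++;
        return new ToyImage(target, data);
    }

    // Step 1b: copy from another document; static, with the target first.
    static ToyImage copy(ToyDocument target, ToyImage source) {
        return create(target, source.data);
    }
}

final class ToyContentGenerator {
    private final ToyDocument document;
    ToyContentGenerator(ToyDocument document) { this.document = document; }

    // Step 2: use the object. Only a reference is recorded; nothing is copied.
    void paintImage(ToyImage image) {
        if (image.owner != document)
            throw new IllegalArgumentException("image belongs to " + image.owner.name);
        document.pageContent.add("paint " + image.data);
    }
}

class TwoStepDemo {
    public static void main(String[] args) {
        ToyDocument source = new ToyDocument("source");
        ToyDocument target = new ToyDocument("target");
        ToyImage logo = ToyImage.create(source, "logo");
        ToyContentGenerator gen = new ToyContentGenerator(target);
        try {
            gen.paintImage(logo); // wrong document: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // image belongs to source
        }
        ToyImage copied = ToyImage.copy(target, logo); // copy once ...
        gen.paintImage(copied);                        // ... then use it
        gen.paintImage(copied);                        // as often as needed
        System.out.println(target.storedObjects + " " + target.pageContent.size()); // 1 2
    }
}
```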

Generator objects

Some objects in a PDF consist of a list or stream of operations that operate on an internal state:

  • Content streams
  • Text objects
  • Path objects

Since all data objects in the PDF Toolbox SDK are stateless, a (simplified) stateful interface is provided by the generator interfaces:

  • Content objects can be modified with a ContentGenerator.
  • Path objects can be modified with a PathGenerator.
  • Text objects can be modified with a TextGenerator.

Generator objects must always be closed explicitly before the generated object can be used.
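To illustrate why a generator must be closed before its result can be used, here is a minimal Java sketch of a stateful generator that produces an immutable object only in close(). The Path and PathGenerator classes below are hypothetical stand-ins, not the SDK's classes. The operator strings mimic the PDF path operators m, l, and h. The sketch assumes Java 9 or later for try-with-resources on an existing variable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins, NOT the SDK's Path or PathGenerator classes.
// The generated object: immutable once built, so it carries no drawing state.
final class Path {
    private final List<String> operators;
    Path(List<String> operators) { this.operators = Collections.unmodifiableList(operators); }
    List<String> operators() { return operators; }
}

// The generator holds the mutable state (operator list, open subpath).
final class PathGenerator implements AutoCloseable {
    private final List<String> ops = new ArrayList<>();
    private boolean subpathOpen = false;
    private Path result; // null until close() has been called

    void moveTo(double x, double y) {
        ensureNotClosed();
        ops.add(x + " " + y + " m");
        subpathOpen = true;
    }

    void lineTo(double x, double y) {
        ensureNotClosed();
        if (!subpathOpen) throw new IllegalStateException("lineTo without moveTo");
        ops.add(x + " " + y + " l");
    }

    void closeSubpath() {
        ensureNotClosed();
        ops.add("h");
        subpathOpen = false;
    }

    // The result is only available after the generator has been closed.
    Path getPath() {
        if (result == null) throw new IllegalStateException("generator not closed yet");
        return result;
    }

    @Override
    public void close() {
        if (result == null) result = new Path(new ArrayList<>(ops)); // build exactly once
    }

    private void ensureNotClosed() {
        if (result != null) throw new IllegalStateException("generator already closed");
    }
}

class GeneratorDemo {
    public static void main(String[] args) {
        PathGenerator gen = new PathGenerator();
        try (gen) { // Java 9+: try-with-resources on an existing variable
            gen.moveTo(0, 0);
            gen.lineTo(100, 0);
            gen.lineTo(100, 50);
            gen.closeSubpath();
        } // close() runs here, even if an exception is thrown
        System.out.println(gen.getPath().operators()); // [0.0 0.0 m, 100.0 0.0 l, 100.0 50.0 l, h]
    }
}
```

Calling getPath() before close() throws, which mirrors the rule that an unclosed generator leaves its object incomplete.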

Garbage collection and closing objects

Every interface object is considered a resource that must be closed after use. In C# and Java, most objects are closed automatically: at the latest when the owning document is closed, and possibly earlier by the garbage collector.

In addition to Document objects, the generator objects ContentGenerator, PathGenerator, and TextGenerator must be closed. Otherwise, the generated objects are incomplete.

Thread safety

The PDF Toolbox SDK is generally thread-safe. However, a document, including all of its subobjects, may be accessed by only one thread at a time. Almost all objects are directly or indirectly associated with a document.

Note

Methods that copy from a source to a target document have to access both documents.

The thread safety rules apply both to the target document and the source document. This means that copying from the same source document concurrently is not allowed.
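These rules have to be enforced by the application. One possible approach, sketched below in Java, wraps each open document in a hypothetical GuardedDocument holder with its own lock. A copy holds both the source and the target lock. Taking the two locks in a fixed order (here, by a creation counter) prevents deadlocks when two threads copy in opposite directions. Concurrent copies from the same source are serialized, as the rule requires. Nothing here is SDK API; a StringBuilder stands in for a document.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical holder, not SDK API: pairs a document with the lock that guards it
// and all of its subobjects.
final class GuardedDocument<D> {
    private static final AtomicLong NEXT_ID = new AtomicLong();
    final long id = NEXT_ID.getAndIncrement(); // defines a global lock order
    final D doc;
    final ReentrantLock lock = new ReentrantLock();
    GuardedDocument(D doc) { this.doc = doc; }
}

final class DocumentLocks {
    // Runs an action that reads `source` and writes `target` while holding both locks.
    // Locks are always acquired in ascending id order, so A->B and B->A copies cannot deadlock.
    static <S, T> void withBoth(GuardedDocument<S> source, GuardedDocument<T> target, Runnable action) {
        GuardedDocument<?> first = source.id <= target.id ? source : target;
        GuardedDocument<?> second = first == source ? target : source;
        first.lock.lock();
        try {
            second.lock.lock(); // reentrant, so source == target also works
            try {
                action.run();
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}

class LockDemo {
    public static void main(String[] args) throws InterruptedException {
        GuardedDocument<StringBuilder> source = new GuardedDocument<>(new StringBuilder("p"));
        GuardedDocument<StringBuilder> t1 = new GuardedDocument<>(new StringBuilder());
        GuardedDocument<StringBuilder> t2 = new GuardedDocument<>(new StringBuilder());
        // Two threads copy from the same source; the source lock serializes them.
        Thread a = new Thread(() -> {
            for (int i = 0; i < 1000; i++)
                DocumentLocks.withBoth(source, t1, () -> t1.doc.append(source.doc.charAt(0)));
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 1000; i++)
                DocumentLocks.withBoth(source, t2, () -> t2.doc.append(source.doc.charAt(0)));
        });
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(t1.doc.length() + " " + t2.doc.length()); // 1000 1000
    }
}
```

A single lock per document is coarse, but it matches the rule that all subobjects of a document share the one-thread-at-a-time restriction.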

Garbage collection and finalizer

Object finalization is thread-safe.

Caution

The finalizer of the Document is not thread-safe regarding access to its subobjects.

Subobjects do not keep their associated document alive. If all references to an open document go out of scope, the document finalizer eventually runs and closes the document, even if its subobjects are still in use.

Danger

Explicitly accessing (or even closing) any subobject while the document finalizer is running is not safe!

PDF graphics model

Coordinate system

PDF coordinates are measured from bottom to top: the y axis points upward, unlike many other coordinate systems used in computing, such as screen and image coordinates, where y grows downward.

For simplicity, all coordinates used in the PDF Toolbox SDK are normalized so that the point (0,0) denotes the lower-left corner of the visible page (crop box). The internal Rotate attribute of a PDF page is not exposed by the API. Instead, all coordinates refer to the already rotated page.
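Coordinates from systems with a top-left origin, such as screen or image layouts, must be flipped vertically before use. The following Java sketch shows the conversion. The helper names are hypothetical, and pageHeight is assumed to be the height of the crop box of the already rotated page.

```java
// Hypothetical helpers, not part of the SDK API.
// Top-left systems: origin at the top-left corner, y grows downward.
// SDK coordinates: origin at the lower-left corner of the crop box, y grows upward.
final class Coordinates {
    // Converts a single point; only the y coordinate changes.
    static double[] topLeftToPdf(double x, double yFromTop, double pageHeight) {
        return new double[] {x, pageHeight - yFromTop};
    }

    // Converts a rectangle given by its top-left corner and size into one
    // given by its lower-left corner and size (width and height are unchanged).
    static double[] rectTopLeftToPdf(double left, double top, double width, double height,
                                     double pageHeight) {
        return new double[] {left, pageHeight - top - height, width, height};
    }

    public static void main(String[] args) {
        double pageHeight = 842; // approximate height of an A4 portrait page in points
        double[] p = topLeftToPdf(72, 72, pageHeight); // one inch from the top-left corner
        System.out.println(p[0] + ", " + p[1]); // 72.0, 770.0
        double[] r = rectTopLeftToPdf(72, 100, 200, 50, pageHeight);
        System.out.println(r[0] + ", " + r[1]); // 72.0, 692.0
    }
}
```

For a rectangle, the reference corner moves from the top edge to the bottom edge, which is why the height is subtracted as well.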

Affine transformations

Affine transformations can be used to rotate, move (translate), scale, or skew any page content.

Transformations always affect the coordinate system as a whole. All following graphics operations are executed in the transformed coordinate system, including additional transformations.

This means that the order in which transformations are applied matters.
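The following Java sketch demonstrates the effect with a minimal affine matrix class in the PDF order [a b c d e f]. The Affine class is hypothetical, not part of the SDK. Each new transformation acts on the already transformed coordinate system, so the transformation added last is applied to a point first. Translating by 100 points and then rotating by 90 degrees places the point (10, 0) at (100, 10). The reverse order places it at (0, 110).

```java
import java.util.Locale;

// Hypothetical helper, not SDK API. Matrix in PDF order [a b c d e f]:
//   x' = a*x + c*y + e
//   y' = b*x + d*y + f
final class Affine {
    final double a, b, c, d, e, f;

    Affine(double a, double b, double c, double d, double e, double f) {
        this.a = a; this.b = b; this.c = c; this.d = d; this.e = e; this.f = f;
    }

    static Affine identity() { return new Affine(1, 0, 0, 1, 0, 0); }
    static Affine translate(double tx, double ty) { return new Affine(1, 0, 0, 1, tx, ty); }

    static Affine rotateDegrees(double deg) {
        double r = Math.toRadians(deg);
        return new Affine(Math.cos(r), Math.sin(r), -Math.sin(r), Math.cos(r), 0, 0);
    }

    // Returns the matrix that applies `inner` first and then `this`.
    // Adding a transformation to an already transformed coordinate system
    // corresponds to current.concat(added).
    Affine concat(Affine inner) {
        return new Affine(
            a * inner.a + c * inner.b,
            b * inner.a + d * inner.b,
            a * inner.c + c * inner.d,
            b * inner.c + d * inner.d,
            a * inner.e + c * inner.f + e,
            b * inner.e + d * inner.f + f);
    }

    double[] apply(double x, double y) {
        return new double[] {a * x + c * y + e, b * x + d * y + f};
    }
}

class OrderDemo {
    static String fmt(double[] p) {
        return String.format(Locale.ROOT, "%.1f, %.1f", p[0], p[1]);
    }

    public static void main(String[] args) {
        Affine translateThenRotate = Affine.identity()
            .concat(Affine.translate(100, 0))
            .concat(Affine.rotateDegrees(90));
        Affine rotateThenTranslate = Affine.identity()
            .concat(Affine.rotateDegrees(90))
            .concat(Affine.translate(100, 0));
        System.out.println(fmt(translateThenRotate.apply(10, 0))); // 100.0, 10.0
        System.out.println(fmt(rotateThenTranslate.apply(10, 0))); // 0.0, 110.0
    }
}
```

In the second case, the translation is performed along the already rotated x axis, which now points upward on the page.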