About the PDF Toolbox SDK
The PDF Toolbox SDK is a native library with interfaces for .NET, Java, and C for creating, extracting, assembling, and modifying PDF documents.
Key features
Document generation
The PDF Toolbox SDK lets you create PDF files, defining structure at document level and content at page level.
Create PDF document definition
You can create a new document from scratch, determining the pages in the document tree, adding form fields, and adding outline items.
- Create pages
- Create form fields:
- General text fields and comb text fields
- Check boxes
- Radio button groups
- List boxes
- Combo boxes
- Create new outline items and insert them at any position in the tree
- Destinations: Named and direct destinations in the same document
- Configure viewer settings
Create content at page level
- Create new PDF content from scratch
- Apply content to existing pages
Determine colors
- Device colors: RGB, CMYK, and grayscale
- ICC color profiles
- Transparency: alpha and blend mode
Add trace paths
- Single and multi-segment lines
- Rectangle, circle, Bezier curves, ellipse, arc, pie
- Filling, stroking, clipping, and combinations thereof
- Line width, cap, join, dash array, dash phase, and miter limit
- Inside rule: nonzero winding rule, even/odd rule
Add text
- Font size, character spacing, word spacing
- Horizontal scaling, leading, rise
- Simple text layout
- Standard PDF fonts, installed fonts
- Font metrics: italic angle, ascent, descent, cap height, character width
- Unicode characters
- Text stroke: line width, line join, and dashes
- Fill and stroke text, invisible text
- Use text as clipping path
Add images
- Bi-level: CCITT G3, G3 2D and G4, Flate, LZW, Packbits, uncompressed
- 4 bit and 8 bit grayscale: Flate, LZW, Packbits, JPEG and JPEG-6 (8 bit only), uncompressed
- RGB: Flate, JPEG and JPEG-6, LZW, Packbits, uncompressed
Add transformations
- Translation
- Scaling
- Skewing (horizontal, vertical)
- Rotation
Create annotations and links
You can add multiple annotation types such as text, stamps, drawings, and text revision marks to the PDF pages. You can add internal links such as section references and external links to specific web pages or embedded files.
- Document-internal links
- Web links
- Links to embedded PDFs
- File attachment annotations
- Free text annotation
- Sticky note annotation
- Text stamp annotation
- Custom stamp annotation
- Circle annotation
- Square annotation
- Line annotation
- Poly line annotation
- Polygon annotation
- Ink annotation (pen drawing)
- Highlight annotation
- Underline and squiggly underline annotation
- Strike-through annotation
Document modification
The PDF Toolbox SDK lets you edit PDF files by deleting objects, adding markup annotations, and deleting or changing field values.
Edit page content
- Selectively delete content elements (without tagging and layers)
- Transform content elements geometrically (without tagging and layers)
Edit annotations
- Web link annotation target URIs
- Markup annotation location, creation and modification date, subject, author, content
Edit form fields
- Delete fields and modify field values for:
- General text fields and comb text fields
- Check boxes
- Radio button groups
- List boxes
- Combo boxes
Document extraction
The PDF Toolbox SDK lets you extract data and text from PDF files.
Extract data from document and page level
You can extract specific data from PDF documents, either at document or page level. Information you can extract includes metadata, page content, encryption settings, and information about embedded files.
- Document information entries: title, author, subject, keywords, creator, producer, creation date, modification date
- Document XMP metadata
- Document encryption settings
- Embedded files
- Page bounding boxes: media box, crop box, bleed box, trim box, art box
- Page XMP metadata
- Outline item tree: Tree structure, item title, expanded/collapsed
- Destinations: Named and direct destinations in the same document
- Viewer settings
Extract content
- Page and group content elements, including:
- Bounding box
- Affine transformation
As one of the following:
- Group element
- Image element
- Width and height in pixel
- Bits per component
- Color space
- Image mask element
- Width and height in pixel
- Paint for filling the mask
- Path element
- Alignment box
- Subpaths and subpath segments
- Fill parameters including paint and fill rule
- Stroke parameters including line paint and line style
- Shading element
- Text element
- Text fragments
- Bounding box
- Affine transformation
- Unicode string
- Fill parameters, including paint and fill rule
- Stroke parameters, including line paint and line style
- Font size, character spacing, word spacing, horizontal scaling, and text rise
- Text fragments
Extract annotations
- Annotations: location
- Markup annotation: type, location, creation/modification date, subject, author, content
- Custom stamp annotations: appearance
- Text markup annotations: markup area
- Link annotations: location, target destination or URI, active link area
- Signature fields: name, location, reason, contact info, date, visibility
Extract AcroForm form fields
- Form field identifiers, export names, and user names, including form field hierarchy
- Form field export and display content of:
- Push buttons
- Check boxes
- Radio button groups
- General text fields and comb text fields
- List boxes
- Combo boxes
Document assembly
The PDF Toolbox SDK lets you assemble PDF files from existing PDF files.
Assemble PDF files
- Copy pages from existing PDFs
- Copy annotations, form fields, links, logical structure, destinations, outlines, and layers
- Flatten annotations, form fields, and signatures
- Optimize resources
- Crop and rotate pages
- Compose content: overlays, underlays, stamps, transformations
- Add encryption: user password, owner password, permissions
- Copy and modify document metadata
- Copy and modify page metadata
- Add embedded files and associated files
- Get and set OpenAction destination
- Merge a PDF and an FDF
- Separate markup annotations into an FDF
Document model
The document model of the PDF Toolbox SDK consists of two different types of objects:
- Structure objects: define the structure of the document. These objects include Document, Page, and Content.
- Graphics resources: used to draw content with a ContentGenerator. Examples are Image, Font, and ColorSpace.
All objects in the document model are bound to a specific document. They can only be used in the context of the document for which they were created. The objects of the document model are all stateless. Where a stateful interface is required, it is provided by an external generator or extractor, which is not considered part of the document model.
The PDF Toolbox SDK does not allow in-place modification of documents. Instead, the content is copied into a new document, while performing the necessary changes.
To copy objects from a source document into a target document, you call the object’s static Copy method with the target document as first argument. This means you can process very large files without consuming too much memory. The content of the input document is only read on demand and any modifications can be directly stored in the output file.
To provide a uniform interface, operations are divided into two steps:
- Create the object
- Use the object
This separation means there are multiple variants for both steps, without having a “combinatorial explosion” of methods.
Create the object
The object is created in the target document or copied from the source document to the target document. After creation, the object is associated with the document, but not yet used. This means that copying or creating an object may change the size of the target file. However, logically, the PDF is still unchanged.
For example, these methods can be used to create an object:
Page.Create
Font.Create
Page.Copy
PageList.Copy
ColorSpace.Copy
Metadata.Copy
ContentElement.Copy
Use the object
The associated object can then be used in the target document. This second step is often more lightweight than the first step, since all the necessary copying of objects is already done.
For example, these methods of the ContentGenerator object are used:
PaintImage
PaintGroup
AppendContentElement
Another example is the PageList.Add method.
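The two steps can be illustrated with a minimal, language-neutral sketch. It does not use the SDK: DocumentSketch, PageSketch, and add_page are hypothetical stand-ins that only mimic the split between associating an object with a target document (step 1) and actually using it (step 2).

```python
class DocumentSketch:
    """Hypothetical stand-in for a document; not the SDK's Document class."""

    def __init__(self):
        self.associated = []  # objects created in or copied into this document
        self.pages = []       # objects that are actually used in the page list


class PageSketch:
    """Hypothetical page that is bound to exactly one document."""

    def __init__(self, owner, label):
        self.owner = owner
        self.label = label

    @staticmethod
    def create(target, label):
        # Step 1, variant "create": a new object bound to the target document.
        page = PageSketch(target, label)
        target.associated.append(page)
        return page

    @staticmethod
    def copy(target, source_page):
        # Step 1, variant "copy": target document first, then the source object.
        page = PageSketch(target, source_page.label)
        target.associated.append(page)
        return page


def add_page(document, page):
    # Step 2: use the object. Objects bound to another document are rejected.
    if page.owner is not document:
        raise ValueError("page is bound to a different document")
    document.pages.append(page)


source = DocumentSketch()
target = DocumentSketch()
original = PageSketch.create(source, "cover")

copied = PageSketch.copy(target, original)  # step 1: associated, not yet used
pages_after_step_one = list(target.pages)   # still empty at this point
add_page(target, copied)                    # step 2: now part of the page list
```

Because step 2 only accepts objects that are already bound to the target document, any number of "create" or "copy" variants can be combined with any number of "use" variants without a dedicated method for each pairing.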
Generator objects
Some objects in a PDF consist of a list or stream of operations that operate on an internal state:
- Content streams
- Text objects
- Path objects
Since all data objects in the PDF Toolbox SDK are stateless, a (simplified) stateful interface is provided by the generator interfaces:
- Content objects can be modified with a ContentGenerator.
- Path objects can be modified with a PathGenerator.
- Text objects can be modified with a TextGenerator.
Generator objects must always be closed explicitly before the generated object can be used.
Garbage collection and closing objects
Every interface object is considered to be a resource that needs to be closed after use. Most objects are closed automatically, at the latest when the owning document is closed (in C# and Java), possibly earlier by the garbage collector.
In addition to Document objects, the generator objects ContentGenerator, PathGenerator, and TextGenerator must be closed. Otherwise, the generated objects are incomplete.
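Why an unclosed generator leaves its object incomplete can be shown with a small sketch. The classes below are hypothetical and are not the SDK's generators. The sketch assumes a generator that buffers operations and only writes them to the stateless content object when it is closed, and it uses a scoped block to guarantee the close.

```python
class ContentSketch:
    """Hypothetical stateless content object."""

    def __init__(self):
        self.operators = []
        self.complete = False


class ContentGeneratorSketch:
    """Hypothetical stateful generator that writes to the content on close."""

    def __init__(self, content):
        self._content = content
        self._pending = []

    def paint(self, operator):
        # Operations are buffered; the content object is not touched yet.
        self._pending.append(operator)

    def close(self):
        # Closing flushes the buffered operations and marks the content complete.
        self._content.operators.extend(self._pending)
        self._pending.clear()
        self._content.complete = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions


content = ContentSketch()
with ContentGeneratorSketch(content) as generator:
    generator.paint("image")
    generator.paint("text")
    complete_while_open = content.complete  # False: generator not closed yet
```

In C# and Java, the equivalent scoped constructs are using and try-with-resources, provided the object in question supports them.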
Thread safety
The PDF Toolbox SDK is generally thread-safe. However, a document, including all of its subobjects, may only be accessed by one thread at a time. Almost all objects are directly or indirectly associated with a document.
Methods that copy from a source to a target document have to access both documents.
The thread safety rules apply both to the target document and the source document. This means that copying from the same source document concurrently is not allowed.
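One way to respect these rules in application code is to guard each document with its own lock and to hold the locks of both documents while copying. The sketch below is an assumption about how a caller might organize this. GuardedDocument and copy_pages are hypothetical names, not SDK API.

```python
import threading


class GuardedDocument:
    """Hypothetical wrapper: one lock guards a document and all its subobjects."""

    def __init__(self, name, pages=()):
        self.name = name
        self.pages = list(pages)
        self.lock = threading.Lock()


def copy_pages(source, target):
    if source is target:
        raise ValueError("source and target must be different documents")
    # Copying accesses both documents, so both locks are held.
    # The locks are taken in a fixed order (by object id) so that two
    # copies running in opposite directions cannot deadlock.
    first, second = sorted((source, target), key=id)
    with first.lock, second.lock:
        target.pages.extend(source.pages)


source = GuardedDocument("input", ["page 1", "page 2"])
targets = [GuardedDocument(f"output {i}") for i in range(4)]

# All workers read from the same source; the source lock serializes them.
workers = [
    threading.Thread(target=copy_pages, args=(source, target))
    for target in targets
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

The workers still run one after another whenever they share the source document. To copy from the same input in parallel, each thread would need its own opened instance of that input.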
Garbage collection and finalizer
Object finalization is thread-safe.
The finalizer of the Document is not thread-safe regarding access to its subobjects.
Subobjects do not retain their associated document object. If all references to an open document go out of scope, the document finalizer is eventually run and the document is closed.
Explicitly accessing (or even closing) any subobject while the document finalizer is running is not safe!
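The practical consequence is to keep a reference to the document for as long as any of its subobjects is in use. The following sketch models this with weak references. DocumentSketch and PageSketch are hypothetical and only imitate a subobject that does not keep its document alive. The immediate cleanup shown here relies on CPython's reference counting and does not reflect how the SDK's finalizers are scheduled.

```python
import gc
import weakref


class DocumentSketch:
    """Hypothetical document that hands out subobjects."""

    def __init__(self):
        self.is_open = True

    def first_page(self):
        return PageSketch(self)


class PageSketch:
    """Hypothetical subobject that does not retain its document."""

    def __init__(self, document):
        # Weak reference: holding the page does not keep the document alive.
        self._document = weakref.ref(document)

    def is_usable(self):
        document = self._document()
        return document is not None and document.is_open


def page_without_document():
    document = DocumentSketch()
    # The only reference to the document ends when this function returns.
    return document.first_page()


document = DocumentSketch()
safe_page = document.first_page()       # the document stays referenced
orphan_page = page_without_document()   # the document is already unreachable
gc.collect()
```

Here safe_page remains usable because document is still referenced, while orphan_page has lost its document.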
PDF graphics model
Coordinate system
PDF coordinates are measured from bottom to top, unlike many other coordinate systems used in IT.
For the sake of simplicity, all coordinates used in the PDF Toolbox SDK are normalized such that the point (0,0) denotes the lower left corner of the visible page (crop box).
The internal Rotate attribute of a PDF page is not exposed in the API. Instead, all coordinates are assumed to refer to the already rotated page.
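As a small worked example of the normalized coordinate system, the sketch below converts a point measured from the top-left corner (as is common for screen and image coordinates) into bottom-left based coordinates, and shifts a raw page coordinate so that the lower left corner of the crop box becomes (0,0). The function names are hypothetical, and the sketch assumes an unrotated page with all values in points.

```python
def top_left_to_pdf(x, y_from_top, page_height):
    """Convert a top-left based point to bottom-left based PDF coordinates."""
    return (x, page_height - y_from_top)


def normalize_to_crop_box(x, y, crop_left, crop_bottom):
    """Shift a raw page coordinate so that the crop box corner becomes (0, 0)."""
    return (x - crop_left, y - crop_bottom)


# A point 72 pt below the top edge of an 842 pt high page.
point_from_top = top_left_to_pdf(72, 72, 842)

# A raw coordinate on a page whose crop box starts at (20, 30).
point_in_crop_box = normalize_to_crop_box(100, 150, 20, 30)
```

With these inputs, point_from_top is (72, 770) and point_in_crop_box is (80, 120).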
Affine transformations
Affine transformations can be used to rotate, move, scale, or skew any page content.
Transformations always affect the coordinate system as a whole. All following graphics operations are executed in the transformed coordinate system, including additional transformations.
This means that the order in which transformations are applied is important.
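The effect of the order can be checked with plain matrix arithmetic. The sketch below uses the six-number matrix form [a b c d e f] from the PDF specification, where x' = a·x + c·y + e and y' = b·x + d·y + f. The helper functions are hypothetical and are not the SDK's transformation class. Each new transformation is prepended to the current matrix, so it operates inside the coordinate system that is already transformed.

```python
IDENTITY = (1, 0, 0, 1, 0, 0)


def translate(tx, ty):
    return (1, 0, 0, 1, tx, ty)


def scale(sx, sy):
    return (sx, 0, 0, sy, 0, 0)


def multiply(m, n):
    """Return the matrix that applies m first and n second."""
    ma, mb, mc, md, me, mf = m
    na, nb, nc, nd, ne, nf = n
    return (
        ma * na + mb * nc,
        ma * nb + mb * nd,
        mc * na + md * nc,
        mc * nb + md * nd,
        me * na + mf * nc + ne,
        me * nb + mf * nd + nf,
    )


def apply(m, x, y):
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def concatenate(transforms):
    """Apply transformations in call order, each inside the previous ones."""
    current = IDENTITY
    for transform in transforms:
        current = multiply(transform, current)
    return current


move_then_scale = concatenate([translate(100, 0), scale(2, 2)])
scale_then_move = concatenate([scale(2, 2), translate(100, 0)])

point_a = apply(move_then_scale, 10, 0)  # scaled inside the moved system
point_b = apply(scale_then_move, 10, 0)  # moved inside the scaled system
```

The same point (10, 0) ends up at (120, 0) in the first case and at (220, 0) in the second, because in the second case the translation of 100 units is itself scaled by a factor of 2.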