Version 1.3

Extract

The Toolbox add-on lets you extract information such as text, images, and signatures from a PDF document. You can also extract document attributes like the conformance level, whether the document is encrypted or protected, and metadata like author, title, and creation date.

Extract text

Learn how to extract text content from a PDF document using the Extract all text from PDF (C, C#, Java, Python) code example. This project also illustrates the use of heuristics to assemble text content into words and sentences based on their position on the page.
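The position-based heuristic can be illustrated with a minimal Python sketch. It does not use the Toolbox API. It assumes a hypothetical Glyph record holding a text fragment with its position and size in PDF user-space units, and it uses thresholds chosen only for illustration: glyphs whose baselines lie within half a glyph height belong to the same line, and a horizontal gap wider than 30% of the glyph height starts a new word. Sentence detection is left out, and the heuristics in the actual code example may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Glyph:
    """Hypothetical positioned text fragment (not a Toolbox class)."""
    text: str
    x: float       # left edge in user-space units
    y: float       # baseline; in PDF, y grows upward
    width: float
    height: float


def assemble_lines(glyphs: List[Glyph], word_gap: float = 0.3,
                   line_tol: float = 0.5) -> List[str]:
    """Group glyphs into lines by baseline, then join them into words by gap size.

    word_gap and line_tol are fractions of the glyph height (illustrative values).
    """
    # Read order: top of the page first (largest y), then left to right.
    ordered = sorted(glyphs, key=lambda g: (-g.y, g.x))
    lines: List[List[Glyph]] = []
    for g in ordered:
        # Same line if the baseline is close to the first glyph of the current line.
        if lines and abs(lines[-1][0].y - g.y) <= line_tol * g.height:
            lines[-1].append(g)
        else:
            lines.append([g])

    result: List[str] = []
    for line in lines:
        line.sort(key=lambda g: g.x)
        text = line[0].text
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            # A wide enough gap between fragments is treated as a word break.
            if gap > word_gap * cur.height:
                text += " "
            text += cur.text
        result.append(text)
    return result


if __name__ == "__main__":
    sample = [
        Glyph("world", 40, 700, 25, 10),
        Glyph("Hello", 10, 700, 25, 10),
        Glyph("Next", 10, 680, 20, 10),
    ]
    print(assemble_lines(sample))  # ['Hello world', 'Next']
```

Sorting first and then comparing each glyph with the start of the current line keeps the sketch simple; real pages with rotated text, columns, or mixed font sizes need more robust grouping.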

Quick start

Get the full sample on GitHub: C# and Java.

Extract images

Learn how to extract images from a PDF document using the Extract all images and image masks from a PDF (C, C#, Java, Python) code example. The image extraction functionality takes an image embedded as a content element in a PDF file and saves it as a separate image file.

Output formats

BMP
JPEG
JPEG2000
JBIG2
PNG
GIF
TIFF
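When writing extracted images to disk, each file needs an extension that matches its format. The Python sketch below is a hypothetical helper, not part of the Toolbox API, that chooses one by checking the well-known leading signature bytes of the formats listed above. It assumes the bytes form a complete image file; raw image streams as stored inside a PDF (for example, JBIG2 data without its file header) would not match. The two-byte BMP signature is weak and can give false positives on arbitrary data.

```python
from typing import Optional

# Leading signature ("magic") bytes for the output formats listed above.
_SIGNATURES = [
    (b"\xff\xd8\xff", "jpg"),                       # JPEG (SOI marker)
    (b"\x89PNG\r\n\x1a\n", "png"),                  # PNG
    (b"GIF87a", "gif"),                             # GIF, 1987 version
    (b"GIF89a", "gif"),                             # GIF, 1989 version
    (b"II*\x00", "tif"),                            # TIFF, little-endian
    (b"MM\x00*", "tif"),                            # TIFF, big-endian
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "jp2"),     # JPEG 2000, JP2 container
    (b"\xff\x4f\xff\x51", "j2k"),                   # JPEG 2000, raw codestream
    (b"\x97JB2\r\n\x1a\n", "jb2"),                  # JBIG2 file header
    (b"BM", "bmp"),                                 # Windows bitmap (weak check)
]


def guess_extension(data: bytes) -> Optional[str]:
    """Return a file extension for the image bytes, or None if unrecognized."""
    for magic, ext in _SIGNATURES:
        if data.startswith(magic):
            return ext
    return None
```

For example, a loop over extracted images could name each file f"image_{index}.{guess_extension(data) or 'bin'}".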

Quick start

Get the full sample on GitHub: C#, Java, and C.

Extract signatures

Learn how to extract signature content from a PDF document using the List Signatures in PDF (C, C#, Java, Python) code example. You can automatically extract signature information such as name, date, and contact information.
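In a PDF signature dictionary, the signer name, signing time, and contact details are stored under the keys Name, M, and ContactInfo (with Reason and Location alongside), and the signing time uses the PDF date format, for example D:20240301120000+01'00'. The Python sketch below assumes these raw entries are already available as strings and turns them into a typed record. SignatureInfo, parse_pdf_date, and signature_info are hypothetical illustrations, not the Toolbox API; in practice the List Signatures code example obtains these values through the Toolbox. All of these entries are optional, so every field may be missing.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

# PDF date string: (D:)YYYYMMDDHHmmSSOHH'mm'; every field after the year is optional.
_PDF_DATE = re.compile(
    r"^(?:D:)?(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<tz>[Z+\-])(?:(?P<tzh>\d{2})'?(?:(?P<tzm>\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> Optional[datetime]:
    """Convert a PDF date string to a datetime (naive if no offset is given)."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        return None
    g = match.groupdict()
    try:
        tzinfo = None
        if g["tz"] == "Z":
            tzinfo = timezone.utc
        elif g["tz"] in ("+", "-"):
            offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
            tzinfo = timezone(offset if g["tz"] == "+" else -offset)
        return datetime(int(g["year"]), int(g["month"] or 1), int(g["day"] or 1),
                        int(g["hour"] or 0), int(g["minute"] or 0),
                        int(g["second"] or 0), tzinfo=tzinfo)
    except ValueError:  # out-of-range values such as month 13
        return None


@dataclass
class SignatureInfo:
    name: Optional[str]
    signed_at: Optional[datetime]
    contact: Optional[str]
    reason: Optional[str]
    location: Optional[str]


def signature_info(entries: Mapping[str, str]) -> SignatureInfo:
    """Build a record from raw signature dictionary entries (keys without '/')."""
    return SignatureInfo(
        name=entries.get("Name"),
        signed_at=parse_pdf_date(entries["M"]) if "M" in entries else None,
        contact=entries.get("ContactInfo"),
        reason=entries.get("Reason"),
        location=entries.get("Location"),
    )
```

The parser accepts the offset with or without the trailing apostrophe, since both spellings occur in real files.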

For a comprehensive guide to validating digital signatures, see the Validate signatures in a signed PDF document page.

Quick start

Get the full sample on GitHub: C#, Java, and C.

Extract document attributes and metadata

Learn how to extract document attributes and metadata from a PDF document using the List document information of PDF (C, C#, Java, Python) code example.
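Several of these values are also recorded in a document's XMP metadata stream: dc:title, dc:creator, and xmp:CreateDate hold the title, author, and creation date, and PDF/A files declare their conformance level with pdfaid:part and pdfaid:conformance. The Python sketch below reads these properties using only the standard library. It is an illustration, not the Toolbox's implementation: it assumes you already have the XMP packet as a string, takes only the first entry of multi-valued properties such as dc:creator, and ignores the document information dictionary, which can carry the same fields.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Namespace URIs of the XMP properties read below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "pdfaid": "http://www.aiim.org/pdfa/ns/id/",
}


def _xmp_property(root: ET.Element, prefix: str, name: str) -> Optional[str]:
    """Find a property written either as an rdf:Description attribute or as a child element."""
    qualified = f"{{{NS[prefix]}}}{name}"
    for desc in root.iter(f"{{{NS['rdf']}}}Description"):
        if qualified in desc.attrib:
            return desc.attrib[qualified].strip()
        elem = desc.find(f"{prefix}:{name}", NS)
        if elem is not None:
            # dc:title and dc:creator wrap values in rdf:Alt or rdf:Seq; take the first rdf:li.
            item = elem.find(".//rdf:li", NS)
            text = item.text if item is not None else elem.text
            return text.strip() if text else None
    return None


def read_xmp_info(packet: str) -> Dict[str, Optional[str]]:
    """Extract title, author, creation date, and PDF/A conformance from an XMP packet."""
    root = ET.fromstring(packet)
    part = _xmp_property(root, "pdfaid", "part")
    level = _xmp_property(root, "pdfaid", "conformance") or ""
    return {
        "title": _xmp_property(root, "dc", "title"),
        "author": _xmp_property(root, "dc", "creator"),
        "created": _xmp_property(root, "xmp", "CreateDate"),
        # For example, part "2" with conformance "B" gives "PDF/A-2b".
        "pdfa": f"PDF/A-{part}{level.lower()}" if part else None,
    }
```

Checking both the attribute and the element form matters because XMP writers are free to serialize simple properties either way.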

Quick start

Get the full sample on GitHub: C#, Java, and C.
