Read PDF logical structure

Use the Toolbox add-on to read and traverse the logical structure of a tagged PDF document. This guide covers the technical process of accessing and analyzing existing document structure.

If you need to add logical structure to an existing PDF, see our guide on Adding logical structure to existing PDFs.

Info: This functionality is part of the Toolbox add-on, a separate SDK that you can use with the same license key as the Pdftools SDK. To use and integrate this add-on, review Getting started with the Toolbox add-on and Toolbox add-on code samples.

Quick start

Download the full sample now in C#, Java, and Python.

For background on PDF accessibility concepts and the importance of logical structure, review A primer on PDF accessibility.

Reading logical structure involves accessing and traversing the document’s structure tree to extract information about tagged elements. Steps to read PDF logical structure:

Opening the tagged document
Accessing the structure tree
Traversing the tree recursively
Reading node properties
Full example

Before you begin

You need to initialize the library.

Opening the tagged document

Start by opening the PDF document that contains a logical structure. Only tagged PDFs include accessible structure information.

You can then check whether the PDF claims to be PDF/UA conformant using the IsPdfUaConformant property in .NET, the getIsPdfUaConformant() method in Java, or the is_pdf_ua_conformant property in Python.

.NET

// Open input document
using Stream inStream = new FileStream(inPath, FileMode.Open, FileAccess.Read);
using Document inDoc = Document.Open(inStream, null);

if (inDoc.IsPdfUaConformant)
{
    Console.WriteLine("This PDF declares PDF/UA conformance.");
}
else
{
    Console.WriteLine("This PDF does not declare PDF/UA conformance.");
}

Java

// Open input document
try (FileStream inStream = new FileStream(inPath, FileStream.Mode.READ_ONLY);
     Document inDoc = Document.open(inStream, null)) {

    if (inDoc.getIsPdfUaConformant()) {
        System.out.println("This PDF declares PDF/UA conformance.");
    } else {
        System.out.println("This PDF does not declare PDF/UA conformance.");
    }

    // Access the structure tree here, while inDoc is still open
}

Python

# Open input document
with open(input_file_path, "rb") as in_stream:
    with Document.open(in_stream, None) as in_doc:
        if in_doc.is_pdf_ua_conformant:
            print("This PDF declares PDF/UA conformance.")
        else:
            print("This PDF does not declare PDF/UA conformance.")

        # Access the structure tree here, while in_doc is still open

PDF/UA Declaration

The conformance property only reflects the declaration in the PDF's metadata.
It does not guarantee actual PDF/UA compliance. Use a validator to verify true conformance.

Accessing the structure tree

Create a Tree object to access the document’s logical structure. The tree provides access to the root document node and its children.

.NET

// Create a structure tree object
var tree = new Tree(inDoc);

// Traverse all top-level structure elements
foreach (var child in tree.Children)
{
    PrintNodeRecursively(child);
}

Java

// Create a structure tree object
Tree tree = new Tree(inDoc);

// Traverse all top-level structure elements
for (Node node : tree.getChildren()) {
    printNodeRecursively(node, 0);
}

Python

# Create a structure tree object
tree = Tree(in_doc)

# Traverse all top-level structure elements
for node in tree.children:
    print_node_recursive(node, 0)

Traversing the tree recursively

Implement a recursive function to traverse the entire structure tree. Each node can have child nodes, creating a hierarchical structure.

.NET

static void PrintNodeRecursively(Node node, int level = 0)
{
    // Print current node information
    PrintProperty(level, "Tag", node.Tag);
    PrintProperty(level, "Alternative text", node.AlternateText);
    PrintProperty(level, "Actual text", node.ActualText);
    PrintProperty(level, "Language", node.Language);

    // Recursively traverse child nodes
    foreach (var child in node.Children)
    {
        PrintNodeRecursively(child, level + 1);
    }
}

Java

static void printNodeRecursively(Node node, int level) throws Exception {
    // Print current node information
    printProperty(level, "Tag", node.getTag());
    printProperty(level, "Alternative text", node.getAlternateText());
    printProperty(level, "Actual text", node.getActualText());
    printProperty(level, "Language", node.getLanguage());

    // Recursively traverse child nodes
    for (Node child : node.getChildren()) {
        printNodeRecursively(child, level + 1);
    }
}

Python

def print_node_recursive(node: Node, level: int):
    # Print current node information
    print_property(level, "Tag", node.tag)
    print_property(level, "Alternative text", node.alternate_text)
    print_property(level, "Actual text", node.actual_text)
    print_property(level, "Language", node.language)

    # Recursively traverse child nodes
    for child in node.children:
        print_node_recursive(child, level + 1)
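
If you want to do more than print each node, one option is to separate the traversal from the action. The following is a minimal sketch of that idea, not part of the official sample: a generator that yields each node with its depth. It relies only on the children attribute shown above. StubNode is a hypothetical stand-in so the sketch runs without the SDK; with the Toolbox add-on you would pass real nodes instead.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class StubNode:
    """Hypothetical stand-in for a structure node, used only to run this sketch."""
    tag: str
    alternate_text: Optional[str] = None
    children: List["StubNode"] = field(default_factory=list)


def iter_nodes(node, level: int = 0) -> Iterator[Tuple[int, object]]:
    """Yield (level, node) pairs in depth-first, pre-order sequence."""
    yield level, node
    for child in node.children:
        yield from iter_nodes(child, level + 1)


# A small tree shaped like a simple tagged document.
root = StubNode("Document", children=[
    StubNode("Title"),
    StubNode("Text body", children=[
        StubNode("Figure", alternate_text="A test image of a document icon"),
    ]),
])

visited = [(level, node.tag) for level, node in iter_nodes(root)]
print(visited)
```

Because the generator visits a parent before its children, the order matches the order produced by the recursive print function.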

Reading node properties

Each structure node exposes properties that describe the tagged element. The sample code in this guide prints the first four:

Tag: The structure type (e.g., “H1”, “P”, “Figure”, “Table”)
Alternative Text: A description for images and other non-text elements
Actual Text: Replacement text that stands in for the element's content. It is empty unless the document sets it explicitly
Language: The language specification for the element
Abbreviation: The expanded form of an abbreviation

.NET

static void PrintProperty(int level, string name, string value)
{
    Console.Write($"{new string(' ', level * 2)}");
    Console.WriteLine($"{name}: '{value}'");
}

Java

static void printProperty(int level, String name, String value) {
    for (int i = 0; i < level; ++i) {
        System.out.print("  ");
    }
    System.out.println(name + ": '" + value + "'");
}

Python

def print_property(level: int, label: str, value):
    indent = "  " * level
    value_str = str(value or "")
    print(f"{indent}{label}: '{value_str}'")
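
If you need the properties as data instead of printed text, you could collect them into a dictionary. This is a sketch under assumptions: it uses only the four Python attribute names that appear in the traversal code above, it treats unset values as empty strings the same way print_property does, and the SimpleNamespace object is a hypothetical stand-in for a real node.

```python
from types import SimpleNamespace

# Attribute names taken from the Python traversal code in this guide.
PROPERTY_NAMES = ("tag", "alternate_text", "actual_text", "language")


def node_properties(node) -> dict:
    """Return the node's properties as a dict, mapping unset values to ''."""
    return {name: getattr(node, name, None) or "" for name in PROPERTY_NAMES}


# Hypothetical stand-in node; a real node comes from the structure tree.
figure = SimpleNamespace(
    tag="Figure",
    alternate_text="A test image of a document icon",
    actual_text=None,
    language=None,
)

props = node_properties(figure)
print(props)
```

A dictionary like this is easy to serialize, for example to JSON, if you want to store or compare structure reports.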

Example output

When you run the structure traversal, the output is similar to the following:

This PDF declares PDF/UA conformance.
Tag: 'Document'
Alternative text: ''
Actual text: ''
Language: ''
  Tag: 'Title'
  Alternative text: ''
  Actual text: ''
  Language: ''
  Tag: 'Text body'
  Alternative text: ''
  Actual text: ''
  Language: ''
  Tag: 'Text body'
  Alternative text: ''
  Actual text: ''
  Language: ''
    Tag: 'Figure'
    Alternative text: 'A test image of a document icon'
    Actual text: ''
    Language: ''
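
Most properties in this output are empty, so for larger documents you might prefer a compact outline that shows one line per node and only the properties that are set. The following sketch is one possible approach and not part of the official sample. The make_node helper builds hypothetical stand-in nodes that mirror the tree above, so the code runs without the SDK.

```python
from types import SimpleNamespace


def make_node(tag, alternate_text=None, actual_text=None, language=None, children=()):
    """Build a hypothetical stand-in node; a real node comes from the structure tree."""
    return SimpleNamespace(tag=tag, alternate_text=alternate_text,
                           actual_text=actual_text, language=language,
                           children=list(children))


def outline_lines(node, level=0):
    """Return one line per node: the indented tag plus any non-empty properties."""
    labels = (("alt", node.alternate_text),
              ("actual", node.actual_text),
              ("lang", node.language))
    extras = ", ".join(f"{name}='{value}'" for name, value in labels if value)
    lines = ["  " * level + str(node.tag) + (f" [{extras}]" if extras else "")]
    for child in node.children:
        lines.extend(outline_lines(child, level + 1))
    return lines


# Stand-in tree mirroring the example output in this guide.
root = make_node("Document", children=[
    make_node("Title"),
    make_node("Text body"),
    make_node("Text body", children=[
        make_node("Figure", alternate_text="A test image of a document icon"),
    ]),
])

outline = outline_lines(root)
print("\n".join(outline))
```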

Full example

Download the complete structure traversal example in C#, Java, or Python from the links in the Quick start section.

Use cases

Reading logical structure is useful for:

Accessibility auditing: Verify that documents have proper structure
Content extraction: Extract structured content while preserving hierarchy
Document analysis: Understand document organization and reading order
Quality assurance: Validate that remediation or creation processes worked correctly
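
As an illustration of the accessibility auditing use case, the sketch below walks a tree and reports every Figure node without alternative text. It is a simplified example, not a complete audit. It assumes that the tag compares equal to the plain string "Figure" and uses hypothetical stand-in nodes in place of nodes read from a PDF.

```python
from types import SimpleNamespace


def make_node(tag, alternate_text=None, children=()):
    """Build a hypothetical stand-in node; a real node comes from the structure tree."""
    return SimpleNamespace(tag=tag, alternate_text=alternate_text,
                           children=list(children))


def figures_missing_alt(node, path=""):
    """Return the tag paths of Figure nodes whose alternative text is empty."""
    current = f"{path}/{node.tag}" if path else str(node.tag)
    missing = []
    if node.tag == "Figure" and not node.alternate_text:
        missing.append(current)
    for child in node.children:
        missing.extend(figures_missing_alt(child, current))
    return missing


# One figure has alternative text, the other does not.
root = make_node("Document", children=[
    make_node("Title", children=[
        make_node("Figure", alternate_text="Company logo"),
    ]),
    make_node("Text body", children=[
        make_node("Figure"),
    ]),
])

missing = figures_missing_alt(root)
print(missing)
```

The returned paths identify where each problem sits in the hierarchy, which helps when you hand findings to whoever remediates the document.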

Reading logical structure provides insights into how assistive technologies will interpret your PDF documents, making it an essential tool for accessibility validation.
