Create an accessible PDF from scratch
Using the Toolbox add-on, create fully accessible, tagged PDF documents from scratch. This guide walks you through the technical steps to create a valid PDF/UA document.
A properly structured PDF is crucial for users with disabilities who rely on assistive technologies. It also improves the experience for all users by enabling features like content reflow on mobile devices and by improving machine-readability.
This functionality is part of the Toolbox add-on, a separate SDK that you can use with the same license key as the Pdftools SDK. To use and integrate this add-on, review Getting started with the Toolbox add-on and Toolbox add-on code samples.
For background on PDF accessibility concepts and the importance of logical structure, review A primer on PDF accessibility.
The process involves creating a document, defining its logical structure, and then adding content elements that are linked to that structure. Steps to create a tagged PDF document:
You need to initialize the library.
1. Create the document and structure tree
You always start with an empty output Document.
- .NET
- Java
- Python
// Create a PDF document
using Stream outStream = new FileStream(outPath, FileMode.Create, FileAccess.ReadWrite);
using Document outDoc = Document.Create(outStream, null, null);
// Create a font
Font font = CreateFontWithFallbacks(outDoc, ARIAL_AND_FALLBACKS);
// Create a page
Size pageSize = new Size { Width = ToPoints(21, "cm"), Height = ToPoints(29.7, "cm") }; // DIN A4
Page outPage = Page.Create(outDoc, pageSize);
// Generate the page's content
CreateAndTagContent(outDoc, outPage, imagePath, font);
outDoc.Pages.Add(outPage);
// Create a PDF document
try (FileStream outStream = new FileStream(outPath, FileStream.Mode.READ_WRITE_NEW);
     Document outDoc = Document.create(outStream, Conformance.PDF17, null)) {
    // Create a font
    Font font = createFontWithFallbacks(outDoc, ARIAL_AND_FALLBACKS);
    // Create a page
    Size pageSize = new Size(toPoints(21, "cm"), toPoints(29.7, "cm")); // DIN A4
    Page outPage = Page.create(outDoc, pageSize);
    // Generate the page's content
    createAndTagContent(outDoc, outPage, imagePath, font);
    outDoc.getPages().add(outPage);
}
# Create a PDF document
with io.FileIO(output_path, "wb+") as out_stream:
    with Document.create(out_stream, Conformance.PDF17, None) as out_doc:
        # Create a font
        font = create_font_with_fallbacks(out_doc, ARIAL_AND_FALLBACKS)
        # Create a page
        page_size = Size(to_points(21, "cm"), to_points(29.7, "cm"))  # DIN A4
        out_page = Page.create(out_doc, page_size)
        # Generate the page's content
        create_and_tag_content(out_doc, out_page, image_path, font)
        out_doc.pages.append(out_page)
Helper functions like createFontWithFallbacks or toPoints and constants like ARIAL_AND_FALLBACKS can be viewed in the full example.
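For orientation, the unit conversion behind a helper like toPoints can be sketched as follows. This is a hypothetical stand-in written for this guide, not the helper from the full example: the supported unit names and the error handling are assumptions. It relies only on the fact that a PDF point is 1/72 inch and an inch is 2.54 cm.

```python
# Hypothetical stand-in for the to_points helper; the real one is in the full example.
POINTS_PER_INCH = 72.0

# Inches per one unit of each supported length unit (unit names are assumptions).
_INCHES_PER_UNIT = {
    "pt": 1.0 / 72.0,
    "in": 1.0,
    "cm": 1.0 / 2.54,
    "mm": 1.0 / 25.4,
}


def to_points(value: float, unit: str) -> float:
    """Convert a length in the given unit to PDF points (1 pt = 1/72 inch)."""
    try:
        inches = value * _INCHES_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"Unsupported unit: {unit!r}") from None
    return inches * POINTS_PER_INCH


if __name__ == "__main__":
    width = to_points(21, "cm")
    height = to_points(29.7, "cm")
    # DIN A4 is roughly 595 x 842 points.
    print(f"A4 in points: {width:.2f} x {height:.2f}")
```

Keeping the conversion in one place means page sizes, margins, and image widths can all be expressed in familiar units while the SDK calls receive points.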
The first step is to establish the logical structure tree. This is essential because content can only be tagged if a structure node already exists for it to reference.
The structure tree begins with a Tree object, whose root is the DocumentNode. All other structure elements, like sections (Sect) and paragraphs (P), become descendants of this root.
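To make the hierarchy concrete, here is a small, SDK-independent model of a structure tree. StructNode and outline are illustrative names invented for this sketch and are not part of the Toolbox add-on; the sketch only shows how a Document root, a Sect, and its H1 and P children nest, and how a depth-first walk yields the logical order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructNode:
    """Illustrative stand-in for a structure element; not an SDK class."""
    tag: str
    children: List["StructNode"] = field(default_factory=list)

    def add(self, child: "StructNode") -> "StructNode":
        """Append a child and return it so calls can be chained."""
        self.children.append(child)
        return child


def outline(node: StructNode, depth: int = 0) -> List[str]:
    """Return the tree as indented lines in depth-first order."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    root = StructNode("Document")
    sect = root.add(StructNode("Sect"))
    sect.add(StructNode("H1"))
    sect.add(StructNode("P"))
    print("\n".join(outline(root)))
```

The order in which children are appended is the order an assistive technology would be expected to follow, which is why the samples add the heading before the paragraph.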
- .NET
- Java
- Python
private static void CreateAndTagContent(Document outputDoc, Page outPage, string imagePath, Font font)
{
    using (ContentGenerator gen = new ContentGenerator(outPage.Content, false))
    {
        // Create an empty logical structure tree and add a section to the root node (DocumentNode)
        Tree structTree = new Tree(outputDoc);
        Node docNode = structTree.DocumentNode;
        Node sectionNode = new Node("Sect", outputDoc, outPage);
        docNode.Children.Add(sectionNode);
        // Start from the top of the page with margin
        double currentY = outPage.Size.Height - MARGIN;
        // Create header
        currentY = CreateAndTagText(
            outputDoc, outPage, gen, sectionNode, font, currentY,
            "H1", "This is a properly tagged heading", 24.0);
        // Add padding and create paragraph
        currentY -= PADDING;
        currentY = CreateAndTagText(
            outputDoc, outPage, gen, sectionNode, font, currentY,
            "P", "This is a properly tagged paragraph. Both heading and paragraph belong to a section.", 12.0);
        // Add padding and create image
        currentY -= PADDING;
        CreateAndTagImage(outputDoc, outPage, gen, imagePath, currentY);
    }
}
private static void createAndTagContent(Document outDoc, Page outPage, String imagePath, Font font) throws Exception {
    try (ContentGenerator gen = new ContentGenerator(outPage.getContent(), false)) {
        // Create an empty logical structure tree and add a section to the root node (DocumentNode)
        Tree structTree = new Tree(outDoc);
        Node docNode = structTree.getDocumentNode();
        Node sectionNode = new Node("Sect", outDoc, outPage);
        docNode.getChildren().add(sectionNode);
        // Start from the top of the page with margin
        double currentY = outPage.getSize().getHeight() - MARGIN;
        // Create header
        currentY = createAndTagText(
            outDoc, outPage, gen, sectionNode, font, currentY,
            "H1", "This is a properly tagged heading", 24.0);
        // Add padding and create paragraph
        currentY -= PADDING;
        currentY = createAndTagText(
            outDoc, outPage, gen, sectionNode, font, currentY,
            "P", "This is a properly tagged paragraph. Both heading and paragraph belong to a section.", 12.0);
        // Add padding and create image
        currentY -= PADDING;
        createAndTagImage(outDoc, outPage, gen, imagePath, currentY);
    }
}
def create_and_tag_content(out_doc: Document, out_page: Page, image_path: str, font: Font):
    with ContentGenerator(out_page.content, False) as gen:
        # Create an empty logical structure tree and add a section to the root node (DocumentNode)
        struct_tree = Tree(out_doc)
        doc_node = struct_tree.document_node
        section_node = Node("Sect", out_doc, out_page)
        doc_node.children.append(section_node)
        # Start from the top of the page with margin
        current_y = out_page.size.height - MARGIN
        # Create header
        current_y = create_and_tag_text(
            out_doc, out_page, gen, section_node, font, current_y,
            "H1", "This is a properly tagged heading", 24.0)
        # Add padding and create paragraph
        current_y -= PADDING
        current_y = create_and_tag_text(
            out_doc, out_page, gen, section_node, font, current_y,
            "P", "This is a properly tagged paragraph. Both heading and paragraph belong to a section.", 12.0)
        # Add padding and create image
        current_y -= PADDING
        create_and_tag_image(out_doc, out_page, gen, image_path, current_y)
2. Add and tag content
With the structure tree in place, you can create page content and associate it with a specific structure node. This is done using a ContentGenerator and its TagAs method. The workflow is:
- Start tagging by calling gen.TagAs(node).
- Add the content (e.g., paint text or an image).
- Stop tagging with gen.StopTagging().
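Because the start and stop calls must always be paired, one possible way to package the three steps above in Python is a small context manager. The sketch below is an assumption-laden illustration, not part of the SDK: tagged is a hypothetical helper, and RecordingGenerator is a stand-in that only records calls so the example runs without the Toolbox add-on. The only names taken from the samples are the tag_as and stop_tagging methods.

```python
from contextlib import contextmanager


class RecordingGenerator:
    """Hypothetical stand-in for a content generator; it records calls instead of painting."""

    def __init__(self):
        self.calls = []

    def tag_as(self, node):
        self.calls.append(("tag_as", node))

    def paint(self, what):
        # Placeholder for painting text or an image.
        self.calls.append(("paint", what))

    def stop_tagging(self):
        self.calls.append(("stop_tagging",))


@contextmanager
def tagged(gen, node):
    """Tag everything painted inside the with-block as belonging to node."""
    gen.tag_as(node)
    try:
        yield gen
    finally:
        # Runs even if painting raises, so tagging is never left open.
        gen.stop_tagging()


if __name__ == "__main__":
    gen = RecordingGenerator()
    with tagged(gen, "H1"):
        gen.paint("heading text")
    print(gen.calls)
```

With a real generator, the body of the with-block would contain the painting calls shown in the sections that follow; whether closing the tag after an error is the desired behavior depends on how the surrounding code handles failures.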
2.1. Tagging text
When tagging text elements like headings or paragraphs, the ActualText property of the Node should be set. This provides an explicit text equivalent for screen readers.
- .NET
- Java
- Python
private static double CreateAndTagText(
    Document outputDoc, Page outPage, ContentGenerator gen, Node sectionNode,
    Font font, double topY, string tagName, string textContent, double fontSize)
{
    // Create a node in the logical structure tree
    // that will contain the text and add it to the section
    Node textNode = new Node(tagName, outputDoc, outPage);
    textNode.ActualText = textContent;
    sectionNode.Children.Add(textNode);
    // Calculate text baseline position
    double baselineY = topY - fontSize * font.Ascent;
    // From here on, all painted graphics will be contained within that node
    gen.TagAs(textNode);
    // Paint text onto the page
    Text text = Text.Create(outputDoc);
    using (TextGenerator textGen = new TextGenerator(text, font, fontSize, null))
    {
        Point position = new Point { X = MARGIN, Y = baselineY };
        textGen.MoveTo(position);
        textGen.ShowLine(textNode.ActualText);
    }
    gen.PaintText(text);
    gen.StopTagging();
    // Return bottom coordinate (baseline - descent)
    return baselineY - fontSize * font.Descent;
}
private static double createAndTagText(
        Document outDoc, Page outPage, ContentGenerator gen, Node sectionNode,
        Font font, double topY, String tagName, String textContent, double fontSize) throws Exception {
    // Create a node in the logical structure tree
    // that will contain the text and add it to the section
    Node textNode = new Node(tagName, outDoc, outPage);
    textNode.setActualText(textContent);
    sectionNode.getChildren().add(textNode);
    // Calculate text baseline position
    double baselineY = topY - fontSize * font.getAscent();
    // From here on, all painted graphics will be contained within that node
    gen.tagAs(textNode);
    // Paint text onto the page
    Text text = Text.create(outDoc);
    try (TextGenerator textGen = new TextGenerator(text, font, fontSize, null)) {
        Point position = new Point(MARGIN, baselineY);
        textGen.moveTo(position);
        textGen.showLine(textNode.getActualText());
    }
    gen.paintText(text);
    gen.stopTagging();
    // Return bottom coordinate (baseline - descent)
    return baselineY - fontSize * font.getDescent();
}
def create_and_tag_text(
        out_doc: Document, out_page: Page, gen: ContentGenerator, section_node: Node,
        font: Font, top_y: float, tag_name: str, text_content: str, font_size: float) -> float:
    # Create a node in the logical structure tree
    # that will contain the text and add it to the section
    text_node = Node(tag_name, out_doc, out_page)
    text_node.actual_text = text_content
    section_node.children.append(text_node)
    # Calculate text baseline position
    baseline_y = top_y - font_size * font.ascent
    # From here on, all painted graphics will be contained within that node
    gen.tag_as(text_node)
    # Paint text onto the page
    text = Text.create(out_doc)
    with TextGenerator(text, font, font_size, None) as text_gen:
        position = Point(MARGIN, baseline_y)
        text_gen.move_to(position)
        text_gen.show_line(text_node.actual_text)
    gen.paint_text(text)
    gen.stop_tagging()
    # Return bottom coordinate (baseline - descent)
    return baseline_y - font_size * font.descent
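The vertical positioning in the text helper comes down to two subtractions: the baseline sits one ascent below the top of the line, and the bottom sits one descent below the baseline. The standalone sketch below repeats that arithmetic with made-up numbers. It assumes, as the samples imply, that ascent and descent are positive fractions of the font size; the values 0.9 and 0.2, the starting coordinate, and the padding are illustrative only, and place_line and LineBox are hypothetical names.

```python
from typing import NamedTuple


class LineBox(NamedTuple):
    """Vertical extent of one text line in page coordinates (origin at bottom left)."""
    baseline_y: float
    bottom_y: float


def place_line(top_y: float, font_size: float, ascent: float, descent: float) -> LineBox:
    """Place one text line whose top edge sits at top_y.

    ascent and descent are per-unit-font-size fractions, both assumed positive.
    """
    baseline_y = top_y - font_size * ascent
    bottom_y = baseline_y - font_size * descent
    return LineBox(baseline_y, bottom_y)


if __name__ == "__main__":
    padding = 10.0  # assumed value, stands in for PADDING
    heading = place_line(top_y=800.0, font_size=24.0, ascent=0.9, descent=0.2)
    paragraph = place_line(top_y=heading.bottom_y - padding, font_size=12.0,
                           ascent=0.9, descent=0.2)
    print(heading)
    print(paragraph)
```

Returning the bottom coordinate from each call is what lets the caller stack elements top to bottom by feeding one element's bottom, minus padding, into the next element's top.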
2.2. Tagging an image
For non-text content, providing a text alternative is critical. For images (Figure nodes), you must set the AlternateText property.
- .NET
- Java
- Python
private static double CreateAndTagImage(
    Document outputDoc, Page outPage, ContentGenerator gen,
    string imagePath, double topY)
{
    // If a structure tree has already been created for the given document,
    // the existing structure tree reference will be returned.
    Tree tree = new Tree(outputDoc);
    Node docNode = tree.DocumentNode;
    Node figureNode = new Node("Figure", outputDoc, outPage);
    figureNode.AlternateText = "PdfTools AG Logo";
    docNode.Children.Add(figureNode);
    // From here on, all painted graphics will be contained within that node
    gen.TagAs(figureNode);
    // Load an image
    Image image;
    using (Stream inImage = new FileStream(imagePath, FileMode.Open, FileAccess.Read))
    {
        image = Image.Create(outputDoc, inImage);
    }
    // Paint the image onto the page
    double x = MARGIN;
    double width = ToPoints(2.0, "cm");
    double height = width * image.Size.Height / image.Size.Width; // preserve aspect ratio
    Rectangle rect = new Rectangle
    {
        Left = x, // left
        Bottom = topY - height, // bottom (Rectangle coordinates: bottom is lower than top)
        Right = x + width, // right
        Top = topY // top
    };
    gen.PaintImage(image, rect);
    gen.StopTagging();
    // Return bottom coordinate
    return topY - height;
}
private static double createAndTagImage(
        Document outDoc, Page outPage, ContentGenerator gen,
        String imagePath, double topY) throws Exception {
    // If a structure tree has already been created for the given document,
    // the existing structure tree reference will be used.
    Tree tree = new Tree(outDoc);
    Node docNode = tree.getDocumentNode();
    Node figureNode = new Node("Figure", outDoc, outPage);
    figureNode.setAlternateText("PdfTools AG Logo");
    docNode.getChildren().add(figureNode);
    // From here on, all painted graphics will be contained within that node
    gen.tagAs(figureNode);
    // Load an image
    Image image;
    try (FileStream inImage = new FileStream(imagePath, FileStream.Mode.READ_ONLY)) {
        image = Image.create(outDoc, inImage);
    }
    // Paint the image onto the page
    double x = MARGIN;
    double width = toPoints(2.0, "cm");
    double height = width * image.getSize().getHeight() / image.getSize().getWidth(); // preserve aspect ratio
    Rectangle rect = new Rectangle(
        x, // left
        topY - height, // bottom (Rectangle coordinates: bottom is lower than top)
        x + width, // right
        topY // top
    );
    gen.paintImage(image, rect);
    gen.stopTagging();
    // Return bottom coordinate
    return topY - height;
}
def create_and_tag_image(
        out_doc: Document, out_page: Page, gen: ContentGenerator,
        image_path: str, top_y: float) -> float:
    # If a structure tree has already been created for the given document,
    # the existing structure tree reference will be used.
    tree = Tree(out_doc)
    doc_node = tree.document_node
    figure_node = Node("Figure", out_doc, out_page)
    figure_node.alternate_text = "PdfTools Logo"
    doc_node.children.append(figure_node)
    # From here on, all painted graphics will be contained within that node
    gen.tag_as(figure_node)
    # Load an image
    with io.FileIO(image_path, "rb") as in_image:
        image = Image.create(out_doc, in_image)
    # Paint the image onto the page
    x = MARGIN
    width = to_points(2.0, "cm")
    height = width * image.size.height / image.size.width  # preserve aspect ratio
    rect = Rectangle(
        left=x,
        bottom=top_y - height,  # Rectangle coordinates: bottom is lower than top
        right=x + width,
        top=top_y,
    )
    gen.paint_image(image, rect)
    gen.stop_tagging()
    # Return bottom coordinate
    return top_y - height
When writing alternative text, follow these best practices:
- Be descriptive but concise
- Convey the purpose of the image, not just its appearance
- Don’t start with “Image of…” or “Picture of…”
- For complex diagrams, consider providing a longer description elsewhere
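Some of these practices can be checked mechanically before the text is assigned to a Figure node. The following is a minimal sketch of such a check, not an SDK feature: check_alt_text is a hypothetical helper, and the 150-character threshold is an arbitrary assumption chosen for illustration. It cannot judge whether the text conveys the purpose of the image, which still needs a human reader.

```python
import re
from typing import List

# Matches the redundant openers named in the list above.
_REDUNDANT_PREFIX = re.compile(r"^(image|picture)\s+of\b", re.IGNORECASE)


def check_alt_text(alt_text: str, max_length: int = 150) -> List[str]:
    """Return a list of problems found; an empty list means no rule fired."""
    text = alt_text.strip()
    if not text:
        return ["alternative text is empty"]
    problems = []
    if _REDUNDANT_PREFIX.match(text):
        problems.append("starts with a redundant phrase such as 'Image of'")
    if len(text) > max_length:
        problems.append(
            f"longer than {max_length} characters; consider a separate long description")
    return problems


if __name__ == "__main__":
    for candidate in ["PdfTools AG Logo", "Image of a logo", ""]:
        print(repr(candidate), "->", check_alt_text(candidate))
```

A check like this fits naturally into a build step or unit test that runs over all alternative texts used when generating documents.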
3. Set accessibility metadata
The SDK automatically sets the Tagged flag when you create a PDF’s logical structure. However, you must explicitly set the PDF/UA flag when you’re confident your document meets all accessibility requirements:
- .NET
- Java
- Python
// Set document metadata
outDoc.Metadata.Title = "Accessible Document Example";
outDoc.Metadata.Author = "Your Organization";
// Set language for the entire document
outDoc.Language = "en-US";
// Only set PDF/UA flag after ensuring full accessibility compliance
// This requires human verification beyond technical implementation
// outDoc.SetPdfUaConformant(); // Uncomment only after full review
// Set document metadata
outDoc.getMetadata().setTitle("Accessible Document Example");
outDoc.getMetadata().setAuthor("Your Organization");
// Set language for the entire document
outDoc.setLanguage("en-US");
// Only set PDF/UA flag after ensuring full accessibility compliance
// This requires human verification beyond technical implementation
// outDoc.setPdfUaConformant(); // Uncomment only after full review
# Set document metadata
out_doc.metadata.title = "Accessible Document Example"
out_doc.metadata.author = "Your Organization"
# Set language for the entire document
out_doc.language = "en-US"
# Only set PDF/UA flag after ensuring full accessibility compliance
# This requires human verification beyond technical implementation
# out_doc.set_pdf_ua_conformant() # Uncomment only after full review
Setting the PDF/UA flag indicates that your document fully complies with the PDF/UA standard. This goes beyond technical structure and includes:
- Proper color contrast ratios
- Meaningful reading order
- Appropriate use of headings
- Complete alternative descriptions
- Correct language specifications
Always perform human review before claiming PDF/UA compliance.
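One way to keep the human review from being skipped is to make the conformance claim depend on an explicit sign-off. The sketch below is only an illustration of that idea under stated assumptions: ReviewChecklist, open_items, and may_claim_pdf_ua are hypothetical names, the fields mirror the list above, and the code does not call the SDK at all.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ReviewChecklist:
    """Review items from the list above, each confirmed by a human reviewer."""
    color_contrast: bool = False
    reading_order: bool = False
    heading_usage: bool = False
    alternative_descriptions: bool = False
    language_specifications: bool = False


def open_items(checklist: ReviewChecklist) -> List[str]:
    """Return the names of all items that have not been confirmed yet."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def may_claim_pdf_ua(checklist: ReviewChecklist) -> bool:
    """Allow the PDF/UA claim only when every item has been confirmed."""
    return not open_items(checklist)


if __name__ == "__main__":
    review = ReviewChecklist(color_contrast=True, reading_order=True)
    if may_claim_pdf_ua(review):
        print("All items confirmed; the PDF/UA flag may be set.")
    else:
        print("Still open:", ", ".join(open_items(review)))
```

In a real pipeline, the call that sets the PDF/UA flag would sit behind a guard like may_claim_pdf_ua, so an unreviewed document is still written as a tagged PDF but without the conformance claim.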
Full example
Download the full example to see all the pieces working together.
Next steps
After creating accessible PDFs:
- Validate accessibility: Use PDF accessibility checkers to verify your implementation
- Test with assistive technology: Try your PDFs with screen readers like NVDA or JAWS
- Learn about remediation: See our guide on adding structure to existing PDFs
- Explore structure reading: Learn how to read and analyze logical structure
Remember: Creating truly accessible PDFs requires both technical implementation and human understanding. While our SDK provides the tools, achieving genuine accessibility requires thoughtful consideration of how people will use your documents.