Unlock Your Documents for Downstream Processing with XML

Digital processes and pipelines need structured data. Convert unstructured PDFs to XML to extract data while preserving layout, reading order, and semantic tags from the source.

Output structured data

Conversion Service

Convert any file to XML

Turn documents into structured data by converting them to XML. Structured data can be ingested into internal enterprise systems, used in AI and Machine Learning contexts, and more.

  • Extract details for claims processing

  • Prepare data for Retrieval-Augmented Generation (RAG)

  • Annotate or create training sets for LLMs

The Pdftools Conversion Service includes a workflow for converting more than 50 file types to XML.

What is retrieval-augmented generation?

Retrieval-augmented generation (RAG) gives LLMs access to external data at query time, but only if that data is structured.

RAG needs structured data

LLMs can only tap into their training data, which is static and limits their ability to generate up-to-date responses. RAG adds the capability to dynamically retrieve information from additional sources at query time, but that information has to be structured.

XML preserves structure

The majority of business-related documents are still stored as PDFs, which are inherently unstructured, so the data they contain isn't available to LLMs. Converting PDFs to XML turns that unstructured data into structured data that RAG can access.
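As a rough illustration of how structured XML can feed a RAG pipeline, the Python sketch below splits a small XML document into one retrieval record per section. The element and attribute names (document, section, heading, paragraph, id, title) and the sample text are assumptions made up for this example; they are not the schema the Conversion Service produces.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for illustration only; a real converter's schema will differ.
SAMPLE_XML = """
<document title="Household Policy">
  <section id="s1">
    <heading>Coverage</heading>
    <paragraph>Covers fire and water damage.</paragraph>
    <paragraph>Theft is covered up to the insured sum.</paragraph>
  </section>
  <section id="s2">
    <heading>Deductible</heading>
    <paragraph>The deductible is 200 CHF per claim.</paragraph>
  </section>
</document>
"""


def xml_to_chunks(xml_text):
    """Return one retrieval record per <section>, in document order."""
    root = ET.fromstring(xml_text)
    chunks = []
    for section in root.iter("section"):
        heading = section.findtext("heading", default="").strip()
        # Join the section's paragraphs in reading order.
        body = " ".join(
            (p.text or "").strip() for p in section.findall("paragraph")
        )
        chunks.append({
            "source": root.get("title", ""),
            "section_id": section.get("id", ""),
            "heading": heading,
            "text": body,
        })
    return chunks


chunks = xml_to_chunks(SAMPLE_XML)
for chunk in chunks:
    print(chunk["section_id"], chunk["heading"], "->", chunk["text"])
```

Each record keeps its heading and section identifier next to the text, so a retriever can index the sections separately and cite where an answer came from.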

Tap into your documents with structured XML data output

The advantages of converting PDF to XML

XML makes data retrieval precise: instead of combing chunks of text for information, LLMs can pull specific data fields that are normalised across documents. Converting PDFs to XML preserves layout, reading order, and semantic tags.
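To make "pulling specific data fields" concrete, here is a minimal Python sketch that reads the same field from two XML documents. The policy, holder, and deductible elements, the number and currency attributes, and all sample values are hypothetical; they only stand in for whatever normalised fields a real conversion would yield.

```python
import xml.etree.ElementTree as ET

# Two hypothetical policy documents that share the same (assumed) field names.
POLICIES = [
    '<policy number="P-1001"><holder>A. Muster</holder>'
    '<deductible currency="CHF">200</deductible></policy>',
    '<policy number="P-1002"><holder>B. Beispiel</holder>'
    '<deductible currency="CHF">500</deductible></policy>',
]


def read_deductible(xml_text):
    """Look up the deductible field directly instead of searching free text."""
    root = ET.fromstring(xml_text)
    node = root.find("deductible")
    if node is None or node.text is None:
        return None  # Field is missing in this document.
    return {
        "policy": root.get("number"),
        "amount": int(node.text),
        "currency": node.get("currency"),
    }


records = [read_deductible(xml_text) for xml_text in POLICIES]
for record in records:
    print(record)
```

Because the field has the same name in every document, the lookup is a direct path query rather than a text search, which is the precision the paragraph above describes.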

Insurance

Provide policy documents, claims, and rules in XML format so LLMs can accurately answer questions about deductibles and other topics.

Banking

Convert transaction records, loan agreements, and reports to XML so that information can be pulled for compliance or customer service queries.

Other industries

Data in XML format helps when answering a question requires combining general rules with data from specific cases.

Get more out of your documents with XML

Book a demo call to see our XML workflow and more in action.

What customers are saying

Government

BRZ

PDF/A and searchability at the Federal Ministry of Justice with the Pdftools SDK

Banking

UBS

The world's first electronic website archiving system in compliance with the ISO PDF/A standard, using the Conversion Service

Insurance

Swiss Life

Swiss Life archives documents from Microsoft SharePoint in PDF/A format with the Conversion Service

Healthcare

Storz Medical

The PDF Web Viewer brings new impetus to shock wave technology