
Archive PDF/A-2 workflow

The Archive PDF/A-2 workflow is tailored specifically to preparing documents for archiving.

In particular, you can:

  • Maintain the original appearance and structure of the document.
  • Minimize the loss of information.
  • Ensure traceability and reproducibility of the changes made to the document.

The Archive PDF/A-2 workflow supports these features:

  • Conversion to PDF/A-2 format (PDF/A-2b, PDF/A-2u, and PDF/A-2a)
  • Optimization of PDF/A structure (optional)
  • Office conversion (optional)
  • Optical character recognition (optional)
  • Digital signatures (optional)

Conformance levels

The Conversion Service supports the Basic (PDF/A-2b), Unicode (PDF/A-2u), and Accessible (PDF/A-2a) conformance levels. The levels are incremental: each level includes all requirements of the level below it, plus additional requirements. For example, every level U (Unicode) PDF/A-2 document is also a valid level B (Basic) document.

The Archive PDF/A-2 workflow automatically tries to convert the input document to the highest PDF/A-2 conformance level (usually PDF/A-2a). If the input document is PDF/A-2b or a PDF without structure information, it tries to convert the document to PDF/A-2u instead. If it is unable to convert to this level, it converts the document to PDF/A-2b or ends the conversion process in failure.
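The level ordering and the fallback behavior described above can be sketched as follows. This is a minimal Python sketch, not the Conversion Service's implementation: `PdfA2Level`, `choose_level`, and the `try_convert` callback are hypothetical names, and the rule that a failed attempt at one level steps down to the next lower PDF/A-2 level is an assumption extrapolated from the text.

```python
from enum import IntEnum
from typing import Callable, Optional


class PdfA2Level(IntEnum):
    """PDF/A-2 conformance levels; a higher value includes all lower ones."""
    B = 1  # Basic (PDF/A-2b)
    U = 2  # Unicode (PDF/A-2u)
    A = 3  # Accessible (PDF/A-2a)


def choose_level(
    is_pdfa2b: bool,
    has_structure: bool,
    try_convert: Callable[[PdfA2Level], bool],
) -> Optional[PdfA2Level]:
    """Return the level reached, or None if the conversion fails.

    `try_convert` is a hypothetical callback that attempts a conversion
    to the given level and reports whether it succeeded.
    """
    # Inputs that are PDF/A-2b or lack structure information start at level U
    # (level A requires logical structure); all others start at level A.
    start = PdfA2Level.U if (is_pdfa2b or not has_structure) else PdfA2Level.A
    # Assumption: on failure, step down one level at a time until Basic.
    for level in sorted((lvl for lvl in PdfA2Level if lvl <= start), reverse=True):
        if try_convert(level):
            return level
    return None  # the conversion process ends in failure
```

Because `PdfA2Level` is an `IntEnum`, the incremental relationship between levels becomes a plain comparison: `PdfA2Level.U >= PdfA2Level.B` is `True`, matching the statement that every level U document is also a valid level B document.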

Supported file formats for Archive PDF/A-2

The workflow supports these file formats:

Type | Formats / extensions
Document formats | PDF 1.x, PDF 2.0, PDF/A-1, PDF/A-2, PDF/A-3
Image formats | JPEG, JPEG 2000, TIFF, BMP, GIF, JBIG2, PNG, HEIC
Email | EML, MSG (without encryption)
Word | DOC, DOT, DOCX, DOCM, DOTX, DOTM, RTF, XML (WordprocessingML 2003)
Excel | XLS, XLT, XLSX, XLSM, XLTX, XLTM, XML (SpreadsheetML 2003)
PowerPoint | PPT, PPS, PPTX, PPTM, PPSX, PPSM
OpenOffice | ODT, ODS, ODP
Other | CSV, HTML, HTM (prepared for archiving), TXT, XML, ZIP (without password protection)

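To pre-check which row of the table a file name falls under, the table can be held as a simple lookup. This is a minimal Python sketch: `SUPPORTED` and `candidate_types` are hypothetical names, the extension spellings (for example `jpg`, `tif`, `jp2`, `jb2`) are assumptions, and conditions such as "without encryption" or "without password protection" cannot be checked from the file name alone.

```python
from pathlib import PurePath

# Hypothetical lookup derived from the table above. Extension spellings are
# assumptions; "without encryption" and similar conditions need the file itself.
SUPPORTED: dict[str, set[str]] = {
    "Document": {"pdf"},
    "Image": {"jpg", "jpeg", "jp2", "tif", "tiff", "bmp", "gif",
              "jb2", "jbig2", "png", "heic"},
    "Email": {"eml", "msg"},
    "Word": {"doc", "dot", "docx", "docm", "dotx", "dotm", "rtf", "xml"},
    "Excel": {"xls", "xlt", "xlsx", "xlsm", "xltx", "xltm", "xml"},
    "PowerPoint": {"ppt", "pps", "pptx", "pptm", "ppsx", "ppsm"},
    "OpenOffice": {"odt", "ods", "odp"},
    "Other": {"csv", "html", "htm", "txt", "xml", "zip"},
}


def candidate_types(filename: str) -> list[str]:
    """Return every table row whose extensions include the file's extension."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    return [kind for kind, extensions in SUPPORTED.items() if ext in extensions]
```

Note that an `.xml` file matches three rows (Word, Excel, and Other), so the extension alone cannot decide how such a file is handled; presumably its content (WordprocessingML 2003, SpreadsheetML 2003, or other XML) does.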
Note on OpenOffice formats

PDF conversion of OpenDocument Format files depends on how Microsoft Word, Excel, or PowerPoint renders them. In particular, visual differences may occur with tables and tabs. Visual differences in the rendering of shapes are usually not acceptable.

Note on HTML format

HTML documents need to be self-contained (layout information and images are either inline or available on the web) and suited for portrait page layout. JavaScript content is disabled during processing.

Note on XML format

Layout information and images need to be available on the web.

The conversion of most file formats is enabled by default in the Convert mode.

Configuring the workflow

The workflow's profile offers fine-grained control over how files are converted. Every processing step can be enabled and configured in the profile configuration.

tip

To view the processing steps in the Archive PDF/A workflows, see PDF/A workflow steps.

Convert mode for child documents (Attachments)

The convert mode defines which documents are converted (Convert option) and which are skipped (removed) from the result. When a document is removed, either a warning (Skip with Warning option) or an informational message (Skip option) is generated.

The convert mode can be determined based on the type of the child document, its filename, or the type of its parent document. For example, by default Office files are converted to PDF/A, executables are removed, and other non-convertible documents are removed with a warning.
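Selecting a convert mode from the child's type, its file name, or its parent's type can be sketched as an ordered rule list. This is a minimal Python sketch under stated assumptions: `Rule`, `DEFAULT_RULES`, and `resolve_convert_mode` are hypothetical names, "first matching rule wins" is an assumed evaluation order, and mapping executables to the Skip option (informational message rather than warning) is inferred from the example above.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase
from typing import Optional, Sequence


class ConvertMode(Enum):
    CONVERT = "Convert"
    SKIP = "Skip"                            # removed, informational message
    SKIP_WITH_WARNING = "Skip with Warning"  # removed, warning


@dataclass(frozen=True)
class Rule:
    """One hypothetical convert-mode rule; None means "match anything"."""
    mode: ConvertMode
    doc_type: Optional[str] = None     # type of the child document
    filename: Optional[str] = None     # glob pattern on the child's file name
    parent_type: Optional[str] = None  # type of the parent document

    def matches(self, doc_type: str, filename: str, parent_type: str) -> bool:
        return (
            (self.doc_type is None or self.doc_type == doc_type)
            and (self.filename is None
                 or fnmatchcase(filename.lower(), self.filename))
            and (self.parent_type is None or self.parent_type == parent_type)
        )


# Assumed defaults modeled on the example in the text; first match wins.
DEFAULT_RULES: tuple[Rule, ...] = (
    Rule(ConvertMode.CONVERT, doc_type="Office"),
    Rule(ConvertMode.CONVERT, doc_type="PDF"),
    Rule(ConvertMode.CONVERT, doc_type="Image"),
    Rule(ConvertMode.SKIP, filename="*.exe"),  # executables are removed
)


def resolve_convert_mode(
    doc_type: str,
    filename: str,
    parent_type: str,
    rules: Sequence[Rule] = DEFAULT_RULES,
) -> ConvertMode:
    for rule in rules:
        if rule.matches(doc_type, filename, parent_type):
            return rule.mode
    # Other non-convertible documents are removed with a warning.
    return ConvertMode.SKIP_WITH_WARNING
```

A rule based on the parent type goes in front of the defaults, for example `(Rule(ConvertMode.SKIP, doc_type="Image", parent_type="Email"),) + DEFAULT_RULES` to drop images attached to emails while still converting images found elsewhere.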

Collect modes

The collect mode defines how a converted document and its child documents are combined. The collect mode can be specified for each document type individually.

There are two collect mode categories:

  • Merge: the pages of multiple PDF documents can be merged into a single document
  • Attach: child documents can be attached (embedded) into a PDF document

For example, an email can be converted by creating a PDF collection (Portfolio) of its body and attachments. When Word documents are converted, their embedded files can be attached to the resulting PDF. A PDF collection (Portfolio) is a special case of the Attach collect mode in which the parent document contains no pages but shows a table of the attached documents for easy navigation. The advantage of the Attach collect mode is that all information in the input files, and the structure of those files, can be preserved.

The Merge collect mode creates simple files that can be processed and viewed by all PDF applications. The disadvantage is that only PDF files can be merged. Furthermore, not all information can be preserved when merging PDF files. For example, document metadata, signatures, and certain interactive form fields cannot be merged and must be removed. Also, logical structure (tagging) information might be less meaningful after merging.
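The trade-off between the two collect mode categories can be sketched on a toy document model. This is a minimal Python sketch: `Pdf`, `merge`, `attach`, and `portfolio` are hypothetical names, and keeping only the parent's metadata after a merge is an assumption; the sketch only shows which information survives each mode, not how the Conversion Service builds PDFs.

```python
from dataclasses import dataclass, field


@dataclass
class Pdf:
    """Toy stand-in for a converted PDF (hypothetical, for illustration)."""
    name: str
    pages: list[str] = field(default_factory=list)
    attachments: list["Pdf"] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)
    signed: bool = False


def merge(parent: Pdf, children: list[Pdf]) -> Pdf:
    """Merge: append the children's pages to the parent's pages."""
    pages = parent.pages + [page for child in children for page in child.pages]
    # The children's metadata and all signatures cannot be carried over;
    # keeping only the parent's metadata is an assumption of this sketch.
    return Pdf(parent.name, pages, metadata=dict(parent.metadata), signed=False)


def attach(parent: Pdf, children: list[Pdf]) -> Pdf:
    """Attach: embed the children unchanged, so their information survives."""
    return Pdf(parent.name, list(parent.pages),
               parent.attachments + list(children), dict(parent.metadata))


def portfolio(name: str, children: list[Pdf]) -> Pdf:
    """PDF collection (Portfolio): an Attach result whose parent has no pages."""
    return Pdf(name, pages=[], attachments=list(children))
```

After `merge`, a signed child's pages are present but its signature and metadata are gone; after `attach`, the child is still reachable as an embedded document with its signature flag and metadata intact.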

The recommended collect mode configuration for the Merge use case:

Doc type | Collect mode
Job | Merge
Word | Merge or Attach
Excel | Merge or Attach
PowerPoint | Merge or Attach
Email | Merge or Attach
Archive (ZIP) | Merge or Attach
PDF | Preserve Structure

The recommended collect mode configuration for the Attach use case:

Doc type | Collect mode
Job | Collection or single document
Word | Attach
Excel | Attach
PowerPoint | Attach
Email | Attach
Archive (ZIP) | Collection or single document
PDF | Preserve Structure

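The two recommendation tables can also be encoded as data and used to flag settings in a profile that deviate from them. This is a minimal Python sketch: `RECOMMENDED` and `deviations` are hypothetical names, the plain dictionary is not the Conversion Service's profile format, and spelling the "single document" option as a mode name is an assumption.

```python
# Hypothetical encoding of the two recommendation tables; the real profile
# format of the Conversion Service is not shown here.
RECOMMENDED: dict[str, dict[str, set[str]]] = {
    "Merge": {
        "Job": {"Merge"},
        "Word": {"Merge", "Attach"},
        "Excel": {"Merge", "Attach"},
        "PowerPoint": {"Merge", "Attach"},
        "Email": {"Merge", "Attach"},
        "Archive (ZIP)": {"Merge", "Attach"},
        "PDF": {"Preserve Structure"},
    },
    "Attach": {
        "Job": {"Collection", "Single document"},
        "Word": {"Attach"},
        "Excel": {"Attach"},
        "PowerPoint": {"Attach"},
        "Email": {"Attach"},
        "Archive (ZIP)": {"Collection", "Single document"},
        "PDF": {"Preserve Structure"},
    },
}


def deviations(use_case: str, config: dict[str, str]) -> dict[str, str]:
    """Return the doc types whose collect mode differs from the recommendation."""
    recommended = RECOMMENDED[use_case]
    return {
        doc_type: mode
        for doc_type, mode in config.items()
        if doc_type in recommended and mode not in recommended[doc_type]
    }
```

A deviation is not necessarily an error; for instance, Flatten for PDF files is reported here because both tables recommend Preserve Structure.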
tip

If necessary, the Flatten collect mode can be used for PDF files to flatten their structure. Up to version 3.1 of the Conversion Service, this was the default behavior.

Child error handling

This configuration defines how errors during the conversion of child documents are handled. In case of an error, either the child document is skipped (removed) from the result and a warning is generated (Skip with Warning option), or the conversion of the parent document is aborted with an error (Strict option).
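The difference between the two options can be sketched as a loop over child documents. This is a minimal Python sketch: `ChildErrorHandling`, `ConversionError`, and `convert_children` are hypothetical names, and the caller-supplied `convert` function stands in for the actual conversion of one child.

```python
from enum import Enum
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class ChildErrorHandling(Enum):
    SKIP_WITH_WARNING = "Skip with Warning"
    STRICT = "Strict"


class ConversionError(Exception):
    """Hypothetical error raised when a single document cannot be converted."""


def convert_children(
    children: Iterable[T],
    convert: Callable[[T], R],
    handling: ChildErrorHandling,
) -> tuple[list[R], list[str]]:
    """Convert each child; return (converted children, warnings)."""
    converted: list[R] = []
    warnings: list[str] = []
    for child in children:
        try:
            converted.append(convert(child))
        except ConversionError as exc:
            if handling is ChildErrorHandling.STRICT:
                # Strict: abort the conversion of the parent document.
                raise ConversionError(f"child {child!r} failed: {exc}") from exc
            # Skip with Warning: drop the child and record a warning.
            warnings.append(f"skipped {child!r}: {exc}")
    return converted, warnings
```

With Skip with Warning, the parent still yields a result that simply lacks the failed child; with Strict, the first failing child raises an error and no partial result is returned.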