Archive as PDF/A-2 workflow

Workflow identifier

This workflow is identified as Archive PDF/A-2 in the Conversion Service.

The Archive as PDF/A-2 workflow is engineered to prepare documents for archiving.

In particular, you can:

  • Maintain the original appearance and structure of the document.
  • Minimize the loss of information.
  • Ensure traceability and reproducibility of the changes made to the document.

The Archive as PDF/A-2 workflow supports the following features:

  • Conversion to PDF/A-2 format (PDF/A-2b, PDF/A-2u, and PDF/A-2a)
  • Optimization of PDF/A structure (optional)
  • Office conversion (optional)
  • Optical character recognition (optional)
  • Digital signatures (optional)

Conformance levels

The Conversion Service supports the Basic (PDF/A-2b), Unicode (PDF/A-2u), and Accessible (PDF/A-2a) conformance levels. The levels are incremental: each higher level includes all requirements of the level below it, plus additional requirements. For example, every level U (Unicode) PDF/A-2 document is also a valid level B (Basic) document.

The Archive as PDF/A-2 workflow automatically tries to convert the input document to the highest conformance level for PDF/A-2 (usually PDF/A-2a). If the input document is PDF/A-2b or a PDF without structure information, the workflow tries to convert it to PDF/A-2u. If the workflow cannot convert to the PDF/A-2u conformance level, it converts to PDF/A-2b or ends the conversion with a failure.
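The fallback order described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the Conversion Service's implementation: the function name `pick_conformance` and the `can_convert` callback are hypothetical, and it assumes that a failed attempt at one level simply falls through to the next lower level.

```python
from typing import Callable, Optional

# Conformance levels from strictest to most basic, as listed in this section.
LEVELS = ["PDF/A-2a", "PDF/A-2u", "PDF/A-2b"]


def pick_conformance(has_structure: bool,
                     can_convert: Callable[[str], bool]) -> Optional[str]:
    """Return the first achievable level, or None if the conversion fails.

    has_structure: whether the input carries structure (tagging) information.
    can_convert:   hypothetical callback that reports whether conversion to
                   a given level succeeds for this input.
    """
    # Without structure information, level A is not attempted (assumption
    # based on the description above); start at level U instead.
    candidates = LEVELS if has_structure else LEVELS[1:]
    for level in candidates:
        if can_convert(level):
            return level
    return None  # no level could be reached: the conversion ends in failure
```

For an untagged input where only level B is reachable, `pick_conformance(False, lambda level: level == "PDF/A-2b")` returns `"PDF/A-2b"`.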

Supported file formats for Archive as PDF/A-2

The Archive as PDF/A-2 workflow supports the following file formats:

Content type | File type
Document formats | PDF 1.x, PDF 2.0, PDF/A-1, PDF/A-2, PDF/A-3
Image formats | JPEG, JPEG 2000, TIFF, BMP, GIF, JBIG2, PNG, HEIC, HEIF, WebP, JFIF
Email | EML, MSG (without encryption)
Word | DOC, DOT, DOCX, DOCM, DOTX, DOTM, RTF, XML (WordprocessingML 2003)
Excel | XLS, XLT, XLSX, XLSM, XLTX, XLTM, XML (SpreadsheetML 2003)
PowerPoint | PPT, PPS, PPTX, PPTM, PPSX, PPSM
OpenOffice | ODT, ODS, ODP
Other | CSV, HTML, HTM (prepared for archiving), TXT, XML, ZIP (without password protection)

Note on OpenOffice formats

PDF conversion of OpenDocument Format depends on the rendering in Microsoft Word, Excel, or PowerPoint. In particular, visual differences may occur with tables and tabs. Visual differences caused by the rendering of shapes are usually not acceptable.

Note on HTML format

HTML documents must be self-contained (layout information and images are either embedded within documents or available online) and suited for portrait page layout. JavaScript content is disabled during processing.

Note on XML format

Layout information and images must be available online.

The conversion of most file formats is enabled by default in the Convert mode.

Configure the workflow

The Archive as PDF/A-2 workflow’s profile provides detailed configuration options for converting files to the PDF/A-2 standard. The profile configuration allows for the customization of all processing steps.

tip

To view the processing steps in the Archive as PDF/A workflows, see PDF/A workflow steps.

Convert mode for child documents (attachments)

Use the Convert mode for child documents (attachments) configuration to define which documents the Archive as PDF/A-2 workflow converts (Convert option) and which it skips (removes) from the result. When it removes a document, the Conversion Service generates either a warning (Skip with Warning option) or an informational message (Skip option).

Using the Convert mode for child documents (attachments), you can define rules based on the child document type, its filename, or the type of its parent document. Based on these parameters, you can configure the Archive as PDF/A-2 workflow to convert, keep, or skip (remove) various child documents. For example, by default, the Archive as PDF/A-2 workflow converts Microsoft Office files to PDF/A, removes executables, and removes other non-convertible documents with a warning.
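As a rough illustration of how such rules could be evaluated, the following Python sketch matches a child document against an ordered rule list. It is a model of the idea only: the `Rule` class, the `decide` function, the first-match-wins ordering, and the type labels are assumptions made for this example and do not reflect the actual profile format of the Conversion Service.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class Rule:
    action: str                         # "Convert", "Skip", or "Skip with Warning"
    child_type: Optional[str] = None    # e.g. "Word"; None matches any type
    filename: Optional[str] = None      # glob pattern; None matches any name
    parent_type: Optional[str] = None   # e.g. "Email"; None matches any parent


def decide(rules: List[Rule], child_type: str, filename: str,
           parent_type: str, default: str = "Skip with Warning") -> str:
    """Return the action of the first matching rule (assumed ordering)."""
    for rule in rules:
        if rule.child_type not in (None, child_type):
            continue
        if rule.parent_type not in (None, parent_type):
            continue
        # Compare filenames case-insensitively so "SETUP.EXE" matches "*.exe".
        if rule.filename is not None and not fnmatchcase(
                filename.lower(), rule.filename.lower()):
            continue
        return rule.action
    return default


# Hypothetical rule set loosely mirroring the defaults described above.
EXAMPLE_RULES = [
    Rule("Convert", child_type="Word"),
    Rule("Convert", child_type="Excel"),
    Rule("Convert", child_type="PowerPoint"),
    Rule("Skip", filename="*.exe"),
]
```

With these rules, a Word attachment of an email is converted, an executable is skipped, and anything else falls back to the default of being skipped with a warning.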

Collect mode

The collect mode defines how a converted document and its child documents are combined. The collect mode can be specified for each document type individually.

There are two collect mode categories:

  • Merge: Merge the pages of multiple PDF files into a single document.
  • Attach: Attach (embed) child documents in a PDF file.

For example, you can configure the Archive as PDF/A-2 workflow to convert emails into a PDF collection (Portfolio) of their bodies and attachments. When converting Word documents, their embedded files can be attached to the converted PDF. A PDF collection (Portfolio) is a special case of the Attach collect mode, where the parent document contains no pages. Still, the resulting portfolio PDF displays a convenient table of the attached documents for easy navigation. The advantage of the Attach collect mode is that it can preserve all information of the input files and the files’ structure.

The Merge collect mode creates simple files that can be processed and viewed by all PDF applications. The disadvantage is that this mode can only merge PDF files. Furthermore, not all information can be preserved when merging PDF files. For example, document metadata, signatures, and certain interactive form fields cannot be merged and must be removed. Also, logical structure (tagging) information might be less meaningful after merging.

The recommended collect mode configuration for the Merge use case:

Doc type | Collect mode
Job | Merge
Word | Merge or Attach
Excel | Merge or Attach
PowerPoint | Merge or Attach
Email | Merge or Attach
Archive (ZIP) | Merge or Attach
PDF | Preserve Structure

The recommended collect mode configuration for the Attach use cases:

Doc type | Collect mode
Job | Collection or single document
Word | Attach
Excel | Attach
PowerPoint | Attach
Email | Attach
Archive (ZIP) | Collection or single document
PDF | Preserve Structure

tip

If necessary, the Flatten collect mode can be used for PDF files to flatten the structure of PDF documents. Up to version 3.1 of the Conversion Service, this has been the default behavior.

Child error handling

This configuration defines how errors are handled during the conversion of child documents. When an error occurs, the child document can either be skipped (removed) from the result with a warning (Skip with Warning option), or the conversion of the parent document can be aborted with an error (Strict option).
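The difference between the two options can be sketched in Python as follows. This is a simplified model under stated assumptions: `ChildConversionError`, `demo_convert`, and `convert_children` are hypothetical names invented for this example, not part of the Conversion Service.

```python
from typing import Callable, List, Tuple


class ChildConversionError(Exception):
    """Hypothetical error raised when a child document cannot be converted."""


def demo_convert(name: str) -> str:
    """Stand-in converter: fails for names that start with 'bad'."""
    if name.startswith("bad"):
        raise ChildConversionError("unsupported content")
    return name + ".pdf"


def convert_children(children: List[str],
                     convert: Callable[[str], str],
                     mode: str = "Skip with Warning") -> Tuple[List[str], List[str]]:
    """Convert each child; return (converted results, collected warnings)."""
    results: List[str] = []
    warnings: List[str] = []
    for child in children:
        try:
            results.append(convert(child))
        except ChildConversionError as exc:
            if mode == "Strict":
                # Strict: the error propagates and aborts the parent conversion.
                raise
            # Skip with Warning: drop the child and record a warning.
            warnings.append(f"{child}: {exc}")
    return results, warnings
```

In the default mode, `convert_children(["a.docx", "bad.bin"], demo_convert)` keeps the converted `a.docx` and records one warning. With `mode="Strict"`, the same call raises `ChildConversionError`.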

Linearization

Linearization internally restructures document content for faster loading and rendering in web browsers. However, linearization has trade-offs, such as increased file size and a limited effect on rendering performance. Use PDF linearization when the first page of a PDF must load quickly on the web.

The Conversion Service doesn’t support linearization of attached documents (child documents).

info

Although linearization provides fast, page-by-page loading over a network connection, it is not recommended for all use cases as it can considerably increase file size and has a limited effect.

Events

The Archive as PDF/A-2 workflow can generate the following events during processing. For detailed descriptions of each event code, refer to Events reference.

Code | Description
Generic | Unclassified event.
CorruptionRepaired | Corrupt document repaired automatically.
ContentRecovered | Parts of a corrupt document’s content recovered.
ExternalResourceUnavailable | External resource couldn’t be reached.
ContentClipped | Content cut off because it doesn’t fit the page size.
ContentOverflow | Content overflowed into the page margin.
HtmlMultimediaRemoved | Removed HTML multimedia elements.
VisualDifferences | Conversion resulted in visual differences.
SignatureRemoved | Removed cryptographic signatures.
NotLinearized | PDF linearization was omitted.
Colorants | Resolved ambiguous or conflicting spot color descriptions.
LayersRemoved | Removed optional content groups (layers).
TransparencyRemoved | Transparent objects converted to opaque. Rare for PDF/A-2 because the format allows transparency.
ChildRemoved | Removed a child element from a container.
MetadataRemoved | Removed non-conforming metadata properties.
FontSubstituted | Substituted a missing font.
AnnotationRemoved | Removed annotations.
MultimediaRemoved | Removed multimedia content.
ActionRemoved | Removed prohibited action types.
StructureRemoved | Removed invalid or corrupt logical structure (tagging).
ContentRasterized | Page content converted to an image.
PageRenderer | A page was removed because it couldn’t be converted.
OcrIncomplete | Unable to perform optimal OCR recognition.
UnicodesIncomplete | Unable to make text fully extractable.
TaggingIncomplete | Unable to generate complete tagging information.

To configure how events are handled, refer to Configure event behavior.

Job and document options for the PDF/A-2 workflow

The PDF/A-2 workflow accepts job options and document options, which let you pass job-specific and document-specific values for the workflow to use when it processes documents.

note

Job and document options that you pass at runtime affect only the current job. For the next job, the workflow reverts to the profile settings (saved defaults) unless you pass job or document options again.

Job options

Job options apply to all documents processed in the same job. Any subsequent jobs processed with the workflow profile use the profile’s default settings.

Type | Option | Description
Document compression and optimization | OPTIMIZE | Turn document compression and optimization on or off. All settings must be set up in the profile beforehand. If true, documents included in the job are compressed and optimized according to the optimization profile set in the profile settings. If false, no compression or optimization is performed. Documents can be optimized according to five profiles: Web (compresses the file without affecting viewing quality on digital devices); Print (compresses the file without affecting print quality); Max (removes redundant data and reduces image resolution to achieve a minimal viable file size); MRC (designed to process mixed raster content); Archive (prepares a document for archiving in PDF/A format).
OCR | OCR | Turn optical character recognition on or off for the job. All settings must be set up in the profile beforehand. If true, documents included in the job are processed to recognize text in images (as appropriate). If false, no OCR is performed.
Metadata | META.AUTHOR | The author of the document
Metadata | META.TITLE | The title of the document
Metadata | META.SUBJECT | The subject of the document
Metadata | META.KEYWORDS | Keywords that apply to the document
Document settings | PDF.LINEARIZE | Enable or disable linearization of the output PDF.

note

You can also set extended metadata properties apart from the standard metadata properties.
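The relationship between profile settings and job options can be pictured with a small Python sketch. The option names come from the table above. Everything else is an assumption for illustration: the `PROFILE_DEFAULTS` values, the use of Python booleans, and the `effective_settings` helper are hypothetical and do not show how options are actually submitted to the Conversion Service.

```python
from typing import Dict, Optional, Union

OptionValue = Union[bool, str]

# Hypothetical saved profile defaults; real defaults depend on your profile.
PROFILE_DEFAULTS: Dict[str, OptionValue] = {
    "OPTIMIZE": False,
    "OCR": False,
    "PDF.LINEARIZE": False,
}


def effective_settings(profile: Dict[str, OptionValue],
                       job_options: Optional[Dict[str, OptionValue]] = None
                       ) -> Dict[str, OptionValue]:
    """Overlay job options on a copy of the profile for one job only."""
    settings = dict(profile)            # copy, so the profile stays unchanged
    settings.update(job_options or {})  # job options win for this job
    return settings


# One job turns on OCR and sets metadata; the next job passes nothing.
first_job = effective_settings(
    PROFILE_DEFAULTS, {"OCR": True, "META.AUTHOR": "Records Office"})
next_job = effective_settings(PROFILE_DEFAULTS)
```

Because the overlay works on a copy, `next_job` sees the saved profile values again, which mirrors the behavior that runtime options apply only to the current job.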

Document options

Document options apply only to a specific input. They let you set properties for an individual document, rather than as a global setting (determined by either the job or the profile). Any subsequent jobs processed with the workflow profile use the profile’s default settings.

Type | Option | Description
Document property | DOC.PASSWORD | Set the password for the document.