Extraction workflow

This workflow extracts embedded OCR information from PDF documents and returns it as XML. If a PDF contains no embedded OCR information, the Extraction workflow outputs an empty <document> tag.

The workflow supports these features:

  • Extraction of OCR-related XML information from PDF documents
  • Simple configuration with minimal options
  • XML output containing the document data, or an empty <document> tag if no embedded OCR data is found

Supported file formats for the Extraction workflow

This workflow supports these file formats:

Content type | File type
PDF formats  | PDF 1.x, PDF 2.0, PDF/A-1, PDF/A-2, PDF/A-3

Info: If the input PDF does not contain embedded OCR data, the workflow outputs an XML file with an empty <document> tag.
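
A downstream consumer may want to tell an empty result apart from a populated one. The following is a minimal sketch, not part of the product: the function name `has_ocr_data` is hypothetical, and it assumes that the root element is named `document` without an XML namespace and that "empty" means no child elements and no non-whitespace text.

```python
import xml.etree.ElementTree as ET


def has_ocr_data(xml_text: str) -> bool:
    """Return True if the <document> root element carries any content.

    Assumptions (not confirmed by the product documentation):
    - the root element is named "document" and has no XML namespace;
    - "empty" means no child elements and no non-whitespace text.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "document":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    has_children = len(root) > 0
    has_text = bool((root.text or "").strip())
    return has_children or has_text
```

Under these assumptions, both `<document/>` and `<document></document>` are reported as empty, so the caller can skip further processing for PDFs without embedded OCR data.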

Job options for the Extraction workflow

The Extraction workflow accepts job options, which pass job-specific values that the workflow uses when processing documents.

Note: Job and document options that you pass at runtime apply only to the current job. For the next job, the workflow reverts to the profile settings (the saved defaults) unless you pass the job or document options again.

Document options

Document options apply only to a specific input document. They let you set properties for an individual document instead of relying on a global setting determined by the job or the profile. Subsequent jobs processed with the workflow profile use the default profile settings.

Type              | Option       | Description
Document property | DOC.PASSWORD | Set the password for the document.
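
The scoping rules described above can be illustrated with a small conceptual sketch. It is not the product's API: the function `effective_options` and the dictionary representation are hypothetical, and the precedence order (document options over job options over profile settings) is an assumption inferred from the description. Only the option name `DOC.PASSWORD` comes from the table above.

```python
from typing import Mapping, Optional


def effective_options(
    profile: Mapping[str, str],
    job_options: Optional[Mapping[str, str]] = None,
    doc_options: Optional[Mapping[str, str]] = None,
) -> dict:
    """Combine settings for one document in one job (illustrative model only).

    Assumed precedence: document options, then job options, then profile.
    The profile mapping is copied, never modified, so the next job starts
    from the saved defaults again.
    """
    merged = dict(profile)
    merged.update(job_options or {})
    merged.update(doc_options or {})
    return merged


# Hypothetical saved profile with no password set.
profile = {"DOC.PASSWORD": ""}

# The password passed at runtime applies to this document in this job only.
current_job = effective_options(profile, doc_options={"DOC.PASSWORD": "example-password"})

# A later job that passes no options falls back to the profile settings.
next_job = effective_options(profile)
```

Because the merge works on a copy, the runtime value never leaks into the saved profile, which mirrors the behavior in which the workflow reverts to the profile settings for the next job.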