
Pdftools OCR Service on Linux

Install Pdftools OCR Service on Linux from RPM or DEB packages, set the license key, and then run the manager and worker as systemd services.

Get a license key

To get an evaluation or full license key, follow these steps:

  1. Fill in the Pdftools contact form and mention that you want to evaluate or use Pdftools OCR Service.
  2. After you receive confirmation, sign up or log in to the Pdftools Portal.
  3. Click See product next to Pdftools OCR Service, and then copy your license key.

Prerequisites

  • A supported Linux distribution. Refer to Linux in the supported operating systems documentation.
  • The Pdftools OCR Service manager and worker packages for your distribution. Refer to Download the packages.
  • A valid Pdftools OCR Service license key. Refer to Get a license key.
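To compare a host against the supported operating systems list, it can help to print the distribution ID and version first. The following is a minimal sketch that reads them from /etc/os-release, which systemd-based distributions provide. The function name distro_id is illustrative and is not part of Pdftools OCR Service.

```shell
# Print "<ID> <VERSION_ID>" from an os-release file (default: /etc/os-release).
# distro_id is a hypothetical helper name, not a Pdftools command.
distro_id() {
  file=${1:-/etc/os-release}
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  # Source the file in a subshell so its variables do not leak into the caller.
  ( . "$file" && printf '%s %s\n' "${ID:-unknown}" "${VERSION_ID:-unknown}" )
}

# Example: distro_id
```

The output format depends on the distribution, so treat it as a starting point for the lookup rather than an exact match against the documentation.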

Download the packages

Pdftools OCR Service needs two packages: a manager and a larger worker (about 4.6 GB). Download both for your distribution:

  1. Log in to the Pdftools Portal.

  2. On the Products page, next to Pdftools OCR Service, click Get started or See product.

  3. In the Product builds section, find the packages for your distribution, and then click Download for both the manager and the worker:

    • RPM:
      • pdftools-ocr-service-manager-VERSION_NUMBER-1.x86_64.rpm
      • pdftools-ocr-service-worker-VERSION_NUMBER-1.x86_64.rpm
    • DEB:
      • pdftools-ocr-service-manager_VERSION_NUMBER_amd64.deb
      • pdftools-ocr-service-worker_VERSION_NUMBER_amd64.deb

    The exact filename, for example pdftools-ocr-service-worker_1.1.4_amd64.deb, depends on the version you are installing.
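Because the worker package is large, it may be worth confirming that both files arrived before you start the installation. The following sketch builds the expected filenames from the patterns listed above and checks a download directory for them. The helper names package_files and check_downloads are hypothetical.

```shell
# Print the expected worker and manager filenames for a package format and version.
# package_files and check_downloads are hypothetical helper names.
package_files() {
  format=$1 version=$2
  case "$format" in
    rpm)
      printf '%s\n' \
        "pdftools-ocr-service-worker-${version}-1.x86_64.rpm" \
        "pdftools-ocr-service-manager-${version}-1.x86_64.rpm" ;;
    deb)
      printf '%s\n' \
        "pdftools-ocr-service-worker_${version}_amd64.deb" \
        "pdftools-ocr-service-manager_${version}_amd64.deb" ;;
    *)
      echo "unknown format: $format (expected rpm or deb)" >&2
      return 1 ;;
  esac
}

# Report any expected package that is missing from a download directory.
check_downloads() {
  format=$1 version=$2 dir=${3:-.}
  names=$(package_files "$format" "$version") || return 1
  missing=0
  for name in $names; do
    if [ -f "$dir/$name" ]; then
      echo "found:   $dir/$name"
    else
      echo "missing: $dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_downloads deb 1.1.4 ~/Downloads
```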

Install Pdftools OCR Service

Install both packages with the matching package manager. Choose one of the following deployment models:

Single-host installation

Install both packages on the same host. They share the /var/lib/pdftools/files directory, so the manager hands jobs to the worker automatically. The manager’s default configuration already points to a worker on the same host at http://localhost:7998/, so you don’t need to configure the connection.

Install using RPM

Install the worker package:

sudo dnf install --nogpgcheck ./pdftools-ocr-service-worker-VERSION_NUMBER-1.x86_64.rpm

Install the manager package:

sudo dnf install --nogpgcheck ./pdftools-ocr-service-manager-VERSION_NUMBER-1.x86_64.rpm

Both commands include --nogpgcheck because the Pdftools packages aren’t signed, and RHEL-family distributions verify package signatures by default.

Replace VERSION_NUMBER with the exact version of the packages you downloaded, for example 1.1.4.

Continue with Set the license key.

Install using DEB

Install the worker package:

sudo apt install ./pdftools-ocr-service-worker_VERSION_NUMBER_amd64.deb

Install the manager package:

sudo apt install ./pdftools-ocr-service-manager_VERSION_NUMBER_amd64.deb

The Pdftools packages aren’t signed, but apt doesn’t verify signatures for local-file installs, so no extra flag is needed.

Replace VERSION_NUMBER with the exact version of the packages you downloaded, for example 1.1.4.

Continue with Set the license key.

Distributed installation

Install the worker package on the worker host (or hosts) and the manager package on the manager host. Use the same dnf or apt commands as for a single-host installation, but pass only the package for that host’s role.

Shared file storage required

The manager and the worker exchange files through their file-storage backend, not over the REST connection between them. A distributed deployment needs a shared backend that both hosts can reach at the same path:

  • A shared network filesystem (NFS or SMB) mounted at FileStorage.FilesDirectoryPath on every host, or
  • A shared object store: set FileStorageType to MinIO and point every host at the same bucket.

Without shared storage, the worker can’t read the files the manager wrote, and jobs fail.
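For the shared-filesystem option, a quick round trip can show whether both hosts really see the same directory. The following sketch writes a marker file on one host and looks for it on another. The helper names are hypothetical, and the directory you pass must be the value of FileStorage.FilesDirectoryPath in your configuration.

```shell
# On the manager host: write a uniquely named marker file and print its token.
# write_marker and check_marker are hypothetical helper names.
write_marker() {
  dir=$1
  token="storage-check-$(date +%s)-$$"
  printf '%s\n' "$token" > "$dir/.$token" || return 1
  printf '%s\n' "$token"
}

# On the worker host: succeed only if the marker with that token is visible.
check_marker() {
  dir=$1 token=$2
  [ "$(cat "$dir/.$token" 2>/dev/null)" = "$token" ]
}

# Example (manager host): token=$(write_marker /path/to/files)
# Example (worker host):  check_marker /path/to/files "$token" && echo shared
```

For a meaningful result, run both functions as the account that the services use, and delete the marker file afterwards. The sketch does not cover the MinIO option.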

After you install the packages, point the manager at the worker. On the manager host, open /opt/pdftools/ocr-service/appsettings.json and set ServiceCommunication.ConnectionString to the worker’s address, for example http://worker-host:7998/. To run several workers behind a load balancer, set the connection string to the load balancer’s address instead.
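If you prefer to script the change, the following sketch updates the connection string with jq. It rests on several assumptions: jq is installed, the file is plain JSON without comments, and the setting is nested as a ServiceCommunication object with a ConnectionString key. Check your file before relying on it. The function name is hypothetical.

```shell
# Set ServiceCommunication.ConnectionString in a manager appsettings.json.
# Assumes jq is installed and that the key is nested as
# {"ServiceCommunication": {"ConnectionString": ...}}; verify against your file.
set_connection_string() {
  file=$1 url=$2
  tmp=$(mktemp) || return 1
  if jq --arg url "$url" '.ServiceCommunication.ConnectionString = $url' "$file" > "$tmp"; then
    # Overwrite in place (instead of mv) to keep the file's owner and mode.
    cat "$tmp" > "$file"
    status=$?
  else
    echo "jq could not parse $file; leaving it unchanged" >&2
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}

# Example (in a root shell):
# set_connection_string /opt/pdftools/ocr-service/appsettings.json http://worker-host:7998/
```

jq rewrites the whole file with its own indentation, so keep a backup copy if you want to preserve the original layout.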

Set the license key on each worker host, as described in Set the license key.

Install on a likely supported distribution

If your distribution is listed as Likely supported, the installer refuses to run and reports that the product “has not been validated on” your distribution. To bypass the check, prepend env PDFTOOLS_SKIP_OS_CHECK=1 to the install command. For example, for a single-host RPM installation, run the following command:

sudo env PDFTOOLS_SKIP_OS_CHECK=1 dnf install --nogpgcheck \
./pdftools-ocr-service-worker-VERSION_NUMBER-1.x86_64.rpm \
./pdftools-ocr-service-manager-VERSION_NUMBER-1.x86_64.rpm

For DEB, run the following command:

sudo env PDFTOOLS_SKIP_OS_CHECK=1 apt install \
./pdftools-ocr-service-worker_VERSION_NUMBER_amd64.deb \
./pdftools-ocr-service-manager_VERSION_NUMBER_amd64.deb

Set the license key

The worker requires a license key. To set it, follow these steps:

  1. Open the worker’s appsettings.json file:
    sudo "${EDITOR:-vi}" /opt/pdftools/ocr-worker/appsettings.json
  2. Replace the "<LICENSE_KEY>" placeholder with the license key you copied from Pdftools Portal:
    {
      "PortNumber": 7998,
      "Licensing": {
        "LicenseKey": "<LICENSE_KEY>"
      }
    }
  3. Restore the file’s owner so the pdftools service user can read it:
    sudo chown pdftools:pdftools /opt/pdftools/ocr-worker/appsettings.json
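Before you start the service, a quick check can catch a settings file that still contains the placeholder. The following sketch only looks for the "<LICENSE_KEY>" text and a "LicenseKey" entry; it does not validate the key itself. The function name license_key_is_set is hypothetical.

```shell
# Succeed only if the settings file has a LicenseKey entry and no longer
# contains the "<LICENSE_KEY>" placeholder.
# license_key_is_set is a hypothetical helper name.
license_key_is_set() {
  file=${1:-/opt/pdftools/ocr-worker/appsettings.json}
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
  if ! grep -q '"LicenseKey"' "$file"; then
    echo "no LicenseKey entry in $file" >&2
    return 1
  fi
  if grep -qF '<LICENSE_KEY>' "$file"; then
    echo "placeholder still present in $file" >&2
    return 1
  fi
  return 0
}

# Example: license_key_is_set && echo "license key looks set"
```

To confirm the ownership from step 3, stat -c '%U:%G' /opt/pdftools/ocr-worker/appsettings.json should print pdftools:pdftools.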
Tip

In its default configuration, Pdftools OCR Service requires a network connection to validate the license key. For information about partially offline or fully offline solutions, refer to Pdftools OCR Service licensing in the Pdftools licensing documentation.

For optional configuration, such as ports and file-storage retention, refer to Monitor Pdftools OCR Service.

Start the service and verify

To start Pdftools OCR Service and confirm it’s healthy, follow these steps:

  1. Start the worker, and then the manager. Each package’s post-install script enables its unit for boot-time start automatically.

    On a single-host installation, start both units on the same host:

    sudo systemctl start pdftools-ocr-worker pdftools-ocr-service

    On a distributed installation, start pdftools-ocr-worker on each worker host, and pdftools-ocr-service on the manager host.

  2. Wait for the worker to load the OCR engine, which typically takes 30 to 60 seconds on the first start, and then check the manager’s readiness endpoint:

    curl -fsS http://localhost:7982/healthz/ready

    Run this command on the manager host. A response body of Healthy confirms that the service is ready. A response body of Degraded (also HTTP 200) means that the manager is up but the worker engine probe hasn’t completed yet. In that case, wait and retry, or inspect the worker logs.
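Because Degraded also returns HTTP 200, a script that waits for readiness has to inspect the response body rather than the status code. The following sketch polls the endpoint until the body is Healthy. The function names, the 120-second default timeout, and the 5-second interval are assumptions chosen for illustration.

```shell
# Map a readiness response body to an exit status:
# 0 = Healthy, 1 = Degraded, 2 = anything else (including no response).
classify_health() {
  case "$1" in
    Healthy)  return 0 ;;
    Degraded) return 1 ;;
    *)        return 2 ;;
  esac
}

# Poll the readiness endpoint until it reports Healthy or the timeout expires.
# wait_ready and its default timeout and interval are illustrative choices.
wait_ready() {
  url=${1:-http://localhost:7982/healthz/ready}
  timeout=${2:-120}
  interval=${3:-5}
  elapsed=0
  body=""
  while [ "$elapsed" -lt "$timeout" ]; do
    body=$(curl -fsS --max-time 5 "$url" 2>/dev/null) || body=""
    if classify_health "$body"; then
      echo "ready after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "not ready after ${timeout}s (last response: ${body:-none})" >&2
  return 1
}

# Example: wait_ready || sudo systemctl status pdftools-ocr-worker
```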

Send a test OCR job

The following example uses the DocumentConversion_Accuracy profile, which Conversion Service applies by default when it delegates OCR. For the full list of profiles, languages, and request options, refer to OCR Service parameters.

To send a test OCR job, follow these steps:

  1. Send a TIFF file from your file system to OCR Service:
    curl -X POST "http://localhost:7982/?version=4&params=PredefinedProfile%3DDocumentConversion_Accuracy&languages=English&block=true&priority=Normal" \
    -H "Content-Type: image/tiff" \
    --data-binary @your-test-file.tif \
    -o result.xml
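  2. Check the response. On success, result.xml contains the recognized text blocks. Because the command doesn’t pass -f, curl also writes the response body to result.xml on a 4xx or 5xx response, so inspect the file if the job fails. A 4xx or 5xx response indicates a configuration problem. Refer to Troubleshooting on Linux.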
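The curl command in step 1 doesn’t print the HTTP status, so success and failure can look alike on the terminal. The following sketch sends the same request and reports the status code by using curl’s -w '%{http_code}' option. The function names ocr_url and submit_ocr are hypothetical, and the query parameters are copied from step 1.

```shell
# Build the request URL from step 1 for a base URL, profile, and language.
# ocr_url and submit_ocr are hypothetical helper names.
ocr_url() {
  base=${1:-http://localhost:7982}
  profile=${2:-DocumentConversion_Accuracy}
  languages=${3:-English}
  # %%3D emits a literal %3D, the URL-encoded "=" inside the params value.
  printf '%s/?version=4&params=PredefinedProfile%%3D%s&languages=%s&block=true&priority=Normal\n' \
    "$base" "$profile" "$languages"
}

# POST a TIFF file and fail unless the HTTP status is 2xx.
submit_ocr() {
  input=$1 output=${2:-result.xml}
  status=$(curl -sS -o "$output" -w '%{http_code}' -X POST "$(ocr_url)" \
    -H 'Content-Type: image/tiff' --data-binary "@$input") || return 1
  case "$status" in
    2??) echo "HTTP $status: response saved to $output" ;;
    *)   echo "HTTP $status: see $output for the error body" >&2; return 1 ;;
  esac
}

# Example: submit_ocr your-test-file.tif result.xml
```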

Uninstall

How you remove the packages depends on your deployment:

  • Single-host installation: remove both the manager and worker packages.
  • Distributed installation: remove the manager package on the manager host, and the worker package on each worker host.

RPM

Remove the packages with dnf:

sudo dnf remove pdftools-ocr-service-manager pdftools-ocr-service-worker

The uninstall removes the manager’s files under /opt/pdftools/ocr-service/, the worker’s files under /opt/pdftools/ocr-worker/ and /opt/ABBYY/, and the systemd unit files. By Linux packaging convention, the pdftools system user isn’t removed.

The following items persist after uninstall:

  • /var/lib/pdftools/files/ (job data)
  • /var/log/pdftools/ (logs)
  • /opt/pdftools/ocr-worker/appsettings.json.rpmsave (your edited configuration, preserved by RPM convention)

DEB

Remove the packages with apt:

sudo apt remove pdftools-ocr-service-manager pdftools-ocr-service-worker

To also remove the configuration files, run sudo apt purge pdftools-ocr-service-manager pdftools-ocr-service-worker instead. Job data and logs persist in both cases.
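To see what an uninstall left behind on a host, you can list the documented leftover paths that still exist. The following sketch checks only the paths named in this section. The function name list_leftovers is hypothetical, and the optional prefix argument exists only so that you can try the function against a scratch directory.

```shell
# Print each documented leftover path that still exists.
# An optional root prefix allows a dry run against a scratch directory.
# list_leftovers is a hypothetical helper name.
list_leftovers() {
  root=${1:-}
  for path in \
    /var/lib/pdftools/files \
    /var/log/pdftools \
    /opt/pdftools/ocr-worker/appsettings.json.rpmsave
  do
    [ -e "$root$path" ] && echo "$root$path"
  done
  return 0
}

# Example: list_leftovers
```

Review the listed job data and logs before you delete anything, because removal can’t be undone.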

Next steps

You installed Pdftools OCR Service on Linux, set the license key, and verified that it’s running. Continue with the following: