Pdftools OCR Service on Linux
Install Pdftools OCR Service on Linux from RPM or DEB packages, set the license key, and then run the manager and worker as systemd services.
Get a license key
To get an evaluation or full license key, follow these steps:
- Fill in the Pdftools contact form and mention that you want to evaluate or use Pdftools OCR Service.
- After you receive confirmation, sign up or log in to the Pdftools Portal.
- Click See product next to Pdftools OCR Service, and then copy your license key.
Prerequisites
- A supported Linux distribution. Refer to Linux in the supported operating systems documentation.
- The Pdftools OCR Service manager and worker packages for your distribution. Refer to Download the packages.
- A valid Pdftools OCR Service license key. Refer to Get a license key.
Download the packages
Pdftools OCR Service needs two packages: a manager and a larger worker (about 4.6 GB). Download both for your distribution:
- Log in to the Pdftools Portal.
- On the Products page, next to Pdftools OCR Service, click Get started or See product.
- In the Product builds section, find the packages for your distribution, and then click Download for both the manager and the worker:
  - RPM:
    pdftools-ocr-service-manager-VERSION_NUMBER-1.x86_64.rpm
    pdftools-ocr-service-worker-VERSION_NUMBER-1.x86_64.rpm
  - DEB:
    pdftools-ocr-service-manager_VERSION_NUMBER_amd64.deb
    pdftools-ocr-service-worker_VERSION_NUMBER_amd64.deb
  The exact filename, for example pdftools-ocr-service-worker_1.1.4_amd64.deb, depends on the version you are installing.
Install Pdftools OCR Service
Install both packages with the matching package manager. Choose one of the following deployment models:
- Single-host installation: the manager and the worker run on the same host. This is the simplest setup.
- Distributed installation: the manager and the worker run on separate hosts.
Single-host installation
Install both packages on the same host. They share the /var/lib/pdftools/files directory, so the manager hands jobs to the worker automatically. The manager’s default configuration already points to a worker on the same host at http://localhost:7998/, so you don’t need to configure the connection.
Install using RPM
Install the worker package:
sudo dnf install --nogpgcheck ./pdftools-ocr-service-worker-VERSION_NUMBER-1.x86_64.rpm
Install the manager package:
sudo dnf install --nogpgcheck ./pdftools-ocr-service-manager-VERSION_NUMBER-1.x86_64.rpm
Both commands include --nogpgcheck because the Pdftools packages aren’t signed, and RHEL-family distributions verify package signatures by default.
Replace VERSION_NUMBER with the exact version of the packages you downloaded, for example 1.1.4.
Continue with Set the license key.
Install using DEB
Install the worker package:
sudo apt install ./pdftools-ocr-service-worker_VERSION_NUMBER_amd64.deb
Install the manager package:
sudo apt install ./pdftools-ocr-service-manager_VERSION_NUMBER_amd64.deb
The Pdftools packages aren’t signed, but apt doesn’t verify signatures for local-file installs, so no extra flag is needed.
Replace VERSION_NUMBER with the exact version of the packages you downloaded, for example 1.1.4.
Continue with Set the license key.
Distributed installation
Install the worker package on the worker host (or hosts) and the manager package on the manager host. Use the same dnf or apt commands as for a single-host installation, but pass only the package for that host’s role.
The manager and the worker exchange files through their file-storage backend, not over the REST connection between them. A distributed deployment needs a shared backend that both hosts can reach at the same path:
- A shared network filesystem (NFS or SMB) mounted at FileStorage.FilesDirectoryPath on every host, or
- A shared object store, by setting FileStorageType to MinIO and pointing every host at the same bucket.
Without shared storage, the worker can’t read the files the manager wrote, and jobs fail.
After you install the packages, point the manager at the worker. On the manager host, set ServiceCommunication.ConnectionString in /opt/pdftools/ocr-service/appsettings.json to the worker’s address, for example http://worker-host:7998/. To run workers behind a load balancer, point the connection string at the load balancer instead.
Set the license key on each worker host, as described in Set the license key.
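The connection-string change can also be scripted instead of edited by hand. The following is a minimal sketch, not the product's own tooling: the function name set_connection_string is hypothetical, and it assumes the manager's appsettings.json contains the key exactly once, on a single line, in the form "ConnectionString": "...". Check the file's actual layout before relying on it.

```shell
# Sketch: rewrite the "ConnectionString" value in a JSON settings file.
# Assumption: the key appears once, on one line, as "ConnectionString": "...".
set_connection_string() {
  file=$1
  url=$2
  # Escape characters that are special in a sed replacement: \, &, and the | delimiter.
  escaped=$(printf '%s' "$url" | sed -e 's/[\\&|]/\\&/g')
  tmp=$(mktemp) || return 1
  # Write to a temporary file, then copy the content back so the original
  # file keeps its owner and permissions.
  sed -E "s|(\"ConnectionString\"[[:space:]]*:[[:space:]]*\")[^\"]*\"|\\1${escaped}\"|" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

On the manager host, this could be called with the path /opt/pdftools/ocr-service/appsettings.json and the worker or load balancer address, run with sufficient privileges to write the file.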
Install on a likely supported distribution
If your distribution is listed under Likely supported, the installer refuses to proceed and reports an error that contains the text has not been validated on. To bypass the check, prepend env PDFTOOLS_SKIP_OS_CHECK=1 to the install command. For example, for a single-host RPM installation, run the following command:
sudo env PDFTOOLS_SKIP_OS_CHECK=1 dnf install --nogpgcheck \
./pdftools-ocr-service-worker-VERSION_NUMBER-1.x86_64.rpm \
./pdftools-ocr-service-manager-VERSION_NUMBER-1.x86_64.rpm
For DEB, run the following command:
sudo env PDFTOOLS_SKIP_OS_CHECK=1 apt install \
./pdftools-ocr-service-worker_VERSION_NUMBER_amd64.deb \
./pdftools-ocr-service-manager_VERSION_NUMBER_amd64.deb
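The install commands in this section differ only in the package manager, the filename pattern, and the optional OS-check override. As a rough illustration, the sketch below prints the single-host command for a given version without running it. The function name ocr_install_command and its skip-os-check argument are hypothetical; only the printed commands come from this page.

```shell
# Hypothetical dry-run helper: prints the single-host install command for
# the given package manager ("dnf" or "apt") and version. It runs nothing.
# Pass "skip-os-check" as the third argument for a likely supported distribution.
ocr_install_command() {
  manager=$1
  version=$2
  mode=${3:-}
  prefix="sudo"
  if [ "$mode" = "skip-os-check" ]; then
    prefix="sudo env PDFTOOLS_SKIP_OS_CHECK=1"
  fi
  case "$manager" in
    dnf)
      # --nogpgcheck is needed because the packages are unsigned.
      echo "$prefix dnf install --nogpgcheck ./pdftools-ocr-service-worker-${version}-1.x86_64.rpm ./pdftools-ocr-service-manager-${version}-1.x86_64.rpm"
      ;;
    apt)
      echo "$prefix apt install ./pdftools-ocr-service-worker_${version}_amd64.deb ./pdftools-ocr-service-manager_${version}_amd64.deb"
      ;;
    *)
      echo "unsupported package manager: $manager" >&2
      return 1
      ;;
  esac
}
```

Printing the command first makes it easy to review the filenames against the files you downloaded before running anything with sudo.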
Set the license key
The worker requires a license key. To set it, follow these steps:
- Open the worker’s appsettings.json file:
  sudo $EDITOR /opt/pdftools/ocr-worker/appsettings.json
- Replace the "<LICENSE_KEY>" placeholder with the license key you copied from Pdftools Portal:
  {
    "PortNumber": 7998,
    "Licensing": {
      "LicenseKey": "<LICENSE_KEY>"
    }
  }
- Restore the file’s owner so the pdftools service user can read it:
  sudo chown pdftools:pdftools /opt/pdftools/ocr-worker/appsettings.json
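Before starting the worker, it can help to confirm that the placeholder is really gone. The following check is a minimal sketch under stated assumptions: the function name license_key_is_set is hypothetical, and it assumes the "LicenseKey": "..." pair sits on a single line of the file. It does not validate the key itself, only that a non-placeholder value is present.

```shell
# Sketch: succeed only if the settings file has a non-empty LicenseKey
# value and no longer contains the "<LICENSE_KEY>" placeholder.
license_key_is_set() {
  file=$1
  if grep -q '<LICENSE_KEY>' "$file"; then
    echo "placeholder still present in $file" >&2
    return 1
  fi
  # Assumption: "LicenseKey": "..." is written on one line.
  grep -Eq '"LicenseKey"[[:space:]]*:[[:space:]]*"[^"]+"' "$file"
}
```

For example, license_key_is_set /opt/pdftools/ocr-worker/appsettings.json returns a non-zero exit status while the placeholder is still in the file.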
In its default configuration, Pdftools OCR Service requires a network connection to validate the license key. For information about partially offline or fully offline solutions, refer to Pdftools OCR Service licensing in the Pdftools licensing documentation.
For optional configuration, such as ports and file-storage retention, refer to Monitor Pdftools OCR Service.
Start the service and verify
To start Pdftools OCR Service and confirm it’s healthy, follow these steps:
- Start the worker, and then the manager. Each package’s post-install script enables its unit for boot-time start automatically.
  On a single-host installation, start both units on the same host:
  sudo systemctl start pdftools-ocr-worker pdftools-ocr-service
  On a distributed installation, start pdftools-ocr-worker on each worker host, and pdftools-ocr-service on the manager host.
- Wait for the worker to load the OCR engine, and then check the manager’s readiness endpoint. The first start typically takes 30 to 60 seconds:
  curl -fsS http://localhost:7982/healthz/ready
  Run this against the manager host. A response body of Healthy confirms the service is ready. A response body of Degraded (also HTTP 200) means the manager is up but the worker engine probe hasn’t completed yet. Wait longer or inspect the worker logs.
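Because the first start can take a while, the readiness check is a natural candidate for a polling loop. The following is a minimal sketch, assuming the endpoint returns the plain-text bodies Healthy or Degraded as described above. The function names wait_until_healthy and fetch_ready, and the default attempt count and delay, are hypothetical choices for illustration.

```shell
# Fetches the readiness body. Kept as a separate function so it can be
# replaced with a stub when testing the loop without a running service.
fetch_ready() {
  curl -fsS "$1"
}

# Sketch: poll the readiness URL until the body is exactly "Healthy".
# Arguments: URL, number of attempts (default 30), delay in seconds (default 5).
# Prints the last observed body, or "unreachable" if no body was received.
wait_until_healthy() {
  url=$1
  attempts=${2:-30}
  delay=${3:-5}
  body=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # A failed request (connection refused, HTTP error) counts as "no body yet".
    body=$(fetch_ready "$url" 2>/dev/null) || body=""
    if [ "$body" = "Healthy" ]; then
      echo "Healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "${body:-unreachable}"
  return 1
}
```

For example, wait_until_healthy http://localhost:7982/healthz/ready 24 5 polls for roughly two minutes. A final output of Degraded suggests inspecting the worker logs rather than waiting longer.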
Send a test OCR job
The following example uses the DocumentConversion_Accuracy profile, which Conversion Service applies by default when it delegates OCR. For the full list of profiles, languages, and request options, refer to OCR Service parameters.
To send a test OCR job, follow these steps:
- Send a TIFF file from your file system to OCR Service:
  curl -X POST "http://localhost:7982/?version=4&params=PredefinedProfile%3DDocumentConversion_Accuracy&languages=English&block=true&priority=Normal" \
    -H "Content-Type: image/tiff" \
    --data-binary @your-test-file.tif \
    -o result.xml
- Check the response. A successful response writes the recognized text blocks to result.xml. A 4xx or 5xx response indicates a configuration problem. Refer to Troubleshooting on Linux.
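The request URL carries the profile inside the params query value, with the equals sign percent-encoded as %3D. The sketch below assembles that URL from its parts so the profile or language can be swapped without re-typing the encoding. The function name ocr_request_url is hypothetical, and the sketch assumes the query parameter names version, params, languages, block, and priority, with the same fixed values as the example request.

```shell
# Sketch: build the OCR request URL for a base address, profile, and language.
# The "=" between PredefinedProfile and its value is percent-encoded as %3D
# because it sits inside the value of the "params" query parameter.
ocr_request_url() {
  base=$1       # for example http://localhost:7982 (no trailing slash)
  profile=$2    # for example DocumentConversion_Accuracy
  language=$3   # for example English
  printf '%s/?version=4&params=PredefinedProfile%%3D%s&languages=%s&block=true&priority=Normal\n' \
    "$base" "$profile" "$language"
}
```

The result can be passed to curl in place of the literal URL, for example with "$(ocr_request_url http://localhost:7982 DocumentConversion_Accuracy English)". Values that contain spaces or other reserved characters would need additional percent-encoding, which this sketch does not perform.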
Uninstall
How you remove the packages depends on your deployment:
- Single-host installation: remove both the manager and worker packages.
- Distributed installation: remove the manager package on the manager host, and the worker package on each worker host.
RPM
Remove the packages with dnf:
sudo dnf remove pdftools-ocr-service-manager pdftools-ocr-service-worker
The uninstall removes the manager’s files under /opt/pdftools/ocr-service/, the worker’s files under /opt/pdftools/ocr-worker/ and /opt/ABBYY/, and the systemd unit files. By Linux packaging convention, the pdftools system user isn’t removed.
The following items persist after uninstall:
- /var/lib/pdftools/files/ (job data)
- /var/log/pdftools/ (logs)
- /opt/pdftools/ocr-worker/appsettings.json.rpmsave (your edited configuration, preserved by RPM convention)
DEB
Remove the packages with apt:
sudo apt remove pdftools-ocr-service-manager pdftools-ocr-service-worker
To also remove the configuration files, run sudo apt purge pdftools-ocr-service-manager pdftools-ocr-service-worker instead. Job data and logs persist in both cases.
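Since job data and logs persist after removal, a quick check shows what is still on disk. The following is a minimal sketch that reports which of the paths named in this section exist. The function name list_leftovers and its optional root-prefix argument are hypothetical; the prefix exists only so the check can be tried against a scratch directory.

```shell
# Sketch: print each known leftover path that still exists.
# The optional argument is a directory prefix (default: none, meaning /).
list_leftovers() {
  root=${1:-}
  for path in \
    /var/lib/pdftools/files \
    /var/log/pdftools \
    /opt/pdftools/ocr-worker/appsettings.json.rpmsave
  do
    if [ -e "${root}${path}" ]; then
      echo "$path"
    fi
  done
}
```

Running list_leftovers with no argument after an uninstall lists the directories and files you might want to archive or delete manually.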
Next steps
You installed Pdftools OCR Service on Linux, set the license key, and verified that it’s running. Continue with the following:
- For configuration, service control, logs, and troubleshooting, refer to Monitor Pdftools OCR Service.
- For the full list of profiles, languages, and request options, refer to Pdftools OCR Service parameters.
- To run more workers for higher throughput, refer to Scale Pdftools OCR Service.