Monitor Pdftools OCR Service
You can monitor the OCR service programmatically using the health check endpoints that both the Manager and Worker components expose. To check the readiness status, send a request to these endpoints:
- Manager: http://<MANAGER_HOST>:<MANAGER_PORT>/healthz/ready
- Worker: http://<WORKER_HOST>:<WORKER_PORT>/healthz/ready
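The readiness check can be scripted. The following Python sketch is one way to poll both endpoints; it is not part of the product. It assumes that each health endpoint is served on localhost at the component’s PortNumber (7982 and 7998 in the default settings) and that a degraded service reports the word Degraded in the response body. The names check_ready and ENDPOINTS are illustrative.

```python
import urllib.error
import urllib.request

# Assumed URLs: localhost plus the default PortNumber values
# (7982 for the manager, 7998 for the worker). Adjust to your deployment.
ENDPOINTS = {
    "manager": "http://localhost:7982/healthz/ready",
    "worker": "http://localhost:7998/healthz/ready",
}


def check_ready(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Return (ok, detail) for one readiness endpoint.

    ok is True only for an HTTP 2xx response whose body does not
    mention "Degraded" (assumption about the response format).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace").strip()
            ok = 200 <= resp.status < 300 and "degraded" not in body.lower()
            return ok, f"{resp.status} {body}"
    except urllib.error.HTTPError as exc:
        # Non-2xx answers (for example 503) are raised as HTTPError.
        return False, f"{exc.code} {exc.reason}"
    except OSError as exc:
        # Covers URLError, refused connections, and timeouts.
        return False, f"unreachable: {exc}"


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        ok, detail = check_ready(url)
        print(f"{name}: {'ready' if ok else 'NOT ready'} ({detail})")
```

The script only reports the status. Restarting a service or raising an alert is left to the caller.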
Check log files to monitor the correct operation of the services. The path to the log files is configurable. Review Default appsettings.json for an example of the configured path to a log file.
Manager configuration overview
The OCR Service Manager uses appsettings.json files to manage its configuration. The following example shows a configuration for the Manager component, which also shows how it interacts with Workers.
The following is the default path of the manager configuration file:
C:\Program Files\Pdftools\Pdftools OCR Service\PdftoolsOcrService\appsettings.json
Default appsettings.json
{
  "Database": {
    "DatabaseType": "SqlLite",
    "DeleteJobsAfterDays": 2
  },
  "FileStorage": {
    "FileStorageType": "HostFileSystem",
    "FilesDirectoryPath": "C:/ProgramData/Pdftools/OcrService/Files",
    "DeleteFilesAfterDays": 2
  },
  "ServiceCommunication": {
    "ServiceCommunicationType": "Rest",
    "ConnectionString": "http://localhost:7998/"
  },
  "PortNumber": 7982,
  "MaxRequestBodySizeBytes": 104857600,
  "LogFilePath": "C:/ProgramData/Pdftools/OcrService/logs/manager-log.txt",
  "LogRetentionDays": 7
}
- Database
  - DatabaseType: Database backend (for example, SqlLite or PostgreSql). If you use PostgreSql, also add ConnectionString.
  - ConnectionString: Connection string for PostgreSql; omitted in the default settings because SqlLite doesn’t require a password.
  - DeleteJobsAfterDays: Number of days after which completed job records are removed.
  - JobPollingIntervalMs: The maximum interval, in milliseconds, between job-status checks while the manager waits for a blocking job to finish. The default is 5000.
- FileStorage
  - FileStorageType: Storage backend (for example, HostFileSystem).
  - FilesDirectoryPath: Directory that stores OCR-processed files.
  - DeleteFilesAfterDays: Number of days after which stored files are deleted.
- ServiceCommunication
  - ServiceCommunicationType: Method the manager uses to reach worker nodes (currently Rest).
  - ConnectionString: Endpoint URL of the worker node or load balancer.
  - WorkerHttpTimeoutMinutes: How long, in minutes, the manager waits for a worker node to respond. The default is 20.
  - MaxConcurrentRequests: The maximum number of requests the manager sends to worker nodes at the same time. The default is 10.
  - SafetyTimeoutMinutes: How long, in minutes, a blocking request waits for a job to reach a final state. The default is 120.
- MaxRequestBodySizeBytes: Maximum number of bytes for the request size. The default is 104857600 bytes, approximately 100 MB. If necessary, increase this value for larger files.
- PortNumber: Port that the manager listens on; the default is 7982.
- LogFilePath: Path to the log file.
- LogRetentionDays: Number of days to retain log files.
JobPollingIntervalMs, WorkerHttpTimeoutMinutes, MaxConcurrentRequests, and SafetyTimeoutMinutes are optional. To override a default, add JobPollingIntervalMs under Database and the other keys under ServiceCommunication in the manager appsettings.json file.
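As an illustration of where each optional key belongs, the following Python sketch adds overrides to a parsed manager configuration. It is a hypothetical helper, not a Pdftools tool: the function names are invented for this example, and it assumes that appsettings.json is plain JSON without comments.

```python
import json
from pathlib import Path

# Section that each optional key belongs to, as described above.
OPTIONAL_KEYS = {
    "JobPollingIntervalMs": "Database",
    "WorkerHttpTimeoutMinutes": "ServiceCommunication",
    "MaxConcurrentRequests": "ServiceCommunication",
    "SafetyTimeoutMinutes": "ServiceCommunication",
}


def apply_overrides(settings: dict, overrides: dict) -> dict:
    """Return a copy of settings with each override placed in its section."""
    result = json.loads(json.dumps(settings))  # deep copy of JSON data
    for key, value in overrides.items():
        section = OPTIONAL_KEYS.get(key)
        if section is None:
            raise KeyError(f"{key} is not one of the optional keys")
        result.setdefault(section, {})[key] = value
    return result


def update_file(path: Path, overrides: dict) -> None:
    """Rewrite the manager appsettings.json at path with the overrides."""
    # utf-8-sig also accepts files saved with a byte order mark.
    settings = json.loads(path.read_text(encoding="utf-8-sig"))
    updated = apply_overrides(settings, overrides)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

For example, apply_overrides(settings, {"JobPollingIntervalMs": 2000, "MaxConcurrentRequests": 4}) adds the first key under Database and the second under ServiceCommunication. Keep a backup of the file before rewriting it.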
Worker configuration overview
The OCR Service Worker uses appsettings.json files to manage its configuration. The following example shows a configuration for the Worker node.
The following is the default path of the worker configuration file:
C:\Program Files\Pdftools\Pdftools OCR Service\PdftoolsOcrWorker\appsettings.json
Default appsettings.json
{
  "PortNumber": 7998,
  "LogFilePath": "C:/ProgramData/Pdftools/OcrService/logs/worker-log.txt",
  "LogRetentionDays": 7,
  "FileStorage": {
    "FileStorageType": "HostFileSystem",
    "FilesDirectoryPath": "C:/ProgramData/Pdftools/OcrService/Files"
  },
  "ServiceCommunication": {
    "ServiceCommunicationType": "Rest"
  },
  "Licensing": {
    "LicenseKey": "<LICENSE_KEY>",
    "LgsURL": "http://localhost:9999"
  }
}
- PortNumber: Port that the worker node listens on for incoming requests. The default is 7998.
- LogFilePath: Path to the log file.
- LogRetentionDays: Retention period for logs, in days.
- FileStorage
  - FileStorageType: Storage system type (for example, HostFileSystem).
  - FilesDirectoryPath: Directory path for storing OCR-processed files.
- ServiceCommunication
  - ServiceCommunicationType: Communication method that reaches the worker (currently Rest).
- Licensing
  - LicenseKey: Your Pdftools product key. Provide the license key in every worker that you set up.
  - LgsURL: Optional. The connection URL of your Licensing Gateway Service (LGS). If you don’t specify LgsURL, the Pdftools OCR Service automatically connects to the Pdftools Licensing Service, which requires an internet connection.
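The Licensing rules can be checked before a worker is started. The following Python sketch is a hypothetical pre-flight check (the function name licensing_notes is invented for this example) that takes the parsed worker appsettings.json and reports whether the license key is still the placeholder and which licensing route the worker will use.

```python
PLACEHOLDER = "<LICENSE_KEY>"


def licensing_notes(settings: dict) -> list[str]:
    """Describe the Licensing section of a parsed worker configuration."""
    licensing = settings.get("Licensing", {})
    notes = []

    key = licensing.get("LicenseKey", "")
    if not key or key == PLACEHOLDER:
        notes.append("LicenseKey is missing or still the placeholder")

    lgs_url = licensing.get("LgsURL")
    if lgs_url:
        notes.append(f"licensing goes through the LGS at {lgs_url}")
    else:
        # Without LgsURL the worker contacts the Pdftools Licensing Service.
        notes.append("no LgsURL set: the worker needs an internet connection")
    return notes
```

Passing the default configuration shown above returns two notes: one about the placeholder key and one naming the LGS at http://localhost:9999.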
Manage the services on Linux
On a native Linux installation, the manager and worker run as systemd services.
Service control
To inspect or manage the running services, use these commands:
# Status
sudo systemctl status pdftools-ocr-service pdftools-ocr-worker
# Restart both (for example, after editing appsettings.json)
sudo systemctl restart pdftools-ocr-worker pdftools-ocr-service
# Stop (preserves on-disk state)
sudo systemctl stop pdftools-ocr-service pdftools-ocr-worker
# Disable boot-time start (enabled by default after install)
sudo systemctl disable pdftools-ocr-service pdftools-ocr-worker
Log locations
On a native Linux installation, the OCR Service writes logs to the following locations:
| Source | Location |
|---|---|
| Manager (structured) | /var/log/pdftools/manager-log.txt |
| Worker (structured) | /var/log/pdftools/worker-log.txt |
| systemd manager unit | journalctl -u pdftools-ocr-service |
| systemd worker unit | journalctl -u pdftools-ocr-worker |
Disk usage
Two paths grow over time:
| Path | Contents | Cleanup |
|---|---|---|
| /var/lib/pdftools/files/ | Job inputs and intermediate state | Manager deletes entries after DeleteFilesAfterDays (default 2). |
| /var/log/pdftools/ | Service logs | Rotated after LogRetentionDays (default 7). |
Manual cleanup is rarely needed because the manager runs a background sweep. If logs grow unexpectedly, inspect journalctl for an error loop.
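To watch the two paths from a script, a sketch along the following lines reports the size of each directory and the free space on its file system. The helper names are illustrative, and the paths are the ones listed in the table above. Run it as a user that can read both directories.

```python
import shutil
from pathlib import Path

# Paths taken from the disk usage table.
WATCHED = [Path("/var/lib/pdftools/files"), Path("/var/log/pdftools")]


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files below root."""
    total = 0
    for entry in root.rglob("*"):
        try:
            if entry.is_file():
                total += entry.stat().st_size
        except OSError:
            # The background sweep may delete a file during the walk.
            continue
    return total


def usage_line(root: Path) -> str:
    """Describe directory size and free space on its file system."""
    if not root.is_dir():
        return f"{root}: not found"
    free = shutil.disk_usage(root).free
    size_mib = directory_size_bytes(root) / 2**20
    return f"{root}: {size_mib:.1f} MiB used, {free / 2**30:.1f} GiB free"


if __name__ == "__main__":
    for path in WATCHED:
        print(usage_line(path))
```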
Troubleshooting on Linux
These sections cover common install and runtime issues on a native Linux installation.
Package isn’t signed or GPG check failed (RPM)
The OCR Service package isn’t GPG-signed. Pass --nogpgcheck to bypass the check:
sudo dnf install --nogpgcheck ./pdftools-ocr-service-VERSION_NUMBER-1.x86_64.rpm
Install rejects with "has not been validated on"
The pre-install guard detected that your distribution isn’t in the validated set. To proceed, do one of the following:
- Use a validated distribution. Refer to Supported.
- If your distribution is listed under Likely supported, retry the install using the override in Install on a likely supported distribution.
- If your distribution is listed under Not supported, deploy Pdftools OCR Service in Docker instead.
Worker logs show "Engine creation failed: NativeBase type initializer threw exception"
The host is on EL9, Amazon Linux 2023, SLES 15, or another unsupported distribution. Pdftools OCR Service requires GLIBCXX_3.4.30 (GCC 12 or newer), which the host’s libstdc++ doesn’t provide. Switch to Docker or upgrade to a supported distribution.
Worker segfaults on start, or apt install rejects with a libxml2 dependency error
The host is on Ubuntu 25.10 or a future Debian release that bumped the libxml2 SONAME. The OCR plugin and worker binary both link libxml2.so.2, which isn’t available on these distributions. Switch to Docker.
"License is not valid" in worker logs
Check the following:
- The LicenseKey field in /opt/pdftools/ocr-worker/appsettings.json isn’t still the "<LICENSE_KEY>" placeholder.
- The host can reach the Pdftools licensing service, or your on-premises Licensing Gateway Service (LgsURL) if configured.
- The owner of appsettings.json is pdftools:pdftools. If you edited the file as root, restore ownership with:
sudo chown pdftools:pdftools /opt/pdftools/ocr-worker/appsettings.json
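The ownership check in the last item can also be scripted. The following Python sketch reads the owner and group of the worker configuration file and compares them with pdftools:pdftools. The function names are invented for this example, and the sketch runs on Linux only because it relies on the pwd and grp modules.

```python
import grp
import os
import pwd

CONFIG = "/opt/pdftools/ocr-worker/appsettings.json"
EXPECTED_OWNER = "pdftools:pdftools"


def owner_of(path: str) -> str:
    """Return "user:group" for path, falling back to numeric IDs."""
    info = os.stat(path)
    try:
        user = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        user = str(info.st_uid)  # UID without a passwd entry
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)  # GID without a group entry
    return f"{user}:{group}"


def has_expected_owner(path: str, expected: str = EXPECTED_OWNER) -> bool:
    """True if path is owned by the expected user and group."""
    return owner_of(path) == expected


if __name__ == "__main__":
    if os.path.exists(CONFIG):
        print(owner_of(CONFIG), "ok" if has_expected_owner(CONFIG) else "MISMATCH")
    else:
        print(f"{CONFIG} not found")
```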
/healthz/ready returns Degraded indefinitely
The worker process is up, but the OCR engine probe hasn’t succeeded. Check /var/log/pdftools/worker-log.txt for the actual error. Common causes are an invalid license key, a missing license entitlement, or a network issue reaching the licensing service.
Port 7982 or 7998 already in use
Identify the conflicting process:
sudo ss -ltnp | grep -E '7982|7998'
Stop the conflicting process, or change the PortNumber in the relevant appsettings.json.
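Before installing or after changing PortNumber, you can confirm that a port is free. The following Python sketch is a minimal probe, with the function name port_in_use invented for this example. It only tells you whether something accepts TCP connections on the port; unlike the ss command, it doesn’t identify the process.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Default manager and worker ports from the configuration examples.
    for port in (7982, 7998):
        print(port, "in use" if port_in_use(port) else "free")
```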
Gather a support bundle
To collect diagnostics for a support ticket, run the commands for your package format. The first command creates pdftools-support-bundle.txt, overwriting any existing file with that name, and each later command appends to it.
RPM (Rocky, AlmaLinux, RHEL, Oracle Linux)
To gather the support bundle on an RPM-based system:
sudo journalctl -u pdftools-ocr-service -u pdftools-ocr-worker --no-pager -n 500 > pdftools-support-bundle.txt
sudo tail -n 500 /var/log/pdftools/manager-log.txt /var/log/pdftools/worker-log.txt >> pdftools-support-bundle.txt
rpm -q pdftools-ocr-service >> pdftools-support-bundle.txt
cat /etc/os-release >> pdftools-support-bundle.txt
free -h >> pdftools-support-bundle.txt
df -h /var/lib/pdftools /var/log/pdftools /opt/pdftools >> pdftools-support-bundle.txt
DEB (Ubuntu, Debian)
To gather the support bundle on a DEB-based system:
sudo journalctl -u pdftools-ocr-service -u pdftools-ocr-worker --no-pager -n 500 > pdftools-support-bundle.txt
sudo tail -n 500 /var/log/pdftools/manager-log.txt /var/log/pdftools/worker-log.txt >> pdftools-support-bundle.txt
dpkg -l pdftools-ocr-service >> pdftools-support-bundle.txt
cat /etc/os-release >> pdftools-support-bundle.txt
free -h >> pdftools-support-bundle.txt
df -h /var/lib/pdftools /var/log/pdftools /opt/pdftools >> pdftools-support-bundle.txt
Attach the resulting pdftools-support-bundle.txt to your support request, together with the exact error message you observed.