Version: 1.1.3

Monitor Pdftools OCR Service

You can monitor the OCR service programmatically through the health check endpoints that the Manager and Worker components expose. To check readiness, send a request to these endpoints:

  • Manager:
    http://localhost:7982/healthz/ready
  • Worker:
    http://localhost:7998/healthz/ready
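
The readiness check can be scripted. The following is a minimal sketch, assuming curl is installed and that the endpoints answer plain HTTP GET requests on the default ports. The probe function name is hypothetical. This guide doesn't specify the response body or which HTTP status code a degraded state produces, so the sketch prints both and treats only HTTP 200 as success.

```shell
#!/bin/sh
# probe NAME URL
# Prints the HTTP status code and response body of one readiness endpoint.
# Assumption: a plain GET is sufficient; the body format is not documented here.
probe() {
    name=$1
    url=$2
    # -s: no progress output; --max-time 5: give up after 5 seconds;
    # -w: append the HTTP status code on its own line after the body.
    response=$(curl -s --max-time 5 -w '\n%{http_code}' "$url") || {
        echo "$name: unreachable"
        return 1
    }
    code=$(printf '%s\n' "$response" | tail -n 1)   # last line is the status code
    body=$(printf '%s\n' "$response" | sed '$d')    # everything before it is the body
    echo "$name: HTTP $code $body"
    [ "$code" = "200" ]
}

# Usage:
#   probe Manager http://localhost:7982/healthz/ready
#   probe Worker  http://localhost:7998/healthz/ready
```

Because the function's exit status reflects the result, you can call it from a scheduled job or a monitoring agent and alert on a non-zero status.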

To verify that the services operate correctly, check the log files. The path to the log files is configurable. For an example of a configured log file path, see the Default appsettings.json samples in the following sections.

Manager configuration overview

The OCR Service Manager uses appsettings.json files to manage its configuration. The following example shows a Manager configuration, including how the Manager connects to Workers.

The following is the default path of the manager configuration file:

C:\Program Files\Pdftools\Pdftools OCR Service\PdftoolsOcrService\appsettings.json

Default appsettings.json

{
  "Database": {
    "DatabaseType": "SqlLite",
    "DeleteJobsAfterDays": 2
  },
  "FileStorage": {
    "FileStorageType": "HostFileSystem",
    "FilesDirectoryPath": "C:/ProgramData/Pdftools/OcrService/Files",
    "DeleteFilesAfterDays": 2
  },
  "ServiceCommunication": {
    "ServiceCommunicationType": "Rest",
    "ConnectionString": "http://localhost:7998/"
  },
  "PortNumber": 7982,
  "MaxRequestBodySizeBytes": 104857600,
  "LogFilePath": "C:/ProgramData/Pdftools/OcrService/logs/manager-log.txt",
  "LogRetentionDays": 7
}

  • Database
    • DatabaseType: Database backend (for example SqlLite or PostgreSql). If you use PostgreSql, also add ConnectionString.
    • ConnectionString: Connection string for PostgreSql; omitted in the default settings because SqlLite doesn’t require a password.
    • DeleteJobsAfterDays: Number of days after which completed job records are removed.
    • JobPollingIntervalMs: The maximum interval, in milliseconds, between job-status checks while the manager waits for a blocking job to finish. The default is 5000.
  • FileStorage
    • FileStorageType: Storage backend (for example HostFileSystem).
    • FilesDirectoryPath: Directory that stores OCR-processed files.
    • DeleteFilesAfterDays: Number of days after which stored files are deleted.
  • ServiceCommunication
    • ServiceCommunicationType: Method the manager uses to reach worker nodes (currently Rest).
    • ConnectionString: Endpoint URL of the worker node or load balancer.
    • WorkerHttpTimeoutMinutes: How long, in minutes, the manager waits for a worker node to respond. The default is 20.
    • MaxConcurrentRequests: The maximum number of requests the manager sends to worker nodes at the same time. The default is 10.
    • SafetyTimeoutMinutes: How long, in minutes, a blocking request waits for a job to reach a final state. The default is 120.
  • MaxRequestBodySizeBytes: Maximum size of a request body, in bytes. The default is 104857600 bytes (100 MiB). Increase this value if you need to process larger files.
  • PortNumber: Port that the manager listens on; the default is 7982.
  • LogFilePath: Path to the log file.
  • LogRetentionDays: Number of days to retain log files.

JobPollingIntervalMs, WorkerHttpTimeoutMinutes, MaxConcurrentRequests, and SafetyTimeoutMinutes are optional. To override a default, add JobPollingIntervalMs under Database and the other keys under ServiceCommunication in the manager appsettings.json file.
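
As an illustration, the following fragment shows where the optional keys would sit in the manager appsettings.json file. It's a sketch rather than a recommended configuration: the values are the documented defaults, so adding them as shown shouldn't change behavior. The fragment assumes that the values are written as JSON numbers and omits the other top-level sections for brevity.

```json
{
  "Database": {
    "DatabaseType": "SqlLite",
    "DeleteJobsAfterDays": 2,
    "JobPollingIntervalMs": 5000
  },
  "ServiceCommunication": {
    "ServiceCommunicationType": "Rest",
    "ConnectionString": "http://localhost:7998/",
    "WorkerHttpTimeoutMinutes": 20,
    "MaxConcurrentRequests": 10,
    "SafetyTimeoutMinutes": 120
  }
}
```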

Worker configuration overview

The OCR Service Worker uses appsettings.json files to manage its configuration. The following example shows a configuration for the Worker node.

The following is the default path of the worker configuration file:

C:\Program Files\Pdftools\Pdftools OCR Service\PdftoolsOcrWorker\appsettings.json

Default appsettings.json

{
  "PortNumber": 7998,
  "LogFilePath": "C:/ProgramData/Pdftools/OcrService/logs/worker-log.txt",
  "LogRetentionDays": 7,
  "FileStorage": {
    "FileStorageType": "HostFileSystem",
    "FilesDirectoryPath": "C:/ProgramData/Pdftools/OcrService/Files"
  },
  "ServiceCommunication": {
    "ServiceCommunicationType": "Rest"
  },
  "Licensing": {
    "LicenseKey": "<LICENSE_KEY>",
    "LgsURL": "http://localhost:9999"
  }
}

  • PortNumber: Port that the worker node listens on for incoming requests; the default is 7998.
  • LogFilePath: Path to the log file.
  • LogRetentionDays: Number of days to retain log files.
  • FileStorage
    • FileStorageType: Storage system type (for example HostFileSystem).
    • FilesDirectoryPath: Directory path for storing OCR-processed files.
  • ServiceCommunication
    • ServiceCommunicationType: Method used to communicate with the worker (currently Rest).
  • Licensing
    • LicenseKey: Your Pdftools product key. Provide the license key in every worker that you set up.
    • LgsURL: Optional. URL of your Licensing Gateway Service (LGS). If you don’t specify LgsURL, the Pdftools OCR Service connects to the Pdftools Licensing Service automatically, which requires an internet connection.

Manage the services on Linux

On a native Linux installation, the manager and worker run as systemd services.

Service control

To inspect or manage the running services, use these commands:

# Status
sudo systemctl status pdftools-ocr-service pdftools-ocr-worker

# Restart both (for example, after editing appsettings.json)
sudo systemctl restart pdftools-ocr-worker pdftools-ocr-service

# Stop (preserves on-disk state)
sudo systemctl stop pdftools-ocr-service pdftools-ocr-worker

# Disable boot-time start (enabled by default after install)
sudo systemctl disable pdftools-ocr-service pdftools-ocr-worker
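
For a compact summary of both units, for example in a monitoring script, you can combine systemctl is-active and systemctl is-enabled. The following is a minimal sketch; the unit_state function name is hypothetical, and the unit names are the ones used in the commands above.

```shell
#!/bin/sh
# Prints one line per unit with its runtime state and its boot-time state.
unit_state() {
    for unit in pdftools-ocr-service pdftools-ocr-worker; do
        # is-active prints for example "active", "inactive", or "failed"
        active=$(systemctl is-active "$unit" 2>/dev/null)
        # is-enabled prints for example "enabled" or "disabled"
        enabled=$(systemctl is-enabled "$unit" 2>/dev/null)
        echo "$unit: active=${active:-unknown} enabled=${enabled:-unknown}"
    done
}

# Usage:
#   unit_state
```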

Log locations

On a native Linux installation, the OCR Service writes logs to the following locations:

Source                 Location
Manager (structured)   /var/log/pdftools/manager-log.txt
Worker (structured)    /var/log/pdftools/worker-log.txt
systemd manager unit   journalctl -u pdftools-ocr-service
systemd worker unit    journalctl -u pdftools-ocr-worker

Disk usage

Two paths grow over time:

Path                       Contents                            Cleanup
/var/lib/pdftools/files/   Job inputs and intermediate state   Manager deletes entries after DeleteFilesAfterDays (default 2).
/var/log/pdftools/         Service logs                        Rotated after LogRetentionDays (default 7).

Manual cleanup is rarely needed because the manager runs a background sweep. If logs grow unexpectedly, inspect journalctl for an error loop.
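
One way to spot an error loop is to count how often identical messages repeat in recent journal output. The following is a rough sketch; the top_repeats function name is hypothetical. It assumes that repeated messages are byte-for-byte identical, which won't hold if the service embeds its own timestamps in each message.

```shell
#!/bin/sh
# top_repeats [N]
# Reads lines from standard input and prints the N most frequent lines
# (default 5), each prefixed with its count.
top_repeats() {
    sort | uniq -c | sort -rn | head -n "${1:-5}" | sed 's/^ *//'
}

# Usage (-o cat prints only the message text, without journal metadata):
#   sudo journalctl -u pdftools-ocr-worker --no-pager -o cat -n 1000 | top_repeats
# To see how much space the two growing paths use:
#   sudo du -sh /var/lib/pdftools/files/ /var/log/pdftools/
```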

Troubleshooting on Linux

These sections cover common install and runtime issues on a native Linux installation.

Package isn’t signed or GPG check failed (RPM)

The OCR Service package isn’t GPG-signed. Pass --nogpgcheck to bypass the check:

sudo dnf install --nogpgcheck ./pdftools-ocr-service-VERSION_NUMBER-1.x86_64.rpm

Install rejects with "has not been validated on"

The pre-install guard detected that your distribution isn’t in the validated set. To proceed, do one of the following:

Worker logs show "Engine creation failed: NativeBase type initializer threw exception"

The host is on EL9, Amazon Linux 2023, SLES 15, or another unsupported distribution. Pdftools OCR Service requires GLIBCXX_3.4.30 (GCC 12 or newer), which the host’s libstdc++ doesn’t provide. Switch to Docker or upgrade to a supported distribution.
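
To confirm this diagnosis, you can check whether the host’s libstdc++ defines the required symbol version. The following is a minimal sketch, assuming GNU grep; the has_glibcxx function name is hypothetical, and the library path differs between distributions, so the usage example looks it up with ldconfig.

```shell
#!/bin/sh
# has_glibcxx FILE
# Succeeds if the given libstdc++ binary contains the GLIBCXX_3.4.30
# version string.
has_glibcxx() {
    # -a: treat the binary as text; -q: no output; -F: match a fixed string
    grep -aqF 'GLIBCXX_3.4.30' "$1"
}

# Usage:
#   lib=$(ldconfig -p | awk '/libstdc\+\+\.so\.6/ { print $NF; exit }')
#   has_glibcxx "$lib" && echo "GLIBCXX_3.4.30 present" || echo "GLIBCXX_3.4.30 missing"
```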

Worker segfaults on start, or apt install rejects with a libxml2 dependency error

The host is on Ubuntu 25.10 or a future Debian release that bumped the libxml2 SONAME. The OCR plugin and worker binary both link libxml2.so.2, which isn’t available on these distributions. Switch to Docker.
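
To check whether the dynamic linker can find libxml2.so.2 on the host, you can search the ldconfig cache. The following is a minimal sketch; the has_soname function name is hypothetical, and it assumes the usual ldconfig -p output format, in which each library name is followed by a space.

```shell
#!/bin/sh
# has_soname SONAME
# Reads `ldconfig -p` output from standard input and succeeds if the given
# SONAME is listed.
has_soname() {
    # The trailing space prevents libxml2.so.2 from matching libxml2.so.20
    grep -qF "$1 "
}

# Usage:
#   ldconfig -p | has_soname libxml2.so.2 && echo "found" || echo "missing"
```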

"License is not valid" in worker logs

Check the following:

  • The LicenseKey field in /opt/pdftools/ocr-worker/appsettings.json isn’t still the "<LICENSE_KEY>" placeholder.
  • The host can reach the Pdftools licensing service, or your on-premises Licensing Gateway Service (LgsURL) if configured.
  • The owner of appsettings.json is pdftools:pdftools. If you edited the file as root, restore ownership with:
    sudo chown pdftools:pdftools /opt/pdftools/ocr-worker/appsettings.json
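
The first and last checks can be scripted. The following is a minimal sketch, assuming GNU stat; the placeholder_present and file_owner function names are hypothetical.

```shell
#!/bin/sh
# placeholder_present FILE
# Succeeds if the file still contains the <LICENSE_KEY> placeholder.
placeholder_present() {
    grep -qF '<LICENSE_KEY>' "$1"
}

# file_owner FILE
# Prints the owner of the file as user:group.
file_owner() {
    stat -c '%U:%G' "$1"
}

# Usage:
#   f=/opt/pdftools/ocr-worker/appsettings.json
#   placeholder_present "$f" && echo "LicenseKey is still the placeholder"
#   [ "$(file_owner "$f")" = "pdftools:pdftools" ] || echo "Unexpected owner: $(file_owner "$f")"
```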

/healthz/ready returns Degraded indefinitely

The worker process is up, but the OCR engine probe hasn’t succeeded. Check /var/log/pdftools/worker-log.txt for the actual error. Common causes are an invalid license key, a missing license entitlement, or a network issue reaching the licensing service.

Port 7982 or 7998 already in use

Identify the conflicting process:

sudo ss -ltnp | grep -E '7982|7998'

Stop the conflicting process, or change the PortNumber in the relevant appsettings.json.

Gather a support bundle

To collect diagnostics for a support ticket, run the commands for your package format. The first command creates pdftools-support-bundle.txt, overwriting any existing file of that name, and each subsequent command appends to it.

RPM (Rocky, AlmaLinux, RHEL, Oracle Linux)

To gather the support bundle on an RPM-based system:

sudo journalctl -u pdftools-ocr-service -u pdftools-ocr-worker --no-pager -n 500 > pdftools-support-bundle.txt
sudo tail -n 500 /var/log/pdftools/manager-log.txt /var/log/pdftools/worker-log.txt >> pdftools-support-bundle.txt
rpm -q pdftools-ocr-service >> pdftools-support-bundle.txt
cat /etc/os-release >> pdftools-support-bundle.txt
free -h >> pdftools-support-bundle.txt
df -h /var/lib/pdftools /var/log/pdftools /opt/pdftools >> pdftools-support-bundle.txt

DEB (Ubuntu, Debian)

To gather the support bundle on a DEB-based system:

sudo journalctl -u pdftools-ocr-service -u pdftools-ocr-worker --no-pager -n 500 > pdftools-support-bundle.txt
sudo tail -n 500 /var/log/pdftools/manager-log.txt /var/log/pdftools/worker-log.txt >> pdftools-support-bundle.txt
dpkg -l pdftools-ocr-service >> pdftools-support-bundle.txt
cat /etc/os-release >> pdftools-support-bundle.txt
free -h >> pdftools-support-bundle.txt
df -h /var/lib/pdftools /var/log/pdftools /opt/pdftools >> pdftools-support-bundle.txt

Attach the resulting pdftools-support-bundle.txt to your support request, together with the exact error message you observed.