Version: 1.0.0

Scale Pdftools OCR Service

The Pdftools OCR service uses a master-worker architecture. The central master node, called the Pdftools OCR Service Manager, distributes tasks to multiple worker nodes, called Pdftools OCR Service Workers, which perform the actual processing.

With the Pdftools OCR Service installer, you can set up a Pdftools OCR Service Manager that communicates with a single Pdftools OCR Service Worker. The following diagram illustrates this configuration:

This diagram illustrates the communication between the Conversion Service and the Pdftools OCR Service worker and manager nodes.

This guide explains how to scale the Pdftools OCR Service horizontally by configuring the manager to work with multiple worker nodes.

Scaling worker

The manager node communicates with worker nodes through a RESTful API. In the scaled worker setup, a load balancer sits between the manager and the workers, resulting in an architecture similar to the following:

This diagram illustrates the communication between the Conversion Service and the Pdftools OCR Service Manager with a load balancer and two worker nodes.
  1. Locate the manager configuration file. In a default installation, the file is located at:
    C:\Program Files\Pdftools\Pdftools OCR Service\PdftoolsOcrService\appsettings.json
  2. Point ServiceCommunication to your load balancer by setting ConnectionString to the load balancer's URL (http://localhost:8080/ in this example):
    {
      "ServiceCommunication": {
        "ServiceCommunicationType": "Rest",
        "ConnectionString": "http://localhost:8080/"
      }
    }
  3. Install the workers on different host machines. You don't need to install the manager again.
    Screenshot of the Pdftools OCR Service Windows MSI installer.
  4. Configure a load balancer to distribute requests to your worker nodes.
  5. Locate the worker configuration file. In a default installation, the file is located at:
    C:\Program Files\Pdftools\Pdftools OCR Service\PdftoolsOcrWorker\appsettings.json
  6. Add a license key to each worker:
    {
      "Licensing": {
        "LicenseKey": "<LICENSE_KEY>"
      }
    }
  7. Configure a common file storage that all manager and worker nodes share:
    {
      "FileStorage": {
        "FileStorageType": "HostFileSystem",
        "FilesDirectoryPath": "F:/SharedFolder/ProgramData/Pdftools/OcrService/Files"
      }
    }
    • FileStorage
      • FileStorageType: Storage system type (for example, HostFileSystem).
      • FilesDirectoryPath: Directory path for storing OCR-processed files.
  8. Make sure that every manager and worker node:
    • Uses the same FileStorage settings.
    • Has read and write permissions for the shared directory.
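
The checks in steps 7 and 8 can be sketched as a small Python script that you run on each node. This is a minimal sketch, not part of the Pdftools OCR Service: the function names are hypothetical, and it assumes that each appsettings.json file is plain JSON without comments.

```python
import json
import uuid
from pathlib import Path


def load_file_storage(config_path):
    """Return the FileStorage section of an appsettings.json file."""
    # utf-8-sig tolerates a UTF-8 byte order mark at the start of the file.
    with open(config_path, encoding="utf-8-sig") as handle:
        return json.load(handle).get("FileStorage", {})


def storage_settings_match(config_paths):
    """Return True if all given config files have identical FileStorage sections."""
    sections = [load_file_storage(path) for path in config_paths]
    return all(section == sections[0] for section in sections)


def can_read_and_write(directory):
    """Write, read back, and delete a probe file in the given directory."""
    # Hypothetical probe file name; a random suffix avoids clashes between nodes.
    probe = Path(directory) / f"probe-{uuid.uuid4().hex}.tmp"
    try:
        probe.write_text("ok", encoding="utf-8")
        readable = probe.read_text(encoding="utf-8") == "ok"
        probe.unlink()
        return readable
    except OSError:
        return False
```

For example, you could pass the manager and worker configuration file paths to storage_settings_match, and pass the FilesDirectoryPath value to can_read_and_write under the account that runs the service on that node.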

Scaling manager

You can scale the manager nodes similarly to the worker nodes.

Info: To support horizontal scaling of manager nodes, switch from SQLite to PostgreSQL. A shared PostgreSQL database lets multiple managers share their state. For more information, review Default appsettings.json.

This diagram illustrates the communication between the Conversion Service and a load balancer that forwards requests to two host systems with OCR manager nodes. The manager nodes share a database and use a second load balancer to spread the workload across two host systems with worker nodes.

The following steps build on the previous section and assume that the worker nodes are already scaled.

  1. Locate the manager configuration file. In a default installation, the file is located at:
    C:\Program Files\Pdftools\Pdftools OCR Service\PdftoolsOcrService\appsettings.json
  2. Configure PostgreSQL:
    {
      "Database": {
        "DatabaseType": "PostgreSql",
        "ConnectionString": "User ID=myUser;Password=mySecurePassword;Server=my.database.com;Port=5432;Database=ocr-service-db;",
        "DeleteJobsAfterDays": 2
      }
    }
  3. Install manager nodes on separate hosts.
  4. Configure a load balancer to distribute requests to your manager nodes.
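
Because every manager node needs the same Database settings, it can help to generate the section once and apply it to each manager's appsettings.json. The following Python sketch shows one possible way to do this. The helper names are hypothetical, the connection string format is copied from step 2 above, and the sketch assumes that the values contain no semicolons and that the configuration file is plain JSON without comments.

```python
import json


def build_database_section(user, password, server, database,
                           port=5432, delete_jobs_after_days=2):
    """Build a Database section in the shape shown in step 2."""
    # No escaping is applied; values are assumed to contain no semicolons.
    connection_string = (
        f"User ID={user};Password={password};Server={server};"
        f"Port={port};Database={database};"
    )
    return {
        "Database": {
            "DatabaseType": "PostgreSql",
            "ConnectionString": connection_string,
            "DeleteJobsAfterDays": delete_jobs_after_days,
        }
    }


def merge_into_settings(settings_path, section):
    """Replace the matching top-level keys in an appsettings.json file."""
    with open(settings_path, encoding="utf-8-sig") as handle:
        settings = json.load(handle)
    # Overwrites only the keys present in `section`; other settings are kept.
    settings.update(section)
    with open(settings_path, "w", encoding="utf-8") as handle:
        json.dump(settings, handle, indent=2)
        handle.write("\n")
```

Back up each configuration file before you rewrite it, and keep the database password out of scripts that are checked into version control.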