
Worker

The Worker performs detection and redaction. Only the Manager calls it; there is no externally exposed API. Configure the Worker by setting environment variables on its container; the default port is 4885. Configuration applies per Worker instance, so run multiple Workers to scale throughput. For the naming convention and shared notes, refer to the Configuration reference.

The Worker shares two configuration sections with the Manager: FileStorage and Encryption. Both services must hold the same values.
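Because FileStorage and Encryption must hold identical values on both services, it can help to define them once and reuse them. A minimal Docker Compose sketch, with placeholder image names and a YAML anchor for the shared values (illustration only, not a supported deployment file):

```yaml
# Sketch only: image names are placeholders; adapt paths to your deployment.
x-shared-env: &shared-env
  FileStorage__FileStorageType: HostFileSystem
  FileStorage__FilesDirectoryPath: /app/storage_folder
  Encryption__EncryptionKey: <ENCRYPTION_KEY>

services:
  manager:
    image: <MANAGER_IMAGE>
    environment:
      <<: *shared-env
    volumes:
      - ./storage_folder:/app/storage_folder
  worker:
    image: <WORKER_IMAGE>
    environment:
      <<: *shared-env
      Licensing__LicenseKey: <LICENSE_KEY>
    volumes:
      - ./storage_folder:/app/storage_folder
```

The anchor guarantees that both containers receive the same storage path and encryption key, which is what the shared-section requirement above amounts to in practice.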

Default appsettings.json

{
  "WebServer": {
    "PortNumber": 4885,
    "MaxFileSizeBytes": null,
    "MaxConcurrentConnections": 1000,
    "RequestHeadersTimeout": null,
    "KeepAliveTimeout": null,
    "MinRequestBodyDataRateBytesPerSecond": null,
    "MinRequestBodyDataRateGracePeriod": null
  },
  "LogFilePath": "./logs/smart-redact-worker-log.txt",
  "LogRetentionDays": 7,
  "FileStorage": {
    "FileStorageType": "HostFileSystem",
    "FilesDirectoryPath": "/app/storage_folder"
  },
  "Encryption": {
    "EncryptionKey": "<ENCRYPTION_KEY>",
    "DekTokenTtlMinutes": 1440
  },
  "ServiceCommunication": {
    "ServiceCommunicationType": "Rest"
  },
  "Inference": {
    "ExecutionProvider": "Auto",
    "GpuDeviceId": 0,
    "CpuUtilizationPercentage": 80,
    "GraphOptimizationLevel": "All",
    "ExecutionMode": "Parallel",
    "MaxChunkSize": 256,
    "MaxLength": 512,
    "MaxWidth": 12,
    "BatchSize": 1
  },
  "Licensing": {
    "LicenseKey": "<LICENSE_KEY>",
    "LgsURL": ""
  }
}

Each section is described below.
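The environment-variable names used throughout this page follow the `Section__Key` double-underscore convention, mirroring the nesting in appsettings.json. A small illustrative sketch of that mapping (not the service's actual code):

```python
# Illustration only: flatten a nested settings dict into the
# Section__Key environment-variable names used on this page.
def to_env_vars(settings, prefix=""):
    env = {}
    for key, value in settings.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            # Nested sections gain another double-underscore segment.
            env.update(to_env_vars(value, name))
        elif value is not None:
            # null settings are simply left unset.
            env[name] = str(value)
    return env

print(to_env_vars({"WebServer": {"PortNumber": 4885}, "LogRetentionDays": 7}))
# → {'WebServer__PortNumber': '4885', 'LogRetentionDays': '7'}
```

So `"Encryption": {"DekTokenTtlMinutes": 1440}` becomes `Encryption__DekTokenTtlMinutes=1440` on the container.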

Licensing

The Worker validates the license at startup and exits if the key is missing or invalid.

Licensing__LicenseKey=<LICENSE_KEY>
| Setting | Default | Description |
| --- | --- | --- |
| LicenseKey | required | The AI Smart Redact license key issued by Pdftools. Must match the Manager. |
| LgsURL | | Optional URL of an on-premise License Gateway Service for air-gapped deployments. |

File storage

The Worker reads input files from and writes output files to the same store as the Manager. The fields and accepted values are the same. Refer to File storage on the Manager configuration page.

Encryption

The Worker uses the same encryption key as the Manager to unwrap DEK tokens received with each job. The fields and accepted values are the same. Refer to Encryption on the Manager configuration page.

Service communication

The transport configured here must match the Manager’s.

Transport

For RabbitMQ, the Worker connects to the same broker as the Manager.

ServiceCommunication__ServiceCommunicationType=RabbitMQ
ServiceCommunication__Host=<RABBITMQ_HOST>
ServiceCommunication__Username=<USERNAME>
ServiceCommunication__Password=<PASSWORD>

For REST transport, only the type is set on the Worker; the Manager initiates all calls to the Worker’s HTTP endpoints on the configured WebServer port.

ServiceCommunication__ServiceCommunicationType=Rest
| Setting | Default | Description |
| --- | --- | --- |
| ServiceCommunicationType | required | Rest or RabbitMQ. Must match the Manager. |
| Host | required for RabbitMQ | Broker host name. |
| Username | required for RabbitMQ | Broker username. |
| Password | required for RabbitMQ | Broker password. |

Concurrency

Caps on how many jobs each Worker instance processes in parallel. Detection holds an inference slot for the duration of the job, so running multiple detections in parallel adds memory pressure without improving throughput; the default is 1. Redaction is lighter and runs up to four jobs in parallel by default.

ServiceCommunication__DetectionConcurrencyLimit=1
ServiceCommunication__RedactionConcurrencyLimit=4
| Setting | Default | Description |
| --- | --- | --- |
| DetectionConcurrencyLimit | 1 | Maximum detection jobs processed concurrently by this Worker instance. |
| RedactionConcurrencyLimit | 4 | Maximum redaction jobs processed concurrently by this Worker instance. |

Inference

The Worker runs a semantic detection model for context-aware entity recognition. The Inference section tunes the inference runtime and the chunking behavior.

Inference__ExecutionProvider=Auto
Inference__GpuDeviceId=0
Inference__CpuUtilizationPercentage=80
Inference__GraphOptimizationLevel=All
Inference__ExecutionMode=Parallel
Inference__BatchSize=1
Inference__MaxChunkSize=256

Hardware

| Setting | Default | Description |
| --- | --- | --- |
| ExecutionProvider | Auto | Auto uses GPU when running the -cuda Worker image, otherwise CPU. Cpu forces CPU inference. |
| GpuDeviceId | 0 | Index of the GPU to use when ExecutionProvider resolves to a GPU. |
| CpuUtilizationPercentage | 80 | Percentage of available CPU cores the runtime uses for inference. Range 1–100. |
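In containerized GPU deployments, GpuDeviceId indexes the GPUs visible inside the container, so exposing a single device usually means leaving it at 0. A hedged sketch of a typical invocation; the image name is a placeholder, not a documented tag:

```shell
# Sketch only: <WORKER_CUDA_IMAGE> stands in for your -cuda Worker image.
docker run \
  --gpus '"device=0"' \
  -e Inference__ExecutionProvider=Auto \
  -e Inference__GpuDeviceId=0 \
  <WORKER_CUDA_IMAGE>
```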

Runtime

| Setting | Default | Description |
| --- | --- | --- |
| GraphOptimizationLevel | All | Graph optimization level: DisableAll, Basic, Extended, or All. |
| ExecutionMode | Parallel | Sequential or Parallel. |
| BatchSize | 1 | Number of text chunks sent to the model per inference call. Range 1–100; values outside the range are clamped. Higher values increase throughput at the cost of memory. |

Chunking

Long text is split into chunks before inference.

| Setting | Default | Description |
| --- | --- | --- |
| MaxChunkSize | 256 | Maximum tokens per chunk. Higher values give the model more context; lower values reduce per-chunk latency. Clamped to MaxLength if set higher. |
| MaxLength | 512 | Hard upper bound on input length in tokens supported by the model. Don't increase beyond what the configured model accepts. |
| MaxWidth | 12 | Maximum span width in words for a single detected entity. |
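The clamping rules above can be summarized in a few lines. A minimal sketch of the described behavior (illustration only, not the Worker's actual code):

```python
# Illustration only: apply the documented clamping rules to raw settings.
def effective_inference_settings(batch_size, max_chunk_size, max_length):
    # BatchSize outside 1-100 is clamped into range.
    batch_size = min(max(batch_size, 1), 100)
    # MaxChunkSize higher than MaxLength is clamped down to MaxLength.
    max_chunk_size = min(max_chunk_size, max_length)
    return batch_size, max_chunk_size

print(effective_inference_settings(batch_size=500, max_chunk_size=1024, max_length=512))
# → (100, 512)
```

With the defaults (BatchSize 1, MaxChunkSize 256, MaxLength 512), no clamping occurs.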

Web server

WebServer__PortNumber=4885
| Setting | Default | Description |
| --- | --- | --- |
| PortNumber | 4885 | TCP port the Worker listens on. The Manager calls this port only when ServiceCommunicationType is Rest. |
| MaxFileSizeBytes | null (no limit) | Maximum allowed body size on Worker endpoints. The Worker doesn't accept user uploads, so the limit is normally left unset. |
| MaxConcurrentConnections | 1000 | Maximum concurrent connections accepted by Kestrel. |

The remaining Kestrel limits (RequestHeadersTimeout, KeepAliveTimeout, MinRequestBodyDataRateBytesPerSecond, MinRequestBodyDataRateGracePeriod) accept the same values as on the Manager. Refer to Web server on the Manager configuration page.

Logging

Application logs are written to the console and, optionally, to a file. The fields are top-level (no section prefix).

LogFilePath=./logs/smart-redact-worker-log.txt
LogRetentionDays=7
| Setting | Default | Description |
| --- | --- | --- |
| LogFilePath | ./logs/smart-redact-worker-log.txt | Path of the rolling-daily log file inside the container. Leave empty to disable file logging. |
| LogRetentionDays | 7 | Number of days log files are retained on disk. |

The minimum log level isn’t a separate setting. It’s derived from the standard ASPNETCORE_ENVIRONMENT environment variable: when set to Development, the service emits Debug-level logs in a developer-friendly console format; any other value (the default) emits Information-level logs in JSON. Use Development only for local diagnostics.
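For local diagnostics, both knobs are plain environment variables on the container. A sketch with a placeholder image name:

```shell
# Sketch only: <WORKER_IMAGE> is a placeholder.
# Development switches to Debug-level, developer-friendly console logs;
# an empty LogFilePath disables file logging entirely.
docker run \
  -e ASPNETCORE_ENVIRONMENT=Development \
  -e LogFilePath= \
  <WORKER_IMAGE>
```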