
Architecture

AI Smart Redact is composed of three subsystems. The Manager stores files and runs detection or redaction jobs. The Orchestrator sits in front of the Manager to add authentication, user management, and the review workflow that powers the Human-in-the-Loop (HITL) web application. The Worker runs the detection pipeline and the AI model, and is internal to the Docker network.

Component diagram

In the default configuration, the Manager and the Worker communicate through a REST API over HTTP. Each stateful subsystem has its own PostgreSQL database, and the Manager and Worker share a single file storage volume. Both read and write files on that volume directly; the Manager does not proxy file I/O for the Worker.

The Orchestrator additionally uses a Redis instance for DEK-token caching, which is shipped with the default Docker Compose stack but isn’t shown in the diagram below for clarity.

Subsystems

| Subsystem | Purpose | Default port |
| --- | --- | --- |
| Manager | File storage, detection and redaction jobs, persistence | 9982 |
| Orchestrator | Authentication, user management, HITL workflow | 9983 |
| Worker | Detection pipeline, AI model inference | 4885 (internal) |

Manager

The Manager owns file storage and the job lifecycle. Clients call the Manager directly for API integrations, or through the Orchestrator for browser-based workflows. The Manager persists job state in PostgreSQL and reads or writes PDFs and intermediate artifacts to the shared file storage volume.
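As a minimal sketch of a direct API integration, the following builds a job-submission request to the Manager over HTTP. The endpoint path, content type, and request shape are illustrative assumptions, not the documented Manager API; only the default port (9982) comes from this page.

```python
# Sketch: preparing a detection-job request to the Manager over REST,
# using only the standard library. The /jobs/detection path and the
# raw-PDF body are hypothetical assumptions for illustration.
import urllib.request

MANAGER_URL = "http://localhost:9982"  # default Manager port


def build_detection_request(pdf_path: str) -> urllib.request.Request:
    """Build (but do not send) a job-submission request to the Manager."""
    with open(pdf_path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        url=f"{MANAGER_URL}/jobs/detection",  # hypothetical endpoint
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
```

Sending the prepared request would be a matter of passing it to `urllib.request.urlopen`; building it separately keeps the sketch runnable without a live Manager.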

Orchestrator

The Orchestrator wraps the Manager with JWT authentication, user accounts, and the review workflow that powers the HITL web application. It has its own PostgreSQL database for users, sessions, and HITL state. The HITL web application talks to the Orchestrator only.
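For a feel of what the JWTs carry, here is a stdlib-only sketch that decodes the payload segment of a token. The claim names are illustrative assumptions, and this deliberately skips signature verification, which the Orchestrator performs before trusting any claim.

```python
# Sketch: reading the (unverified) claims from a JWT. A JWT is three
# base64url segments separated by dots; the middle one is the JSON
# payload. Never trust claims without verifying the signature first.
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # Restore the base64url padding that JWT encoding strips.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```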

The Orchestrator also uses a Redis instance as an optional cache for DEK tokens. Redis is included in the default Docker Compose stack but isn’t strictly required: if no Redis connection string is configured, the Orchestrator falls back to caching DEK tokens in memory.
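The fallback behavior described above can be sketched as a small cache that targets Redis when a connection string is configured and otherwise keeps tokens in a process-local dictionary with a TTL. The class name, setting, and TTL value are illustrative assumptions; only the in-memory path is exercised here.

```python
# Sketch: DEK-token caching with an in-memory fallback, mirroring the
# behavior described above. When redis_url is None, tokens live in a
# local dict with an expiry timestamp instead of a Redis instance.
import time
from typing import Optional


class DekTokenCache:
    def __init__(self, redis_url: Optional[str] = None, ttl_seconds: int = 300):
        self.ttl = ttl_seconds
        if redis_url:
            import redis  # third-party client, only needed when configured
            self._redis = redis.Redis.from_url(redis_url)
        else:
            self._redis = None
            self._memory: dict[str, tuple[float, str]] = {}

    def put(self, key: str, token: str) -> None:
        if self._redis is not None:
            self._redis.setex(key, self.ttl, token)  # Redis handles expiry
        else:
            self._memory[key] = (time.monotonic() + self.ttl, token)

    def get(self, key: str) -> Optional[str]:
        if self._redis is not None:
            value = self._redis.get(key)
            return value.decode() if value is not None else None
        entry = self._memory.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._memory.pop(key, None)  # drop expired entries lazily
            return None
        return entry[1]
```

The trade-off is the usual one: the in-memory fallback is simpler but per-process, so a multi-instance Orchestrator deployment would want Redis for a shared cache.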

Worker

The Worker accepts detection and redaction commands from the Manager (HTTP/REST in the default configuration), reads the PDF directly from shared file storage, runs the detection pipeline (pattern matching, keyword matching, and AI model inference), and writes results back to file storage. The Worker port (4885) isn’t exposed outside the Docker network.
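The first two pipeline stages might combine roughly as follows. The patterns, keywords, and result shape are illustrative assumptions, and the AI inference stage is omitted entirely.

```python
# Sketch: pattern matching plus keyword matching over extracted text,
# as a stand-in for the Worker's first two pipeline stages. The actual
# patterns, keywords, and hit format are hypothetical.
import re

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
KEYWORDS = {"confidential", "classified"}


def detect(text: str) -> list[dict]:
    """Return pattern and keyword hits with character offsets."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append({"type": label, "start": m.start(), "end": m.end()})
    for kw in KEYWORDS:
        for m in re.finditer(re.escape(kw), text, re.IGNORECASE):
            hits.append({"type": "keyword", "start": m.start(), "end": m.end()})
    return sorted(hits, key=lambda h: h["start"])
```

Character offsets matter here because a redaction step needs exact spans to mask, not just the matched strings.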

File access goes straight to the configured backend, which is either a shared local volume mounted on both Manager and Worker or an S3-compatible object store. The Worker doesn’t fetch or upload files through the Manager’s REST API.
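Backend selection as described above can be sketched as a small resolver that classifies the configured location. The `s3://` scheme check and the returned fields are illustrative assumptions about how such a configuration might look.

```python
# Sketch: classifying the configured storage location as either a
# shared local volume or an S3-compatible object store. The URL-scheme
# convention and result shape are hypothetical.
from urllib.parse import urlparse


def resolve_backend(storage_location: str) -> dict:
    """Classify a configured storage location as local or S3-compatible."""
    parsed = urlparse(storage_location)
    if parsed.scheme == "s3":
        # e.g. "s3://redact-bucket/artifacts" -> bucket plus key prefix
        return {"kind": "s3", "bucket": parsed.netloc,
                "prefix": parsed.path.lstrip("/")}
    # Anything else is treated as a path on the shared local volume.
    return {"kind": "local", "root": storage_location}
```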

Protocols

| Protocol | Where it’s used |
| --- | --- |
| HTTP | Clients to Manager or Orchestrator |
| HTTP/REST | Orchestrator to Manager; Manager to Worker (default transport) |
| SQL | Manager and Orchestrator to their PostgreSQL databases (via the Npgsql client) |
| File I/O | Manager and Worker each access shared storage directly (local volume or S3-compatible object store) |
| Inference | Detection pipeline to AI model |

Deployment variants

The component diagram shows the default REST transport. For higher throughput, AI Smart Redact also supports a RabbitMQ-based variant with multiple Workers behind shared queues, multiple Manager instances behind a load balancer, and external file storage on S3. Refer to Scale AI Smart Redact for these variants and tuning guidance.

Next steps