AI Smart Redact

AI Smart Redact detects and permanently removes sensitive information from PDFs. The service runs entirely within your infrastructure, so no data leaves your environment. It is built for regulated industries (government, financial services, insurance, healthcare, and legal) that require full data sovereignty, provable compliance, and complete auditability.

How smart redaction works

AI Smart Redact processes documents through a four-stage pipeline.

Figure: AI Smart Redact workflow (upload document, detect sensitive data, review findings, apply redactions, download redacted output).

Input. An integrating system submits a PDF. AI Smart Redact encrypts it immediately.
Detect. The detection engine identifies personally identifiable information (PII) using a hybrid of an AI model and a deterministic rules engine.
Review. A reviewer inspects, dismisses, or adds detections, and then approves the set before any redaction is applied.
Redact. AI Smart Redact creates a new PDF by copying only the visible, approved elements. Hidden content, metadata, and invisible layers don’t carry over.
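
The review gate between the Detect and Redact stages can be pictured with a small sketch. This is an illustration only: the `Detection` and `ReviewSession` classes and the `detections_to_redact` function are hypothetical names, not part of the AI Smart Redact API. It shows that a reviewer can dismiss or add detections, and that nothing is released for redaction until the set is approved.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    text: str
    entity_type: str
    dismissed: bool = False  # set by the reviewer to drop a false positive


@dataclass
class ReviewSession:
    """Hypothetical model of one review; not the product's data model."""
    detections: list = field(default_factory=list)
    approved: bool = False

    def dismiss(self, index):
        self.detections[index].dismissed = True

    def add(self, detection):
        self.detections.append(detection)

    def approve(self):
        self.approved = True


def detections_to_redact(session):
    """Return the detections cleared for redaction; refuse if unapproved."""
    if not session.approved:
        raise PermissionError("reviewer has not approved the detection set")
    return [d for d in session.detections if not d.dismissed]


session = ReviewSession([
    Detection("Jane Doe", "PERSON"),
    Detection("Acme", "ORGANIZATION"),
])
session.dismiss(1)                                   # reviewer rejects a detection
session.add(Detection("DE89 3704 0044 0532 0130 00", "IBAN"))  # and adds one
session.approve()
print([d.text for d in detections_to_redact(session)])
# prints ['Jane Doe', 'DE89 3704 0044 0532 0130 00']
```

Calling `detections_to_redact` before `approve()` raises an error, which mirrors the rule that no redaction is applied without reviewer approval.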

Detection engine

AI Smart Redact combines two complementary detection approaches:

AI model. A non-generative Named Entity Recognition (NER) model that identifies context-dependent entities (people, organizations, addresses). It supports English, German, French, Italian, Spanish, Portuguese, and Dutch, and works out of the box; no customer data is needed for training. Because the model only labels spans of existing text, it can’t hallucinate or produce output beyond the text in the document.
Rules engine. A deterministic pattern matcher for structured identifiers: credit card numbers, IBANs, account numbers, case IDs, and other domain-specific patterns. Each match is explainable, and checksum or format validation rejects false-positive matches.

You can extend both: add new PII entity types through configuration, and add new patterns without retraining the model. For the full pipeline and per-method details, refer to Detection.
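
To make the checksum validation in the rules engine concrete, here is a minimal sketch of two well-known public checks: the Luhn algorithm for card numbers and the ISO 13616 mod-97 check for IBANs. The function names and the candidate regular expression are assumptions for illustration; the product's actual patterns and validators are not shown in this document.

```python
import re

# Candidate pattern (an assumption): 13 to 19 digits, optionally separated
# by single spaces or hyphens, starting and ending on a digit.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number):
    """Luhn checksum: double every second digit from the right."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban):
    """ISO 13616 check: move the first four characters to the end,
    map letters to 10..35, and require the number mod 97 to equal 1."""
    s = re.sub(r"\s+", "", iban).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", s):
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


def find_card_numbers(text):
    """Keep only pattern matches that also pass the checksum."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]


text = "Paid with 4111 1111 1111 1111, reference 1234 5678 9012 3456."
print(find_card_numbers(text))                    # ['4111 1111 1111 1111']
print(iban_valid("DE89 3704 0044 0532 0130 00"))  # True
print(iban_valid("DE88 3704 0044 0532 0130 00"))  # False
```

The second 16-digit string matches the pattern but fails the Luhn check, so it is rejected. This is the kind of false positive that format or checksum validation filters out.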

Key features

AI Smart Redact provides:

Self-hosted: Deploy in your own infrastructure. License validation is offline. Runtime usage reporting connects to the Pdftools licensing server, or to an on-premise License Gateway Service for air-gapped deployments.
True redaction: The output PDF contains only visible, approved elements. Hidden content, metadata, and invisible layers don’t carry over.
Human-in-the-Loop (HITL) review: A reviewer approves every detection before redaction.
Full audit trail: OpenTelemetry integration provides per-job traceability. Every detection and redaction action is logged for compliance verification.

Data handling

File encryption

AI Smart Redact encrypts each uploaded file at rest using AES-256-GCM with a unique per-file Data Encryption Key (DEK). The Manager doesn’t persist DEK tokens; it returns each token to the integrating system, which holds it. The Orchestrator caches tokens temporarily for the human review workflow only; refer to DEK token storage in the human review workflow. Without the token, the encrypted file is cryptographically unreadable.

DEK token storage in the human review workflow

During human review, the Orchestrator caches each DEK token until the reviewer finishes. Two backends are available:

Backend	When to use
Redis (recommended)	Configure with `Redis__ConnectionString` on the Orchestrator. Deploy without persistence (no AOF, no RDB) so cached tokens are lost on restart, which is what guarantees crypto-erasure.
In-memory (fallback)	Used automatically when `Redis__ConnectionString` is empty. Single-instance only; tokens don’t survive a restart or scale across replicas.
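
The behavior of a process-local token cache can be sketched as follows. This is not the Orchestrator's implementation: the class name, the method names, and the use of a time to live on cache entries are assumptions made for illustration. The sketch shows why such a cache is single-instance only: entries live in one process's memory, so they disappear on restart and are invisible to other replicas.

```python
import time


class InMemoryTokenCache:
    """Hypothetical process-local cache mapping file IDs to DEK tokens."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # file_id -> (token, expires_at)

    def put(self, file_id, token):
        self._entries[file_id] = (token, self._clock() + self._ttl)

    def get(self, file_id):
        entry = self._entries.get(file_id)
        if entry is None:
            return None
        token, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the token so the file can no longer be decrypted.
            del self._entries[file_id]
            return None
        return token

    def discard(self, file_id):
        # Called when the reviewer finishes; the token is forgotten.
        self._entries.pop(file_id, None)


cache = InMemoryTokenCache(ttl_seconds=900)
cache.put("file-123", "example-dek-token")
print(cache.get("file-123"))   # example-dek-token
cache.discard("file-123")
print(cache.get("file-123"))   # None
```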

Crypto-erasure

Deleting a DEK token makes the corresponding file permanently unrecoverable, even if encrypted blobs remain in backup storage. This supports provable deletion in line with General Data Protection Regulation (GDPR) Art. 5(1)(e) and NIST SP 800-88.

The following scenarios trigger crypto-erasure:

Scenario	Result
Client deletes the DEK token	File is immediately and permanently unrecoverable.
DEK token time to live (TTL) expires	Server rejects further operations; file is unrecoverable.
Client calls `DELETE /v1/files/{fileId}`	Encrypted blob deleted; token discarded.
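
As an illustration of the third scenario, the following sketch builds the `DELETE /v1/files/{fileId}` request with the Python standard library. The base URL and file ID are placeholders, and any authentication headers the service requires are not covered in this document and are omitted here.

```python
import urllib.request
from urllib.parse import quote


def build_delete_request(base_url, file_id):
    """Build (but do not send) a DELETE request for one file."""
    url = f"{base_url.rstrip('/')}/v1/files/{quote(file_id, safe='')}"
    return urllib.request.Request(url, method="DELETE")


# Placeholder host and file ID, for illustration only.
request = build_delete_request("https://redact.example.internal", "3f2a9c")
print(request.get_method(), request.full_url)
# prints: DELETE https://redact.example.internal/v1/files/3f2a9c

# To send it against a real deployment:
# urllib.request.urlopen(request)
```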

Compliance coverage

The DEK token design addresses GDPR Art. 5(1)(b), (c), and (e), Art. 30, Art. 32, and Art. 35, as well as NIST SP 800-88.

System requirements

The following table lists the minimum RAM and CPU allocation per service container, with notes on what drives each figure:

Service	RAM	CPU	Notes
Worker (CPU)	4 GB	2 cores	The AI model loads ~2.9 GB into memory at startup. Detection pins one core at 100%.
Worker (GPU)	4 GB+	2 cores	GPU inference offloads compute, but the model still loads into RAM. VRAM requirements depend on the GPU.
Manager	1 GB	2 cores	Baseline ~217 MB. Peaks during file encryption at about two times the file size per concurrent upload.
Orchestrator	1 GB	2 cores	Similar profile to Manager (proxies uploads, manages sessions).
PostgreSQL (per DB)	512 MB	1 core	Observed 44-73 MB under load. 512 MB provides headroom for query cache and connection state.
RabbitMQ	512 MB	1 core	Lightweight for this workload. Increase if queue depths grow large (>10k messages).
Redis	256 MB	0.5 core	Ephemeral session/token cache only (no persistence).

Total minimum for the full stack (CPU mode), summing the rows above: ~7.75 GB RAM and 9.5 CPU cores (includes two PostgreSQL instances: one for Manager, one for Orchestrator).
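
The note on the Manager row gives enough to estimate peak memory for a given upload load. The sketch below applies that rule of thumb (baseline plus about twice the file size per concurrent upload). The function names are hypothetical, and the result is a rough planning figure, not a guarantee.

```python
MANAGER_BASELINE_MB = 217   # baseline from the table above
ENCRYPTION_FACTOR = 2       # about two times the file size per upload


def manager_peak_mb(file_size_mb, concurrent_uploads):
    """Estimated peak Manager memory in MB during file encryption."""
    return MANAGER_BASELINE_MB + ENCRYPTION_FACTOR * file_size_mb * concurrent_uploads


def max_concurrent_uploads(limit_mb, file_size_mb):
    """Largest number of concurrent uploads that fits under a memory limit."""
    headroom = limit_mb - MANAGER_BASELINE_MB
    return max(0, headroom // (ENCRYPTION_FACTOR * file_size_mb))


print(manager_peak_mb(50, 4))            # 617
print(max_concurrent_uploads(1024, 50))  # 8
```

With the 1 GB minimum allocation, four concurrent 50 MB uploads stay well within the limit. Larger files or higher concurrency call for a larger allocation.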

GPU acceleration

A CUDA-compatible GPU is optional but recommended for higher detection throughput at scale. For more details, refer to Scale and Worker configuration.

Containerization

AI Smart Redact ships as Docker images and supports Docker Compose and Kubernetes deployments. For setup steps, refer to Getting started.

Licensing

AI Smart Redact is licensed per deployment. For setup, refer to Licensing. To get a license or discuss pricing, contact sales.
