Set up observability for AI Smart Redact

AI Smart Redact includes built-in observability through OpenTelemetry, giving you visibility into job processing, errors, and performance. Telemetry is disabled by default and has zero runtime overhead when not configured.

Overview

The service exports three types of telemetry data through the OpenTelemetry Protocol (OTLP):

| Signal | What it tells you |
| --- | --- |
| Traces | How long each job took, where time was spent, whether it succeeded or failed. |
| Logs | Structured application logs with trace correlation (TraceId/SpanId). |
| Metrics | API request rates, response latencies, error rates, and job counters. |

The service is compatible with any OTLP-capable backend: Grafana, Seq, Jaeger, Datadog, Elastic APM, Azure Monitor, or a self-hosted OpenTelemetry Collector.
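For a first test without a full backend, a self-hosted Collector that only prints what it receives can be enough. The following is a minimal sketch of a Collector configuration, not something shipped with AI Smart Redact. It assumes a recent Collector release that includes the debug exporter, and it uses the conventional OTLP ports (4317 for gRPC, 4318 for HTTP).

```yaml
# Minimal OpenTelemetry Collector config (illustrative sketch).
# Accepts OTLP over gRPC and HTTP and writes a summary of everything
# it receives to the Collector's own console output.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # conventional OTLP/gRPC port
      http:
        endpoint: 0.0.0.0:4318   # conventional OTLP/HTTP port

processors:
  batch: {}                      # default batching settings

exporters:
  debug:
    verbosity: basic             # use "detailed" to print full payloads

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

Once all three signal types show up in the Collector output, replace the debug exporter with the exporter for your actual backend.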

Enable telemetry

Set these environment variables on the Manager and Worker services:

| Variable | Required | Description |
| --- | --- | --- |
| OTEL_EXPORTER_OTLP_ENDPOINT | Yes | Your OTLP collector endpoint. Setting this enables telemetry. |
| OTEL_EXPORTER_OTLP_PROTOCOL | No | Transport protocol: grpc (default) or http/protobuf. Use http/protobuf for backends like Seq that require HTTP. |
| OTEL_SERVICE_NAME | No | Overrides the default service name in telemetry data. Defaults to SmartRedact.Manager or SmartRedact.Worker. Set this if you prefer a different name in your telemetry backend. |

The default service names (SmartRedact.Manager, SmartRedact.Worker) are set by the application. The example queries in this guide use these defaults. If you override OTEL_SERVICE_NAME, adjust the queries accordingly.

Docker Compose example (gRPC backend):

services:
  smart-redact-manager:
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://your-collector:4317
      OTEL_EXPORTER_OTLP_PROTOCOL: grpc

  smart-redact-worker:
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://your-collector:4317
      OTEL_EXPORTER_OTLP_PROTOCOL: grpc
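
For a backend that requires HTTP, the same setup might look like the sketch below. The host name your-collector and port 4318 (the conventional OTLP/HTTP port) are placeholders, and redact-worker-staging is a hypothetical name included only to show the optional OTEL_SERVICE_NAME override. Check your backend's documentation for its actual OTLP ingestion URL.

```yaml
# Illustrative variant for an HTTP-only OTLP backend.
services:
  smart-redact-manager:
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://your-collector:4318   # placeholder host and port
      OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf

  smart-redact-worker:
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://your-collector:4318   # placeholder host and port
      OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf
      # Optional: hypothetical custom name. Queries that filter on
      # SmartRedact.Worker must then use this name instead.
      OTEL_SERVICE_NAME: redact-worker-staging
```

Under the OTLP exporter specification, an SDK given a base endpoint over HTTP appends signal-specific paths such as /v1/traces, so the value here is normally the base URL rather than a per-signal one.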

Export defaults

Telemetry data is batched and buffered before export. These are the OpenTelemetry SDK defaults. No configuration is needed unless you want to tune them.

Traces (Batch Span Processor)

| Variable | Default | Description |
| --- | --- | --- |
| OTEL_BSP_SCHEDULE_DELAY | 5000 ms | How often batches are flushed. |
| OTEL_BSP_MAX_EXPORT_BATCH_SIZE | 512 | Maximum spans per export. |
| OTEL_BSP_MAX_QUEUE_SIZE | 2048 | Maximum spans queued before dropping. |
| OTEL_BSP_EXPORT_TIMEOUT | 30000 ms | Timeout per export call. |

Logs (Batch Log Record Processor)

| Variable | Default | Description |
| --- | --- | --- |
| OTEL_BLRP_SCHEDULE_DELAY | 1000 ms | How often batches are flushed. |
| OTEL_BLRP_MAX_EXPORT_BATCH_SIZE | 512 | Maximum log records per export. |
| OTEL_BLRP_MAX_QUEUE_SIZE | 2048 | Maximum log records queued before dropping. |
| OTEL_BLRP_EXPORT_TIMEOUT | 30000 ms | Timeout per export call. |

Metrics (Periodic Metric Reader)

| Variable | Default | Description |
| --- | --- | --- |
| OTEL_METRIC_EXPORT_INTERVAL | 60000 ms | How often metrics are exported. |
| OTEL_METRIC_EXPORT_TIMEOUT | 30000 ms | Timeout per export call. |

In practice: spans are sent every 5 seconds (or when 512 accumulate), logs every 1 second, and metrics every 60 seconds.
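
If the defaults feel too slow while you are debugging, the same variables can be set next to the endpoint settings. The values below are illustrative choices for a test environment, not recommendations. All durations are in milliseconds, and the values are quoted so that Compose passes them through as strings.

```yaml
# Illustrative tuning for faster feedback in a test environment.
services:
  smart-redact-worker:
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://your-collector:4317
      OTEL_BSP_SCHEDULE_DELAY: "1000"        # flush spans every 1 s instead of 5 s
      OTEL_BSP_MAX_QUEUE_SIZE: "4096"        # allow more queued spans before dropping
      OTEL_METRIC_EXPORT_INTERVAL: "15000"   # export metrics every 15 s instead of 60 s
```

Shorter intervals mean more frequent export calls, so it is usually worth reverting to the defaults once the pipeline is confirmed to work.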

Job processing traces

Every detection and redaction job produces a trace span on the Worker service. Each span captures:

  • Job identity: job ID, file ID, job type (detection or redaction).
  • Status: Finished or Error, with the failure reason recorded on errors.
  • Timing: start time, duration, end time.
  • Job metrics: page count, file size, entity counts (depending on job type).

The Manager enriches its consumer spans with job identity tags, so you can trace the full flow from Manager to Worker and back. Application logs include TraceId and SpanId fields, letting you jump from a log entry directly to its trace.

Span attributes reference

These attributes are set on Worker job processing spans:

| Attribute | Type | Description | Example |
| --- | --- | --- | --- |
| job.id | string | Unique job identifier | a1b2c3d4-e5f6-... |
| job.type | string | detection or redaction | detection |
| job.file.id | string | Primary input file identifier | e5f6g7h8-... |
| job.status | string | Final status: Finished or Error | Finished |
| failure.reason | string | Exception type on failure (absent on success) | DekTokenValidationException |
| input.file.pages | int | Input PDF page count (detection only) | 12 |
| input.file.size_bytes | long | Input PDF size in bytes (detection only) | 524288 |
| input.entities.count | int | Entities submitted for redaction (redaction only) | 35 |
| output.entities.count | int | Detected entity count (detection only) | 42 |
| output.file.size_bytes | long | Redacted PDF size in bytes (redaction only) | 498000 |

Custom metrics reference

These Prometheus counters are exported through OTLP and tagged with job.type (detection/redaction) and job.status (Finished/Error):

| Metric name | Extra tags | Description |
| --- | --- | --- |
| jobs.completed | (none) | Jobs completed (detection + redaction). |
| detection.entities.detected | entity.type | Total entities detected, broken down by label. |
| detection.pages.processed | (none) | Total pages processed across detection jobs. |
| licensing.pages.consumed | (none) | Total pages reported for license consumption. |
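
If your metrics backend scrapes Prometheus endpoints rather than accepting OTLP directly, one option is to route the metrics through an OpenTelemetry Collector. The sketch below assumes the Collector Contrib distribution, which includes the prometheus exporter; port 8889 is an arbitrary choice, and none of this ships with AI Smart Redact.

```yaml
# Illustrative Collector config: receive OTLP metrics and expose
# them on an HTTP endpoint that Prometheus can scrape.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889   # scrape target for Prometheus (arbitrary port)

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```

Prometheus naming rules differ from OTLP's, so dotted names such as jobs.completed are typically exposed in a translated form. Check the scrape output for the exact names before writing dashboards or alerts.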

Useful queries

These examples use Grafana's query languages: TraceQL (Grafana Tempo) for traces and LogQL (Grafana Loki) for logs.

Find all job spans for a specific job:

{span.job.id = "your-job-id-here"}

Find all failed jobs:

{span.job.status = "Error"}

Find failed detection jobs:

{span.job.status = "Error" && span.job.type = "detection"}

List recent detection results (log-based):

{service_name="SmartRedact.Worker"} |= "DetectionResultEvent"

Find log events for a specific file:

{service_name="SmartRedact.Worker"} |= "ResultEvent" | json | FileId = "your-file-id-here"