Detection configuration

The detectionConfiguration object controls how a detection request runs. You supply it per request through the Manager API, and it is merged over the built-in defaults, so any field you omit keeps its default value. Property names and enum values use camelCase. Unknown fields don’t fail the request; they’re returned as validation warnings.
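The merge behavior can be pictured with a minimal Python sketch. It assumes the merge happens at the top level only (a field you set replaces the default wholesale), which this page does not state explicitly. `DEFAULTS` and `resolve` are hypothetical names, and the default values are copied from the Top-level fields table.

```python
# Hypothetical copy of some documented defaults (see the Top-level fields table).
DEFAULTS = {
    "scoreThreshold": 0.5,
    "labels": [],
    "languages": [],
    "regexTimeout": 1000,
    "checksumValidationMode": "strict",
    "sameSpanStrategy": "deterministicWins",
}


def resolve(request_config: dict) -> dict:
    """Overlay the request's fields on the defaults (assumed top-level merge)."""
    return {**DEFAULTS, **request_config}


resolved = resolve({"scoreThreshold": 0.7, "languages": ["de"]})
print(resolved["scoreThreshold"])  # value set in the request
print(resolved["regexTimeout"])    # untouched default
```

In this sketch, a request that sets only `scoreThreshold` and `languages` leaves every other field at its default.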

For details on submitting a request, refer to Detection (Manager API).

Top-level fields

{
  "scoreThreshold": 0.5,
  "labels": ["EMAIL_ADDRESS", "PHONE_NUMBER"],
  "languages": ["en"],
  "regexTimeout": 1000,
  "checksumValidationMode": "strict",
  "patternRecognizers": [],
  "keywordRecognizers": [],
  "keywordExclusions": [],
  "semanticRecognizer": null,
  "disableBuiltInPatternRecognizersForLabels": [],
  "sameSpanStrategy": "deterministicWins"
}

Field	Type	Default	Description
`scoreThreshold`	number	`0.5`	Minimum confidence score for an entity to be returned. Values from `0.0` to `1.0`.
`labels`	string[]	`[]`	Entity types to detect. Empty or omitted means detect all labels enabled in the resolved configuration. The built-in default configuration enables every built-in label. Refer to Entity types for the full list.
`languages`	string[]	`[]`	Languages to apply during detection. Any of `en`, `de`, `fr`, `it`, `es`, `pt`, `nl`. Empty or omitted resolves to `["en"]`. Affects context words and verbal `DATE` patterns. Refer to Pattern recognizers.
`regexTimeout`	integer	`1000`	Timeout for each regex execution, in milliseconds. Allowed values: `0` (use the default) or `1` to `1000`. Guards against catastrophic backtracking.
`checksumValidationMode`	string	`"strict"`	One of `"strict"` or `"relaxed"`. Refer to Pattern recognizers.
`patternRecognizers`	array	`[]`	Custom pattern recognizers appended to the built-in ones. Refer to `PatternRecognizer` schema.
`keywordRecognizers`	array	`[]`	Custom denylist recognizers. Refer to `KeywordRecognizer` schema.
`keywordExclusions`	array	`[]`	Allowlist exclusions that suppress matched entities. Refer to `KeywordExclusion` schema.
`semanticRecognizer`	object	`null`	Custom entity mapping for the semantic recognizer. When `null` or omitted, the built-in default mapping is used (refer to Default entity mapping). Refer to `SemanticRecognizer` schema.
`disableBuiltInPatternRecognizersForLabels`	string[]	`[]`	Labels whose built-in pattern recognizers should be disabled. Useful when replacing them with custom recognizers.
`sameSpanStrategy`	string	`"deterministicWins"`	One of `"deterministicWins"`, `"semanticWins"`, or `"higherScoreWins"`. Refer to Overlap resolution.
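The documented ranges and enum values can be checked on the client before sending a request. The following is a rough sketch, not the service's own validation; `check_config` is a hypothetical helper and covers only the constraints listed in the table above.

```python
ALLOWED_LANGUAGES = {"en", "de", "fr", "it", "es", "pt", "nl"}
CHECKSUM_MODES = {"strict", "relaxed"}
SAME_SPAN_STRATEGIES = {"deterministicWins", "semanticWins", "higherScoreWins"}


def check_config(config: dict) -> list[str]:
    """Return a list of problems found in the top-level fields (empty if none)."""
    problems = []
    if not 0.0 <= config.get("scoreThreshold", 0.5) <= 1.0:
        problems.append("scoreThreshold must be between 0.0 and 1.0")
    timeout = config.get("regexTimeout", 1000)
    if not (isinstance(timeout, int) and 0 <= timeout <= 1000):
        problems.append("regexTimeout must be 0 or an integer from 1 to 1000")
    unsupported = set(config.get("languages", [])) - ALLOWED_LANGUAGES
    if unsupported:
        problems.append(f"unsupported languages: {sorted(unsupported)}")
    if config.get("checksumValidationMode", "strict") not in CHECKSUM_MODES:
        problems.append("checksumValidationMode must be strict or relaxed")
    if config.get("sameSpanStrategy", "deterministicWins") not in SAME_SPAN_STRATEGIES:
        problems.append("sameSpanStrategy has an unknown value")
    return problems


print(check_config({"scoreThreshold": 0.7, "languages": ["en", "de"]}))
print(check_config({"scoreThreshold": 1.5, "regexTimeout": 5000}))
```

The first call returns an empty list; the second reports both out-of-range values.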

`PatternRecognizer` schema

A pattern recognizer detects entities using one or more regex patterns. Custom recognizers are appended to the built-in ones unless you also disable the built-in recognizer for the same label.

{
  "name": "CustomProjectCode",
  "label": "PROJECT_CODE",
  "patterns": [
    {
      "name": "ProjectCode",
      "regex": "\\bPRJ-\\d{4}-[A-Z]{3}\\b",
      "score": 0.85
    }
  ],
  "contextWords": ["project", "reference"]
}

Field	Type	Required	Description
`name`	string	Yes	Recognizer identifier shown in logs.
`label`	string	Yes	Entity type this recognizer produces. UPPER_SNAKE_CASE.
`patterns`	RegexPattern[]	Yes	One or more regex patterns. Refer to `RegexPattern` schema.
`contextWords`	string[]	No	Words that boost the confidence score when found near a match. Omit to skip context boosting for this recognizer.

`RegexPattern` schema

Field	Type	Required	Description
`name`	string	Yes	Pattern identifier shown in logs.
`regex`	string	Yes	Regex pattern. Compiled and cached on first use.
`score`	number	No	Base confidence score on match. Values from `0.0` to `1.0`.
`allowBacktracking`	boolean	No	When `false` (the default), the regex runs in a non-backtracking mode. Only set to `true` when your pattern requires features that need backtracking.
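Before submitting a pattern, it can help to try it locally. The sketch below parses a pattern object the way a JSON parser would (so the doubled backslashes become single ones) and runs it with Python's `re` module. Python's engine is an assumption made for illustration only; the service's regex engine may differ in supported syntax and in how it handles backtracking. The sample text is made up.

```python
import json
import re

# The JSON text as it would appear in the request body.
pattern_json = r'{"name": "ProjectCode", "regex": "\\bPRJ-\\d{4}-[A-Z]{3}\\b", "score": 0.85}'

pattern = json.loads(pattern_json)
regex = re.compile(pattern["regex"])  # after parsing: \bPRJ-\d{4}-[A-Z]{3}\b

text = "See project PRJ-2024-ABC and ticket PRJ-12-AB for reference."
matches = [m.group(0) for m in regex.finditer(text)]
print(matches)  # ['PRJ-2024-ABC']
```

`PRJ-12-AB` is not matched because the pattern requires exactly four digits and three uppercase letters.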

`KeywordRecognizer` schema

A keyword recognizer detects entities by matching against a list of known sensitive terms (a denylist).

{
  "name": "SensitiveTerms",
  "label": "SENSITIVE_KEYWORD",
  "keywords": ["confidential", "top secret"],
  "score": 1.0,
  "partialMatch": false
}

Field	Type	Required	Description
`name`	string	Yes	Recognizer identifier.
`label`	string	Yes	Entity type produced. UPPER_SNAKE_CASE.
`keywords`	string[]	Yes	Words or phrases to detect.
`score`	number	No	Confidence score for all matches. Default `1.0`.
`partialMatch`	boolean	No	When `false` (the default), a keyword matches only as a case-sensitive whole word. When `true`, a keyword matches as a substring.
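The difference between the two `partialMatch` modes can be sketched as follows. This is an illustration under assumptions, not the service's implementation: `find_keywords` is a hypothetical helper, "whole word" is taken to mean "not flanked by word characters", and substring matching is assumed to be case-sensitive as well, which this page does not specify.

```python
import re


def find_keywords(text: str, keywords: list[str], partial_match: bool = False) -> list[str]:
    """Return the keywords found in text, in the order they are listed."""
    found = []
    for keyword in keywords:
        if partial_match:
            # Substring match (assumed case-sensitive).
            hit = keyword in text
        else:
            # Whole-word match: the keyword must not touch other word characters.
            hit = re.search(rf"(?<!\w){re.escape(keyword)}(?!\w)", text) is not None
        if hit:
            found.append(keyword)
    return found


keywords = ["confidential", "top secret"]
print(find_keywords("This file is confidential.", keywords))                 # ['confidential']
print(find_keywords("nonconfidential data", keywords))                       # []
print(find_keywords("nonconfidential data", keywords, partial_match=True))   # ['confidential']
```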

`KeywordExclusion` schema

A keyword exclusion suppresses detected entities whose text matches a known safe term (an allowlist).

{
  "name": "SafeHostnames",
  "excludedKeywords": ["localhost", "example.com"],
  "partialMatch": false
}

Field	Type	Required	Description
`name`	string	Yes	Exclusion identifier.
`excludedKeywords`	string[]	Yes	Keywords that, if matching an entity’s text, cause it to be excluded.
`partialMatch`	boolean	No	When `false` (the default), the entity text must exactly equal a keyword. When `true`, the entity text must contain the keyword as a substring.
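As a rough illustration of the two exclusion modes, the sketch below filters a list of detected entity texts. `is_excluded` is a hypothetical helper, the entity texts are made up, and the comparison is assumed to be case-sensitive, which this page does not specify.

```python
def is_excluded(entity_text: str, excluded_keywords: list[str], partial_match: bool = False) -> bool:
    """Decide whether a detected entity should be suppressed."""
    if partial_match:
        # The entity text only needs to contain a keyword.
        return any(keyword in entity_text for keyword in excluded_keywords)
    # The entity text must equal a keyword exactly.
    return entity_text in excluded_keywords


excluded = ["localhost", "example.com"]
entities = ["localhost", "api.example.com", "mail.internal"]

exact = [e for e in entities if not is_excluded(e, excluded)]
partial = [e for e in entities if not is_excluded(e, excluded, partial_match=True)]
print(exact)    # ['api.example.com', 'mail.internal']
print(partial)  # ['mail.internal']
```

With exact matching, `api.example.com` survives because it is not equal to `example.com`; with `partialMatch` set to `true` it is suppressed.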

`SemanticRecognizer` schema

{
  "name": "GlinerLarge",
  "entityMapping": {
    "name": "PERSON",
    "company name": "ORGANISATION",
    "location address": "PHYSICAL_ADDRESS"
  }
}

The example above is illustrative. The shipped default mapping has additional entries; refer to Default entity mapping for the full list.

Field	Type	Required	Description
`name`	string	Yes	Fixed identifier for the semantic model. Only `"GlinerLarge"` is accepted; the model itself can’t be changed.
`entityMapping`	object	Yes	Maps semantic-model output labels (key) to standardized entity labels (value, UPPER_SNAKE_CASE). This is the only customizable part of `semanticRecognizer`.

If you don’t need to override the entity mapping, omit semanticRecognizer entirely. Supplying it solely to repeat the default mapping has no effect on detection results.
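The two constraints on this object (a fixed `name` and UPPER_SNAKE_CASE mapping values) lend themselves to a quick client-side check. The following sketch uses a hypothetical helper, `check_semantic_recognizer`, and an assumed reading of UPPER_SNAKE_CASE as uppercase letters and digits separated by single underscores.

```python
import re

# Assumed definition of UPPER_SNAKE_CASE for this check.
UPPER_SNAKE_CASE = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")


def check_semantic_recognizer(recognizer: dict) -> list[str]:
    """Return a list of problems found in a semanticRecognizer object."""
    problems = []
    if recognizer.get("name") != "GlinerLarge":
        problems.append('name must be "GlinerLarge"')
    if "entityMapping" not in recognizer:
        problems.append("entityMapping is required")
    for model_label, entity_label in recognizer.get("entityMapping", {}).items():
        if not UPPER_SNAKE_CASE.fullmatch(entity_label):
            problems.append(f"{model_label!r} maps to {entity_label!r}, expected UPPER_SNAKE_CASE")
    return problems


ok = {"name": "GlinerLarge", "entityMapping": {"company name": "ORGANISATION"}}
bad = {"name": "Other", "entityMapping": {"name": "person"}}
print(check_semantic_recognizer(ok))   # []
print(len(check_semantic_recognizer(bad)))  # 2
```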

Disable built-in pattern recognizers

To replace a built-in pattern recognizer with your own, disable the built-in recognizer for that label and add a custom recognizer for the same label. The following example replaces the built-in EMAIL_ADDRESS recognizer with one that only matches addresses on example.com:

{
  "disableBuiltInPatternRecognizersForLabels": ["EMAIL_ADDRESS"],
  "patternRecognizers": [
    {
      "name": "InternalEmail",
      "label": "EMAIL_ADDRESS",
      "patterns": [
        {
          "name": "InternalDomain",
          "regex": "\\b[\\w.+-]+@example\\.com\\b",
          "score": 0.95
        }
      ]
    }
  ]
}
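A replacement pattern is worth testing locally, because it becomes the only pattern recognizer for that label. The sketch below runs the regex from the example with Python's `re` module, which is an assumption made for illustration; the service's engine may behave differently. The sample addresses are made up.

```python
import re

# The regex from the example, after JSON unescaping.
internal_email = re.compile(r"\b[\w.+-]+@example\.com\b")

text = "Contact alice@example.com or bob@other.org."
print(internal_email.findall(text))  # ['alice@example.com']

# Caveat: \b also sees a boundary before a following dot, so the pattern
# matches the leading part of an address on a longer domain.
print(internal_email.findall("carol@example.com.au"))  # ['carol@example.com']
```

If addresses on longer domains such as `example.com.au` can appear in your data, the pattern may need tightening before it replaces the built-in recognizer.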

Complete example

The following request configuration restricts detection to five labels, raises the score threshold, enables English and German, adds a custom pattern recognizer for project codes, adds a denylist for sensitive terms, and excludes a known false-positive value:

{
  "detectionConfiguration": {
    "scoreThreshold": 0.7,
    "labels": ["EMAIL_ADDRESS", "PHONE_NUMBER", "PROJECT_CODE", "PERSON", "SENSITIVE_KEYWORD"],
    "languages": ["en", "de"],
    "patternRecognizers": [
      {
        "name": "CustomProjectCode",
        "label": "PROJECT_CODE",
        "patterns": [
          {
            "name": "ProjectCode",
            "regex": "\\bPRJ-\\d{4}-[A-Z]{3}\\b",
            "score": 0.85
          }
        ],
        "contextWords": ["project", "reference"]
      }
    ],
    "keywordRecognizers": [
      {
        "name": "SensitiveTerms",
        "label": "SENSITIVE_KEYWORD",
        "keywords": ["confidential", "top secret"]
      }
    ],
    "keywordExclusions": [
      {
        "name": "SafeAddresses",
        "excludedKeywords": ["test@example.com"]
      }
    ]
  }
}
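When `labels` is set, it is easy to add a custom recognizer and forget to list its label. The sketch below flags that situation. It assumes `labels` also filters entities produced by custom recognizers, which the field description suggests but this page does not state outright. `labels_missing_from_filter` and the sample configuration are hypothetical.

```python
def labels_missing_from_filter(config: dict) -> set[str]:
    """Return labels produced by custom recognizers but absent from `labels`."""
    labels = set(config.get("labels", []))
    if not labels:
        # Empty or omitted means all enabled labels are detected.
        return set()
    custom = config.get("patternRecognizers", []) + config.get("keywordRecognizers", [])
    return {recognizer["label"] for recognizer in custom} - labels


config = {
    "labels": ["EMAIL_ADDRESS", "PROJECT_CODE"],
    "patternRecognizers": [
        {
            "name": "CustomProjectCode",
            "label": "PROJECT_CODE",
            "patterns": [{"name": "ProjectCode", "regex": r"\bPRJ-\d{4}-[A-Z]{3}\b"}],
        }
    ],
    "keywordRecognizers": [
        {"name": "InternalTerms", "label": "INTERNAL_TERM", "keywords": ["roadmap"]}
    ],
}
print(labels_missing_from_filter(config))  # {'INTERNAL_TERM'}
```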
