Detection configuration
The detectionConfiguration object controls how a detection request runs. It’s supplied per request through the Manager API and merges on top of the built-in defaults: only the fields you set override the defaults. Property names and enum values use camelCase. Unknown fields don’t fail the request; they’re returned as validation warnings.
For details on submitting a request, refer to Detection (Manager API).
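As a minimal sketch of the merge behavior described above, the following configuration sets a single field. The value 0.8 is an arbitrary illustration, and the detectionConfiguration wrapper follows the complete example at the end of this page.

```json
{
  "detectionConfiguration": {
    "scoreThreshold": 0.8
  }
}
```

Because only scoreThreshold is set, every other field keeps its built-in default. For example, labels stays empty (detect all enabled labels) and languages resolves to ["en"].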
Top-level fields
{
  "scoreThreshold": 0.5,
  "labels": ["EMAIL_ADDRESS", "PHONE_NUMBER"],
  "languages": ["en"],
  "regexTimeout": 1000,
  "checksumValidationMode": "strict",
  "patternRecognizers": [],
  "keywordRecognizers": [],
  "keywordExclusions": [],
  "semanticRecognizer": null,
  "disableBuiltInPatternRecognizersForLabels": [],
  "sameSpanStrategy": "deterministicWins"
}
| Field | Type | Default | Description |
|---|---|---|---|
| scoreThreshold | number | 0.5 | Minimum confidence score for an entity to be returned. Values from 0.0 to 1.0. |
| labels | string[] | [] | Entity types to detect. Empty or omitted means detect all labels enabled in the resolved configuration. The built-in default configuration enables every built-in label. Refer to Entity types for the full list. |
| languages | string[] | [] | Any of en, de, fr, it, es, pt, nl. Empty or omitted resolves to ["en"]. Affects context words and verbal DATE patterns. Refer to Pattern recognizers. |
| regexTimeout | integer | 1000 | Milliseconds per regex execution. Allowed values: 0 (use default) or 1 to 1000. Caps the time a single regex can run, which limits the impact of catastrophic backtracking. |
| checksumValidationMode | string | "strict" | One of "strict" or "relaxed". Refer to Pattern recognizers. |
| patternRecognizers | array | [] | Custom pattern recognizers appended to the built-in ones. Refer to PatternRecognizer schema. |
| keywordRecognizers | array | [] | Custom denylist recognizers. Refer to KeywordRecognizer schema. |
| keywordExclusions | array | [] | Allowlist exclusions that suppress matched entities. Refer to KeywordExclusion schema. |
| semanticRecognizer | object | null | Custom entity mapping for the semantic recognizer. When null or omitted, the built-in default mapping is used (refer to Default entity mapping). Refer to SemanticRecognizer schema. |
| disableBuiltInPatternRecognizersForLabels | string[] | [] | Labels whose built-in pattern recognizers should be disabled. Useful when replacing them with custom recognizers. |
| sameSpanStrategy | string | "deterministicWins" | One of "deterministicWins", "semanticWins", or "higherScoreWins". Refer to Overlap resolution. |
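As a sketch of how the scalar fields combine, the following configuration detects in English and German, shortens the regex timeout, and selects the non-default value for two of the enum fields. The specific values are illustrative assumptions, not recommendations.

```json
{
  "detectionConfiguration": {
    "languages": ["en", "de"],
    "regexTimeout": 250,
    "checksumValidationMode": "relaxed",
    "sameSpanStrategy": "higherScoreWins"
  }
}
```

Fields that are left out, such as scoreThreshold and labels, keep the defaults listed in the table.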
PatternRecognizer schema
A pattern recognizer detects entities using one or more regex patterns. Custom recognizers are appended to the built-in ones unless you also disable the built-in recognizer for the same label.
{
  "name": "CustomProjectCode",
  "label": "PROJECT_CODE",
  "patterns": [
    {
      "name": "ProjectCode",
      "regex": "\\bPRJ-\\d{4}-[A-Z]{3}\\b",
      "score": 0.85
    }
  ],
  "contextWords": ["project", "reference"]
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Recognizer identifier shown in logs. |
| label | string | Yes | Entity type this recognizer produces. UPPER_SNAKE_CASE. |
| patterns | RegexPattern[] | Yes | One or more regex patterns. Refer to RegexPattern schema. |
| contextWords | string[] | No | Words that boost the confidence score when found near a match. Omit to skip context boosting for this recognizer. |
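Because patterns accepts more than one entry, a recognizer can pair a strict pattern with a looser, lower-scoring one. The sketch below extends the project-code example. The second pattern, its name, and its score are hypothetical, and the size of the context-word boost is not specified here.

```json
{
  "name": "CustomProjectCode",
  "label": "PROJECT_CODE",
  "patterns": [
    {
      "name": "ProjectCodeStrict",
      "regex": "\\bPRJ-\\d{4}-[A-Z]{3}\\b",
      "score": 0.85
    },
    {
      "name": "ProjectCodeLoose",
      "regex": "\\bPRJ\\d{4}\\b",
      "score": 0.4
    }
  ],
  "contextWords": ["project", "reference"]
}
```

With the default scoreThreshold of 0.5, a match from the loose pattern alone falls below the threshold. It is returned only if a nearby context word boosts its score enough to reach it.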
RegexPattern schema
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Pattern identifier shown in logs. |
| regex | string | Yes | Regex pattern. Compiled and cached on first use. |
| score | number | No | Base confidence score on match. Values from 0.0 to 1.0. |
| allowBacktracking | Boolean | No | When false (the default), the regex runs in a non-backtracking mode. Only set to true when your pattern requires features that need backtracking. |
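Backreferences are a common example of a feature that non-backtracking regex engines generally do not support. The sketch below assumes that this also holds for the engine used here. The pattern name and the code format it matches (two letters, four digits, a hyphen, then the same two letters, such as AB1234-AB) are hypothetical.

```json
{
  "name": "RepeatedPrefixCode",
  "regex": "\\b([A-Z]{2})\\d{4}-\\1\\b",
  "score": 0.6,
  "allowBacktracking": true
}
```

A pattern that opts into backtracking is the kind that can backtrack catastrophically, so it is worth keeping regexTimeout in mind when setting this flag.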
KeywordRecognizer schema
A keyword recognizer detects entities by matching against a list of known sensitive terms (a denylist).
{
  "name": "SensitiveTerms",
  "label": "SENSITIVE_KEYWORD",
  "keywords": ["confidential", "top secret"],
  "score": 1.0,
  "partialMatch": false
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Recognizer identifier. |
| label | string | Yes | Entity type produced. UPPER_SNAKE_CASE. |
| keywords | string[] | Yes | Words or phrases to detect. |
| score | number | No | Confidence score for all matches. Default 1.0. |
| partialMatch | Boolean | No | When false (the default), case-sensitive whole-word match. When true, substring match. |
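To illustrate the substring mode, the following sketch sets partialMatch to true. The recognizer name, the keyword, and the score are hypothetical.

```json
{
  "name": "InternalCodenames",
  "label": "SENSITIVE_KEYWORD",
  "keywords": ["falcon"],
  "score": 0.9,
  "partialMatch": true
}
```

With substring matching, "falcon" is also detected inside longer tokens such as "falconry" or "project-falcon-x", which a whole-word match would skip. Whether substring matching stays case-sensitive is not stated here, so treat that as something to verify.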
KeywordExclusion schema
A keyword exclusion suppresses detected entities whose text matches a known safe term (an allowlist).
{
  "name": "SafeHostnames",
  "excludedKeywords": ["localhost", "example.com"],
  "partialMatch": false
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Exclusion identifier. |
| excludedKeywords | string[] | Yes | Keywords that cause an entity to be excluded when they match its text. |
| partialMatch | Boolean | No | When false (the default), the entity text must exactly equal a keyword. When true, the entity text must contain the keyword as a substring. |
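As a sketch of the substring mode, the following exclusion suppresses any detected entity whose text contains "@example.com". The exclusion name and keyword are hypothetical.

```json
{
  "name": "SafeDomains",
  "excludedKeywords": ["@example.com"],
  "partialMatch": true
}
```

This suppresses both test@example.com and admin@example.com with a single keyword. Substring matching is broad, though: an entity such as user@example.com.evil.org also contains the keyword and would be suppressed too, so prefer exact matching when the set of safe values is small.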
SemanticRecognizer schema
{
  "name": "GlinerLarge",
  "entityMapping": {
    "name": "PERSON",
    "company name": "ORGANISATION",
    "location address": "PHYSICAL_ADDRESS"
  }
}
The example above is illustrative. The shipped default mapping has additional entries; refer to Default entity mapping for the full list.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Fixed identifier for the semantic model. Only "GlinerLarge" is accepted; the model itself can’t be changed. |
| entityMapping | object | Yes | Maps semantic-model output labels (key) to standardized entity labels (value, UPPER_SNAKE_CASE). This is the only customizable part of semanticRecognizer. |
If you don’t need to override the entity mapping, omit semanticRecognizer entirely. Supplying it solely to repeat the default mapping has no effect on detection results.
Disable built-in pattern recognizers
To replace a built-in pattern recognizer with your own, disable the built-in recognizer for that label and add a custom recognizer for the same label. The following example replaces the built-in EMAIL_ADDRESS recognizer with one that only matches addresses on example.com:
{
  "disableBuiltInPatternRecognizersForLabels": ["EMAIL_ADDRESS"],
  "patternRecognizers": [
    {
      "name": "InternalEmail",
      "label": "EMAIL_ADDRESS",
      "patterns": [
        {
          "name": "InternalDomain",
          "regex": "\\b[\\w.+-]+@example\\.com\\b",
          "score": 0.95
        }
      ]
    }
  ]
}
Complete example
The following request configuration restricts detection to four labels, raises the score threshold, adds a custom pattern recognizer for project codes, adds a denylist for sensitive terms, and excludes a known false-positive value:
{
  "detectionConfiguration": {
    "scoreThreshold": 0.7,
    "labels": ["EMAIL_ADDRESS", "PHONE_NUMBER", "PROJECT_CODE", "PERSON"],
    "languages": ["en", "de"],
    "patternRecognizers": [
      {
        "name": "CustomProjectCode",
        "label": "PROJECT_CODE",
        "patterns": [
          {
            "name": "ProjectCode",
            "regex": "\\bPRJ-\\d{4}-[A-Z]{3}\\b",
            "score": 0.85
          }
        ],
        "contextWords": ["project", "reference"]
      }
    ],
    "keywordRecognizers": [
      {
        "name": "SensitiveTerms",
        "label": "SENSITIVE_KEYWORD",
        "keywords": ["confidential", "top secret"]
      }
    ],
    "keywordExclusions": [
      {
        "name": "SafeAddresses",
        "excludedKeywords": ["test@example.com"]
      }
    ]
  }
}