
Detection configuration

The detectionConfiguration object controls how a detection request runs. It’s supplied per request through the Manager API and merges on top of the built-in defaults: only the fields you set override the defaults. Property names and enum values use camelCase. Unknown fields don’t fail the request; they’re returned as validation warnings.

For details on submitting a request, refer to Detection (Manager API).

Top-level fields

{
  "scoreThreshold": 0.5,
  "labels": ["EMAIL_ADDRESS", "PHONE_NUMBER"],
  "languages": ["en"],
  "regexTimeout": 1000,
  "checksumValidationMode": "strict",
  "patternRecognizers": [],
  "keywordRecognizers": [],
  "keywordExclusions": [],
  "semanticRecognizer": null,
  "disableBuiltInPatternRecognizersForLabels": [],
  "sameSpanStrategy": "deterministicWins"
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| scoreThreshold | number | 0.5 | Minimum confidence score for an entity to be returned. Values from 0.0 to 1.0. |
| labels | string[] | [] | Entity types to detect. Empty or omitted means detect all labels enabled in the resolved configuration. The built-in default configuration enables every built-in label. Refer to Entity types for the full list. |
| languages | string[] | [] | Any of en, de, fr, it, es, pt, nl. Empty or omitted resolves to ["en"]. Affects context words and verbal DATE patterns. Refer to Pattern recognizers. |
| regexTimeout | integer | 1000 | Milliseconds per regex execution. Allowed values: 0 (use default) or 1 to 1000. Prevents catastrophic backtracking. |
| checksumValidationMode | string | "strict" | One of "strict" or "relaxed". Refer to Pattern recognizers. |
| patternRecognizers | array | [] | Custom pattern recognizers appended to the built-in ones. Refer to PatternRecognizer schema. |
| keywordRecognizers | array | [] | Custom denylist recognizers. Refer to KeywordRecognizer schema. |
| keywordExclusions | array | [] | Allowlist exclusions that suppress matched entities. Refer to KeywordExclusion schema. |
| semanticRecognizer | object | null | Custom entity mapping for the semantic recognizer. When null or omitted, the built-in default mapping is used (refer to Default entity mapping). Refer to SemanticRecognizer schema. |
| disableBuiltInPatternRecognizersForLabels | string[] | [] | Labels whose built-in pattern recognizers should be disabled. Useful when replacing them with custom recognizers. |
| sameSpanStrategy | string | "deterministicWins" | One of "deterministicWins", "semanticWins", or "higherScoreWins". Refer to Overlap resolution. |
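
The merge behavior described at the top of this page can be sketched in Python. This is a minimal illustration, not the service's implementation: resolve_configuration and the warning text are hypothetical, and the sketch assumes the overlay happens at the top level only, so a field you set replaces the default value wholesale. The default values are copied from the table above.

```python
# Defaults as listed in the top-level fields table.
DEFAULTS = {
    "scoreThreshold": 0.5,
    "labels": [],
    "languages": [],
    "regexTimeout": 1000,
    "checksumValidationMode": "strict",
    "patternRecognizers": [],
    "keywordRecognizers": [],
    "keywordExclusions": [],
    "semanticRecognizer": None,
    "disableBuiltInPatternRecognizersForLabels": [],
    "sameSpanStrategy": "deterministicWins",
}


def resolve_configuration(overrides):
    """Hypothetical helper: overlay request fields on the defaults.

    Returns (resolved, warnings). Unknown fields are reported as
    warnings instead of raising, mirroring the documented behavior.
    """
    resolved = dict(DEFAULTS)
    warnings = []
    for key, value in overrides.items():
        if key in DEFAULTS:
            resolved[key] = value
        else:
            # The warning wording is an assumption for illustration.
            warnings.append(f"Unknown field: {key}")
    return resolved, warnings


resolved, warnings = resolve_configuration(
    {"scoreThreshold": 0.7, "scoreTreshold": 0.9}  # second key is a typo
)
```

Here resolved keeps every default except scoreThreshold, and the misspelled field ends up in warnings instead of failing the call.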

PatternRecognizer schema

A pattern recognizer detects entities using one or more regex patterns. Custom recognizers are appended to the built-in ones unless you also disable the built-in recognizer for the same label.

{
  "name": "CustomProjectCode",
  "label": "PROJECT_CODE",
  "patterns": [
    {
      "name": "ProjectCode",
      "regex": "\\bPRJ-\\d{4}-[A-Z]{3}\\b",
      "score": 0.85
    }
  ],
  "contextWords": ["project", "reference"]
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Recognizer identifier shown in logs. |
| label | string | Yes | Entity type this recognizer produces. UPPER_SNAKE_CASE. |
| patterns | RegexPattern[] | Yes | One or more regex patterns. Refer to RegexPattern schema. |
| contextWords | string[] | No | Words that boost the confidence score when found near a match. Omit to skip context boosting for this recognizer. |
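
Before submitting a custom recognizer, it can help to try its pattern locally. The sketch below uses Python's re module purely for illustration: the service's regex engine isn't named on this page and may behave differently, particularly in non-backtracking mode. The find_matches helper is hypothetical. Note that the JSON string doubles each backslash, so the pattern is parsed from JSON first to recover the real regex.

```python
import json
import re

# The recognizer from the example above, as it would appear in a request.
recognizer_json = r'''
{
  "name": "CustomProjectCode",
  "label": "PROJECT_CODE",
  "patterns": [
    {"name": "ProjectCode", "regex": "\\bPRJ-\\d{4}-[A-Z]{3}\\b", "score": 0.85}
  ],
  "contextWords": ["project", "reference"]
}
'''

recognizer = json.loads(recognizer_json)

# After JSON parsing, the doubled backslashes collapse to single ones.
pattern = re.compile(recognizer["patterns"][0]["regex"])


def find_matches(text):
    """Hypothetical helper: return every substring the pattern matches."""
    return [match.group(0) for match in pattern.finditer(text)]


matches = find_matches("See project PRJ-2024-ABC and PRJ-12-XY.")
```

Under Python's semantics, only PRJ-2024-ABC matches; PRJ-12-XY has too few digits and letters, and a longer suffix such as PRJ-2024-ABCD is rejected by the trailing word boundary.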

RegexPattern schema

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Pattern identifier shown in logs. |
| regex | string | Yes | Regex pattern. Compiled and cached on first use. |
| score | number | No | Base confidence score on match. Values from 0.0 to 1.0. |
| allowBacktracking | Boolean | No | When false (the default), the regex runs in a non-backtracking mode. Only set to true when your pattern requires features that need backtracking. |

KeywordRecognizer schema

A keyword recognizer detects entities by matching against a list of known sensitive terms (a denylist).

{
  "name": "SensitiveTerms",
  "label": "SENSITIVE_KEYWORD",
  "keywords": ["confidential", "top secret"],
  "score": 1.0,
  "partialMatch": false
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Recognizer identifier. |
| label | string | Yes | Entity type produced. UPPER_SNAKE_CASE. |
| keywords | string[] | Yes | Words or phrases to detect. |
| score | number | No | Confidence score for all matches. Default 1.0. |
| partialMatch | Boolean | No | When false (the default), case-sensitive whole-word match. When true, substring match. |
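
The difference between the two partialMatch modes can be illustrated with a small Python sketch. It is not the service's matcher: keyword_matches is a hypothetical helper, "whole word" is approximated as "not adjacent to another word character", and substring mode is assumed to be case-sensitive as well, which this page doesn't state.

```python
import re


def keyword_matches(text, keywords, partial_match=False):
    """Hypothetical helper: return sorted (start, end, keyword) tuples."""
    hits = []
    for keyword in keywords:
        escaped = re.escape(keyword)
        if partial_match:
            # Substring mode: the keyword may sit inside a longer word.
            pattern = escaped
        else:
            # Whole-word mode: no word character directly before or after.
            pattern = r"(?<!\w)" + escaped + r"(?!\w)"
        for match in re.finditer(pattern, text):
            hits.append((match.start(), match.end(), keyword))
    return sorted(hits)


whole_word = keyword_matches("unconfidential", ["confidential"])
substring = keyword_matches("unconfidential", ["confidential"], partial_match=True)
```

In this sketch, whole_word is empty because the keyword is embedded in a longer word, while substring mode reports the embedded occurrence.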

KeywordExclusion schema

A keyword exclusion suppresses detected entities whose text matches a known safe term (an allowlist).

{
  "name": "SafeHostnames",
  "excludedKeywords": ["localhost", "example.com"],
  "partialMatch": false
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Exclusion identifier. |
| excludedKeywords | string[] | Yes | Keywords that, if matching an entity’s text, cause it to be excluded. |
| partialMatch | Boolean | No | When false (the default), the entity text must exactly equal a keyword. When true, the entity text must contain the keyword as a substring. |
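
The exclusion rule in the table above can be sketched as follows. The is_excluded helper is hypothetical, and the comparison is assumed to be case-sensitive, which this page doesn't specify.

```python
def is_excluded(entity_text, excluded_keywords, partial_match=False):
    """Hypothetical helper: decide whether a detected entity is suppressed."""
    if partial_match:
        # Contains mode: any keyword appearing inside the entity text excludes it.
        return any(keyword in entity_text for keyword in excluded_keywords)
    # Exact mode: the entity text must equal one of the keywords.
    return entity_text in excluded_keywords


safe_terms = ["localhost", "example.com"]

exact_hit = is_excluded("localhost", safe_terms)
exact_miss = is_excluded("mail.example.com", safe_terms)
partial_hit = is_excluded("mail.example.com", safe_terms, partial_match=True)
```

With the default exact mode, mail.example.com is still reported because it only contains an excluded keyword; setting partialMatch to true would suppress it.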

SemanticRecognizer schema

{
  "name": "GlinerLarge",
  "entityMapping": {
    "name": "PERSON",
    "company name": "ORGANISATION",
    "location address": "PHYSICAL_ADDRESS"
  }
}

The example above is illustrative. The shipped default mapping has additional entries; refer to Default entity mapping for the full list.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Fixed identifier for the semantic model. Only "GlinerLarge" is accepted; the model itself can’t be changed. |
| entityMapping | object | Yes | Maps semantic-model output labels (key) to standardized entity labels (value, UPPER_SNAKE_CASE). This is the only customizable part of semanticRecognizer. |

If you don’t need to override the entity mapping, omit semanticRecognizer entirely. Supplying it solely to repeat the default mapping has no effect on detection results.
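
If you do override the mapping, a client-side pre-check can catch the two constraints stated in the table before the request is sent. The sketch below is an assumption-laden illustration: check_semantic_recognizer is a hypothetical helper, and the regex is one plausible reading of UPPER_SNAKE_CASE (uppercase letters and digits separated by single underscores), which the service may define differently.

```python
import re

# Assumed definition of UPPER_SNAKE_CASE; the service's exact rule may differ.
UPPER_SNAKE_CASE = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")


def check_semantic_recognizer(recognizer):
    """Hypothetical pre-check; returns a list of problem descriptions."""
    problems = []
    if recognizer.get("name") != "GlinerLarge":
        problems.append('name must be "GlinerLarge"')
    mapping = recognizer.get("entityMapping")
    if not isinstance(mapping, dict):
        problems.append("entityMapping is required and must be an object")
        return problems
    for model_label, entity_label in mapping.items():
        if not UPPER_SNAKE_CASE.fullmatch(str(entity_label)):
            problems.append(
                f"{model_label!r} maps to {entity_label!r}, "
                "which is not UPPER_SNAKE_CASE"
            )
    return problems


example = {
    "name": "GlinerLarge",
    "entityMapping": {
        "name": "PERSON",
        "company name": "ORGANISATION",
        "location address": "PHYSICAL_ADDRESS",
    },
}
problems = check_semantic_recognizer(example)
```

The example mapping from this section passes the check, so problems is empty.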

Disable built-in pattern recognizers

To replace a built-in pattern recognizer with your own, disable the built-in recognizer for that label and add a custom recognizer for the same label. The following example replaces the built-in EMAIL_ADDRESS recognizer with one that only matches addresses on example.com:

{
  "disableBuiltInPatternRecognizersForLabels": ["EMAIL_ADDRESS"],
  "patternRecognizers": [
    {
      "name": "InternalEmail",
      "label": "EMAIL_ADDRESS",
      "patterns": [
        {
          "name": "InternalDomain",
          "regex": "\\b[\\w.+-]+@example\\.com\\b",
          "score": 0.95
        }
      ]
    }
  ]
}
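
Because the custom recognizer becomes the only pattern-based source for EMAIL_ADDRESS, it's worth checking what the replacement pattern does and doesn't match. The sketch below tries it with Python's re module, which is an assumption for illustration; the service's regex engine may treat word boundaries or character classes differently. The internal_addresses helper is hypothetical.

```python
import re

# The regex from the example above, with the JSON escaping removed.
INTERNAL_EMAIL = re.compile(r"\b[\w.+-]+@example\.com\b")


def internal_addresses(text):
    """Hypothetical helper: return every substring the pattern matches."""
    return [match.group(0) for match in INTERNAL_EMAIL.finditer(text)]


basic = internal_addresses("Mail alice@example.com or bob@other.org")

# The trailing \b only requires a word boundary, so under Python's
# semantics a longer domain that starts with example.com still matches
# up to "com".
longer_domain = internal_addresses("carol@example.com.au")
```

In this sketch, basic contains only alice@example.com, while longer_domain contains carol@example.com. If that partial match is unwanted, the pattern would need a stricter ending than a word boundary.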

Complete example

The following request configuration restricts detection to four labels, raises the score threshold, adds a custom pattern recognizer for project codes, adds a denylist for sensitive terms, and excludes a known false-positive value:

{
  "detectionConfiguration": {
    "scoreThreshold": 0.7,
    "labels": ["EMAIL_ADDRESS", "PHONE_NUMBER", "PROJECT_CODE", "PERSON"],
    "languages": ["en", "de"],
    "patternRecognizers": [
      {
        "name": "CustomProjectCode",
        "label": "PROJECT_CODE",
        "patterns": [
          {
            "name": "ProjectCode",
            "regex": "\\bPRJ-\\d{4}-[A-Z]{3}\\b",
            "score": 0.85
          }
        ],
        "contextWords": ["project", "reference"]
      }
    ],
    "keywordRecognizers": [
      {
        "name": "SensitiveTerms",
        "label": "SENSITIVE_KEYWORD",
        "keywords": ["confidential", "top secret"]
      }
    ],
    "keywordExclusions": [
      {
        "name": "SafeAddresses",
        "excludedKeywords": ["test@example.com"]
      }
    ]
  }
}
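
A request body like the one above can be assembled and sanity-checked in code before it is sent. The following Python sketch checks only the value ranges and enum values listed in the top-level fields table; check_configuration is a hypothetical client-side helper, not part of the Manager API, and the sketch stops at producing the JSON string because the endpoint and authentication details are covered elsewhere.

```python
import json

ALLOWED_LANGUAGES = {"en", "de", "fr", "it", "es", "pt", "nl"}
CHECKSUM_MODES = {"strict", "relaxed"}
SAME_SPAN_STRATEGIES = {"deterministicWins", "semanticWins", "higherScoreWins"}


def check_configuration(config):
    """Hypothetical pre-check; returns a list of problem descriptions."""
    problems = []
    threshold = config.get("scoreThreshold", 0.5)
    if not 0.0 <= threshold <= 1.0:
        problems.append("scoreThreshold must be between 0.0 and 1.0")
    unsupported = sorted(set(config.get("languages", [])) - ALLOWED_LANGUAGES)
    if unsupported:
        problems.append(f"unsupported languages: {unsupported}")
    timeout = config.get("regexTimeout", 1000)
    if not (isinstance(timeout, int) and 0 <= timeout <= 1000):
        problems.append("regexTimeout must be an integer from 0 to 1000")
    if config.get("checksumValidationMode", "strict") not in CHECKSUM_MODES:
        problems.append("checksumValidationMode must be strict or relaxed")
    strategy = config.get("sameSpanStrategy", "deterministicWins")
    if strategy not in SAME_SPAN_STRATEGIES:
        problems.append("unknown sameSpanStrategy")
    return problems


# A trimmed version of the complete example; recognizers omitted for brevity.
configuration = {
    "scoreThreshold": 0.7,
    "labels": ["EMAIL_ADDRESS", "PHONE_NUMBER", "PROJECT_CODE", "PERSON"],
    "languages": ["en", "de"],
}

problems = check_configuration(configuration)
request_body = json.dumps({"detectionConfiguration": configuration}, indent=2)
```

Fields left out of configuration fall back to the built-in defaults on the server side, so the pre-check treats a missing field as valid.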