Keyword recognizers
A keyword recognizer is a denylist: a configured list of words or phrases that are reported as sensitive entities whenever they appear in the input text. Matching uses the Aho-Corasick algorithm, which finds every occurrence of every keyword in a single pass over the text.
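To illustrate the single-pass idea, here is a minimal, textbook Aho-Corasick sketch in Python. It is not the product's implementation: the function names build_automaton and find_all are hypothetical, and the sketch reports raw substring occurrences only. Any whole-token check (see Match modes) would be an additional step on top of it.

```python
from collections import deque


def build_automaton(keywords):
    """Build a trie with failure links and per-state output lists."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # failure link for each state
    output = [[]]  # keywords recognized on reaching each state
    for kw in keywords:
        state = 0
        for ch in kw:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(kw)

    # Breadth-first traversal: each failure link points to the longest proper
    # suffix of the current path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end here via the failure link.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, automaton):
    """Scan text once and return (start, end, keyword) for every occurrence."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for kw in output[state]:
            matches.append((i - len(kw) + 1, i + 1, kw))
    return matches


automaton = build_automaton(["confidential", "top secret", "internal only"])
print(find_all("This is top secret and internal only.", automaton))
# → [(8, 18, 'top secret'), (23, 36, 'internal only')]
```

Each input character is consumed once. The failure links let the scan fall back to the longest still-viable partial match instead of restarting, which is why the number of keywords does not multiply the number of passes over the text.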
Use keyword recognizers for terms specific to your domain. To suppress already-detected entities instead, use Keyword exclusions (an allowlist).
Configuration
No keyword recognizers are configured by default. Add your own under detectionConfiguration.keywordRecognizers to detect domain-specific terms:
{
  "detectionConfiguration": {
    "keywordRecognizers": [
      {
        "name": "SensitiveTerms",
        "label": "SENSITIVE_KEYWORD",
        "keywords": ["confidential", "top secret", "internal only"],
        "score": 1.0
      }
    ]
  }
}
For the full schema, refer to KeywordRecognizer schema.
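The following Python sketch shows one plausible way a recognizer entry maps onto detected entities. Each match is tagged with the entry's label, score, and name. The Entity class and detect function are hypothetical, and for brevity a regular expression stands in for Aho-Corasick. The sketch applies only the default exact-match behavior and approximates "whole token" as "not adjacent to a word character", which is an assumption about tokenization.

```python
import json
import re
from dataclasses import dataclass

CONFIG = """
{
  "detectionConfiguration": {
    "keywordRecognizers": [
      {
        "name": "SensitiveTerms",
        "label": "SENSITIVE_KEYWORD",
        "keywords": ["confidential", "top secret", "internal only"],
        "score": 1.0
      }
    ]
  }
}
"""


@dataclass
class Entity:  # hypothetical result type
    start: int
    end: int
    text: str
    label: str
    score: float
    recognizer: str


def detect(config_json, text):
    """Apply every configured keyword recognizer to text (hypothetical helper)."""
    config = json.loads(config_json)
    entities = []
    for rec in config["detectionConfiguration"]["keywordRecognizers"]:
        for kw in rec["keywords"]:
            # Assumed whole-token rule: no word character directly before or after.
            # No re.IGNORECASE, so matching stays case-sensitive.
            pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
            for m in re.finditer(pattern, text):
                entities.append(
                    Entity(m.start(), m.end(), m.group(), rec["label"], rec["score"], rec["name"])
                )
    return sorted(entities, key=lambda e: e.start)


for e in detect(CONFIG, "This memo is confidential and top secret."):
    print(e.start, e.end, e.text, e.label, e.score)
# → 13 25 confidential SENSITIVE_KEYWORD 1.0
# → 30 40 top secret SENSITIVE_KEYWORD 1.0
```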
Match modes
- Exact match (default), partialMatch: false. The keyword must appear as a whole token.
- Partial match, partialMatch: true. The keyword can appear as a substring of any word.
Matching is case-sensitive in both modes.
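The difference between the two modes can be sketched in Python as follows. The helper match_keyword and its partial_match parameter are hypothetical stand-ins for the partialMatch setting. A whole token is approximated as "not preceded or followed by a word character", which is an assumption. Neither mode uses re.IGNORECASE, reflecting the case-sensitive behavior described above.

```python
import re


def match_keyword(text, keyword, partial_match=False):
    """Return (start, end) spans where keyword occurs in text.

    partial_match=False: keyword must stand alone as a token (assumed rule:
    no word character directly before or after it).
    partial_match=True: any substring occurrence counts.
    Both modes are case-sensitive.
    """
    escaped = re.escape(keyword)
    pattern = escaped if partial_match else r"(?<!\w)" + escaped + r"(?!\w)"
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


text = "Confidential: the confidentiality of confidential data"
print(match_keyword(text, "confidential"))
# → [(37, 49)]
print(match_keyword(text, "confidential", partial_match=True))
# → [(18, 30), (37, 49)]
```

The capitalized "Confidential" is skipped in both modes because matching is case-sensitive. The "confidential" inside "confidentiality" is found only in partial mode.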