
Keyword recognizers

A keyword recognizer is a denylist: a configured list of words or phrases that are detected as sensitive entities wherever they match the input text. Matching uses the Aho-Corasick algorithm, which finds every occurrence of every configured keyword in a single pass over the text.
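To show why Aho-Corasick needs only one pass, here is a minimal Python sketch of the textbook algorithm. It is illustrative only, not the recognizer's actual implementation, and the names AhoCorasick and find_all are hypothetical. It builds a trie of the keywords, adds failure links so the scan never backtracks, and reports every match, including overlapping ones. It reports raw substring matches; the whole-token rule described under Match modes would be a separate check on top.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton: finds all keyword occurrences
    in a single left-to-right pass over the text."""

    def __init__(self, keywords):
        self.goto = [{}]   # trie transitions for each state
        self.fail = [0]    # failure link for each state (0 = root)
        self.out = [[]]    # keywords recognized on reaching each state
        for kw in keywords:
            self._add(kw)
        self._build_failure_links()

    def _add(self, keyword):
        state = 0
        for ch in keyword:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(keyword)

    def _build_failure_links(self):
        # Breadth-first, so shorter states are finished before longer ones.
        queue = deque(self.goto[0].values())  # depth-1 states fail to root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Also report keywords that are suffixes of this state.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find_all(self, text):
        """Yield (start, end, keyword) for every match; end is exclusive."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for kw in self.out[state]:
                yield (i - len(kw) + 1, i + 1, kw)


ac = AhoCorasick(["confidential", "top secret", "internal only"])
print(list(ac.find_all("This memo is confidential and top secret.")))
# [(13, 25, 'confidential'), (30, 40, 'top secret')]
```

Each character of the text is consumed once; the failure links decide where to continue after a mismatch, so adding more keywords grows the automaton but not the number of passes.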

Use keyword recognizers for terms specific to your domain. To suppress already-detected entities instead, use Keyword exclusions (an allowlist).

Configuration

Out of the box, no keyword recognizer is configured. Add your own to detect domain-specific terms:

```json
{
  "detectionConfiguration": {
    "keywordRecognizers": [
      {
        "name": "SensitiveTerms",
        "label": "SENSITIVE_KEYWORD",
        "keywords": ["confidential", "top secret", "internal only"],
        "score": 1.0
      }
    ]
  }
}
```
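To make the role of each field concrete, here is a hedged Python sketch of how a recognizer entry like the one above could be applied to text. The helper apply_keyword_recognizers and the shape of the returned entity dicts are hypothetical, not the product's API. For brevity it uses regular expressions rather than Aho-Corasick, implements only the default exact-match mode, and assumes that a whole token means the match is not adjacent to a letter, digit, or underscore.

```python
import json
import re

# The example configuration above, as a JSON string.
CONFIG = """
{
  "detectionConfiguration": {
    "keywordRecognizers": [
      {
        "name": "SensitiveTerms",
        "label": "SENSITIVE_KEYWORD",
        "keywords": ["confidential", "top secret", "internal only"],
        "score": 1.0
      }
    ]
  }
}
"""


def apply_keyword_recognizers(config, text):
    """Hypothetical helper: return one entity dict per keyword occurrence.

    Exact-match mode only. A match must not be preceded or followed by a
    letter, digit, or underscore (an assumed token boundary). Matching is
    case-sensitive, so no IGNORECASE flag is used.
    """
    entities = []
    for rec in config["detectionConfiguration"]["keywordRecognizers"]:
        for kw in rec["keywords"]:
            pattern = re.compile(r"(?<!\w)" + re.escape(kw) + r"(?!\w)")
            for m in pattern.finditer(text):
                entities.append({
                    "recognizer": rec["name"],  # which recognizer fired
                    "label": rec["label"],      # entity type to report
                    "text": m.group(),
                    "start": m.start(),
                    "end": m.end(),             # exclusive
                    "score": rec["score"],      # confidence from config
                })
    return sorted(entities, key=lambda e: e["start"])


config = json.loads(CONFIG)
text = "Marked confidential; not Confidential or nonconfidential."
for entity in apply_keyword_recognizers(config, text):
    print(entity["label"], repr(entity["text"]), entity["start"], entity["end"])
# prints: SENSITIVE_KEYWORD 'confidential' 7 19
```

Only the first occurrence is reported: Confidential differs in case, and the occurrence inside nonconfidential is not a whole token.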

For the full schema, refer to KeywordRecognizer schema.

Match modes

  • Exact match (default, partialMatch: false). The keyword must appear as a whole token.
  • Partial match (partialMatch: true). The keyword can also match as a substring inside a longer word.

Matching is case-sensitive in both modes, so the keyword confidential does not match Confidential or CONFIDENTIAL.
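The two modes and the case-sensitivity rule can be illustrated with a small Python sketch. keyword_matches is a hypothetical helper, and treating a whole token as a match not adjacent to a letter, digit, or underscore is an assumption; the product's tokenizer may draw token boundaries differently.

```python
import re


def keyword_matches(keyword, text, partial_match=False):
    """Hypothetical helper mirroring the two documented match modes.

    partial_match=False: keyword must appear as a whole token (assumed:
    not adjacent to a letter, digit, or underscore).
    partial_match=True: keyword may also appear inside a longer word.
    Both modes are case-sensitive: no re.IGNORECASE flag is used.
    """
    body = re.escape(keyword)
    pattern = body if partial_match else r"(?<!\w)" + body + r"(?!\w)"
    return re.search(pattern, text) is not None


cases = [
    ("secret", "top secret memo"),   # whole token
    ("secret", "secretary notes"),   # substring of a longer word
    ("secret", "SECRET memo"),       # different case
]
for kw, text in cases:
    print(f"{text!r}: exact={keyword_matches(kw, text)}, "
          f"partial={keyword_matches(kw, text, partial_match=True)}")
# 'top secret memo': exact=True, partial=True
# 'secretary notes': exact=False, partial=True
# 'SECRET memo': exact=False, partial=False
```

Partial match widens coverage (secretary is caught) at the cost of more false positives, while neither mode catches a change in case.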