Entity types

An entity type is a label assigned to a detected piece of sensitive information, such as EMAIL_ADDRESS or PERSON. AI Smart Redact ships with 36 built-in entity types: 32 pattern-based labels detected by regex pattern recognizers, and 4 semantic labels detected by the semantic model. You can also define your own entity types per detection request.

Pattern-based labels

The following 32 labels have built-in pattern recognizers. A single label can match more than one format. For example, DATE covers both numeric and written-out date formats in multiple languages.

Label	What it matches
`ALPHANUMERIC_CODE`	Mixed letter-digit codes
`BARCODE`	Product barcodes with checksum validation
`BIC_SWIFT`	Bank Identifier Codes
`CREDIT_CARD`	Credit card numbers with Luhn validation
`CURRENCY_CODE`	ISO 4217 three-letter currency codes
`DATE`	Calendar dates in common international formats
`DATETIME`	Combined date-time strings
`DECIMAL_NUMBER`	Decimal numbers
`DOMAIN_NAME`	Internet domain names
`DURATION`	Time durations
`EMAIL_ADDRESS`	Email addresses
`FILE_PATH`	Windows and Unix file paths
`GPS_COORDINATE`	Geographic coordinates
`HASHTAG`	Social media hashtags
`HTTP_COOKIE`	HTTP cookie strings
`IBAN`	International Bank Account Numbers
`INTEGER_NUMBER`	Integers
`IP_ADDRESS`	IPv4 and IPv6 addresses
`ISIN`	International Securities Identification Numbers
`LEI`	Legal Entity Identifiers
`MAC_ADDRESS`	Hardware MAC addresses
`MENTION`	Social media @mentions
`MONEY`	Monetary amounts
`NUMERIC_ID`	Digit-only identifiers
`PERCENTAGE`	Percentage values
`PHONE_NUMBER`	Phone numbers in common international formats
`SCIENTIFIC_NUMBER`	Scientific notation numbers
`TIME`	Times in common formats
`UNIQUE_IDENTIFIER`	UUID and GUID strings
`URL`	HTTP, HTTPS, and FTP URLs
`VAT_NUMBER`	VAT numbers
`VIN`	Vehicle Identification Numbers

For details on how pattern recognizers work and how to add custom ones, refer to Pattern recognizers.
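Several labels in the table pair a pattern with a validator. As an illustration only, the following minimal sketch shows how a regex candidate match can be confirmed with the Luhn checksum, the validation named for CREDIT_CARD. The pattern `CARD_CANDIDATE` and the function names are assumptions made for this example; they are not the product's actual recognizer.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens. Not the product's built-in pattern.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return candidate matches that also pass the checksum."""
    return [
        match.group()
        for match in CARD_CANDIDATE.finditer(text)
        if luhn_valid(match.group())
    ]


sample = "Valid: 4111 1111 1111 1111. Invalid: 4111 1111 1111 1112."
matches = find_card_numbers(sample)
```

The checksum step is what lets a broad digit pattern discard most random digit runs, which is consistent with validated labels scoring higher than unvalidated ones.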

Semantic labels

Out of the box, the semantic model detects four entity types:

Label	What it matches
`PERSON`	Person names
`ORGANISATION`	Organization and company names
`PHYSICAL_ADDRESS`	Street addresses and locations
`USERNAME`	Usernames and handles

The built-in default mapping is a starting point, not a fixed set. You can extend or replace it to detect any entity type the semantic model can recognize. Refer to Semantic recognizer.

Confidence scores

Every detected entity has a confidence score between 0 and 1. The score reflects how unambiguous the match is. Entities with structurally unique formats and supporting validators score higher; broader patterns that can match many non-PII strings score lower.

Range	Tier	Description
0.90–1.00	Highest	Structurally unique, almost no false positives
0.80–0.89	High	Distinctive structure, often with checksum validation
0.70–0.79	Moderate	Recognizable format with some ambiguity
0.60–0.69	Lower	Formats that overlap with common text
0.50–0.59	Low	Broad patterns with higher ambiguity
Under 0.50	Sub-threshold	Filtered out at the built-in default threshold. Refer to Sub-threshold patterns.

The built-in default scoreThreshold is 0.5. Entities scoring under the threshold are removed from the results. Raise the threshold to favor precision; lower it to favor recall.
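The filtering rule above can be sketched as follows. The `Entity` class, the `apply_threshold` function, and the sample scores are hypothetical and exist only for this illustration; the sketch assumes, as the text implies, that an entity scoring exactly at the threshold is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical detection result used only for this example."""
    label: str
    text: str
    score: float


def apply_threshold(entities: list[Entity], score_threshold: float = 0.5) -> list[Entity]:
    """Drop entities scoring under the threshold; keep the rest in order."""
    return [entity for entity in entities if entity.score >= score_threshold]


# Illustrative scores, not values produced by the product.
detected = [
    Entity("EMAIL_ADDRESS", "jane@example.com", 0.95),
    Entity("NUMERIC_ID", "204518", 0.55),
    Entity("TIME", "1430", 0.40),
]

default_results = apply_threshold(detected)       # default threshold of 0.5
strict_results = apply_threshold(detected, 0.9)   # favors precision
```

With these sample scores, the default threshold keeps the first two entities, and raising the threshold to 0.9 keeps only the email address.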

Sub-threshold patterns

Some patterns are deliberately configured to score under the built-in default threshold to prevent false positives from highly ambiguous matches. For example, a bare four-digit number could be military time, a year, a PIN, or an arbitrary identifier; the military-time pattern scores under 0.5 and surfaces only when context boosting raises the score over the threshold.
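To make the idea concrete, here is a minimal sketch of context boosting for a bare four-digit time. Every value in it is an assumption chosen for illustration: the base score of 0.4, the boost of 0.35, the context word list, and the 30-character window are not the product's actual configuration.

```python
import re

# Four digits that form a valid 24-hour time, for example 0000 to 2359.
MILITARY_TIME = re.compile(r"\b(?:[01]\d|2[0-3])[0-5]\d\b")

BASE_SCORE = 0.4       # assumed: deliberately under the 0.5 threshold
CONTEXT_BOOST = 0.35   # assumed boost when a context word is nearby
CONTEXT_WORDS = {"hours", "hrs", "zulu", "time", "depart", "arrive"}  # assumed
WINDOW = 30            # assumed: characters inspected on each side


def score_military_time(text: str) -> list[tuple[str, float]]:
    """Return (match, score) pairs, boosting matches near a context word."""
    results = []
    for match in MILITARY_TIME.finditer(text):
        start = max(0, match.start() - WINDOW)
        nearby = text[start:match.end() + WINDOW].lower()
        nearby_words = set(re.findall(r"[a-z]+", nearby))
        score = BASE_SCORE
        if nearby_words & CONTEXT_WORDS:
            score = min(1.0, score + CONTEXT_BOOST)
        results.append((match.group(), score))
    return results


without_context = score_military_time("Invoice 1430 attached")
with_context = score_military_time("Departure at 1430 hours")
```

In this sketch, the first call leaves the match under the threshold, so it would be filtered out, while the second call lifts the same four digits over it because "hours" appears nearby.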

Custom entity types

You can extend the set of detected entity types in three ways:

Add custom regex patterns. Refer to Pattern recognizers.
Add a denylist of known sensitive terms. Refer to Keyword recognizers.
Map more semantic-model outputs to new labels. Refer to Semantic recognizer.
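As a rough illustration of the second option, the sketch below matches a denylist of known terms and tags each hit with a custom label. It is not the keyword recognizer's API: the function names, the `PROJECT_CODENAME` label, and the choice of case-insensitive whole-word matching are assumptions made for this example. It also assumes every term starts and ends with a letter, digit, or underscore, so that word boundaries behave as expected.

```python
import re


def build_denylist_matcher(terms: list[str]) -> re.Pattern[str]:
    """Compile one pattern that matches any term as a whole word."""
    # Longest terms first, so a longer term wins over a shorter prefix.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def find_terms(text: str, terms: list[str], label: str) -> list[tuple[str, str, int, int]]:
    """Return (label, matched text, start, end) for every denylist hit."""
    if not terms:
        return []
    matcher = build_denylist_matcher(terms)
    return [
        (label, match.group(), match.start(), match.end())
        for match in matcher.finditer(text)
    ]


hits = find_terms(
    "Project Bluebird ships after bluebird-2.",
    ["Bluebird"],
    "PROJECT_CODENAME",  # hypothetical custom label
)
```

Escaping each term with `re.escape` keeps punctuation inside a term from being read as regex syntax.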
