Developed for use with Microsoft Purview DLP, which uses the Perl-compatible "Boost.Regex" engine for pattern matching.
Matches US Social Security number patterns, including unformatted ones (i.e. nine digits with no separators):
Excludes all-zero area, group, or serial segments:
000XXXXXX, XXX00XXXX, or XXXXX0000
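As a rough illustration of the all-zero segment rule, the sketch below uses Python's `re` module rather than Boost.Regex (both support the negative lookaheads used here). It assumes a bare nine-digit candidate and is not the original Purview pattern; the names `SSN_NONZERO_SEGMENTS` and `has_nonzero_segments` are hypothetical.

```python
import re

# Hypothetical sketch, not the original pattern: a bare nine-digit string in
# which no segment is all zeros. Each negative lookahead inspects the segment
# that follows it before those digits are consumed.
SSN_NONZERO_SEGMENTS = re.compile(
    r"^(?!000)\d{3}"   # area number (3 digits): reject 000
    r"(?!00)\d{2}"     # group number (2 digits): reject 00
    r"(?!0000)\d{4}$"  # serial number (4 digits): reject 0000
)


def has_nonzero_segments(digits: str) -> bool:
    """Return True if `digits` is nine digits with no all-zero segment."""
    return SSN_NONZERO_SEGMENTS.match(digits) is not None


for candidate in ("000123456", "123004567", "123450000", "123456780"):
    print(candidate, has_nonzero_segments(candidate))
```

Only the last candidate passes; each of the first three trips exactly one lookahead.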
Excludes area numbers 666 and 9##:
666XXXXXX or 9##XXXXXX
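A minimal sketch of the area-number rule, again in Python's `re` as a stand-in for Boost.Regex and assuming a bare nine-digit candidate. `AREA_ALLOWED` and `area_allowed` are hypothetical names; the single `9\d{2}` alternative covers the whole 900 to 999 range.

```python
import re

# Hypothetical sketch: reject area numbers 666 and 900-999 ("9##") with one
# negative lookahead anchored at the start of a bare nine-digit string.
AREA_ALLOWED = re.compile(r"^(?!666|9\d{2})\d{9}$")


def area_allowed(digits: str) -> bool:
    """Return True if the first three digits are neither 666 nor 9##."""
    return AREA_ALLOWED.match(digits) is not None
```

Because the lookahead is zero-width, it can be prepended to other checks without changing which digits are consumed.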
Excludes ascending and descending digit sequences:
123456789, 876543210, etc.
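The source does not spell out what "etc." covers, so the sketch below assumes a "sequence" means nine strictly consecutive digits with no wrap-around from 9 to 0. Under that assumption there are only four such strings, and they can be rejected with one lookahead. This is a Python `re` illustration with hypothetical names, not the original pattern.

```python
import re

# Assumption (not stated in the source): a "sequence" is nine strictly
# consecutive digits with no wrap-around from 9 to 0. Under that reading
# there are exactly two ascending runs and their two reversals.
ASCENDING = ["".join(str(d) for d in range(start, start + 9)) for start in (0, 1)]
DESCENDING = [run[::-1] for run in ASCENDING]
SEQUENCES = ASCENDING + DESCENDING  # 012345678, 123456789, 876543210, 987654321

# Reject a bare nine-digit string that is exactly one of those runs.
NOT_SEQUENCE = re.compile(r"^(?!(?:" + "|".join(SEQUENCES) + r")$)\d{9}$")


def is_not_sequence(digits: str) -> bool:
    """Return True if `digits` is nine digits and not a consecutive run."""
    return NOT_SEQUENCE.match(digits) is not None
```

Note that 987654321 is also caught by the 9## area rule; the overlap is harmless.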
Excludes known retired SSNs:
078051120 and 219099999
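The retired-number check reduces to a literal blacklist. The hypothetical sketch below handles only the unformatted form; a separated form such as 078-05-1120 would need either separator-aware alternatives or normalization to bare digits first.

```python
import re

# The two specific numbers listed in the source, as bare digit strings.
RETIRED_SSNS = ("078051120", "219099999")

# Hypothetical sketch: reject exactly those nine-digit strings.
NOT_RETIRED = re.compile(r"^(?!(?:" + "|".join(RETIRED_SSNS) + r")$)\d{9}$")


def is_not_retired(digits: str) -> bool:
    """Return True if `digits` is nine digits and not a listed retired SSN."""
    return NOT_RETIRED.match(digits) is not None
```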
Allows hyphen (dash), en dash, em dash, space, slash, tab, and "null" (empty) separators between segments
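One way to express the separator list is a single optional character class; making it optional covers the "null" case. The sketch below also requires both separators to be identical via a backreference. That consistency rule is an assumption and a design choice for illustration, not necessarily what the original pattern does. `SEPARATOR`, `SSN_WITH_SEPARATOR` and `normalize` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical separator class: hyphen-minus, en dash (U+2013), em dash
# (U+2014), space, slash and tab. The "?" makes it optional, which covers the
# "null" (no separator) case.
SEPARATOR = r"[-\u2013\u2014 /\t]?"

# Design choice (an assumption, not necessarily the original's): capture the
# first separator as group 2 and require the same one again via \2, so mixed
# forms such as "123-45 6789" are rejected.
SSN_WITH_SEPARATOR = re.compile(
    r"^(\d{3})(" + SEPARATOR + r")(\d{2})\2(\d{4})$"
)


def normalize(candidate: str) -> Optional[str]:
    """Return the bare nine digits if `candidate` uses one consistent separator."""
    match = SSN_WITH_SEPARATOR.match(candidate)
    if match is None:
        return None
    return match.group(1) + match.group(3) + match.group(4)
```

Normalizing to bare digits lets the digit-only exclusions (zeros, 666/9##, sequences, retired numbers) be applied once, whatever separator was used.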
Uses boundary checks to avoid matching digits that are part of telephone numbers, credit card numbers, and other non-SSN values.
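The exact boundary rules are not given in the source, so the sketch below is a simplified stand-in: a candidate may not be directly preceded or followed by another digit. Without those lookarounds, a plain `\d{9}` would report `555123456`, the first nine digits of the phone number `5551234567`. `BOUNDED_SSN` and `find_unformatted_ssns` are hypothetical names.

```python
import re
from typing import List

# Simplified stand-in for the boundary checks (the original's exact rules are
# not given): a match may not be directly preceded or followed by a digit, so
# nine-digit windows inside longer numbers are never reported.
BOUNDED_SSN = re.compile(r"(?<!\d)\d{9}(?!\d)")


def find_unformatted_ssns(text: str) -> List[str]:
    """Return every standalone run of exactly nine digits in `text`."""
    return BOUNDED_SSN.findall(text)
```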
Excludes digit sequences immediately followed by a common file extension:
.pdf, .doc(x), .xls(x), .ppt(x), .zip, .jp(e)g, .png, and .log
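The file-extension exclusion fits naturally as a trailing negative lookahead. In the hypothetical sketch below, the optional letters in `.doc(x)` and `.jp(e)g` become `x?` and `e?`. Case-insensitive matching is an added assumption so that `123456780.PDF` is skipped too, and the nine-digit core reuses the simplified digit boundaries rather than the original's full rules.

```python
import re
from typing import List

# Extensions from the source list; "x?" and "e?" cover the optional letters.
FILE_EXTENSIONS = r"pdf|docx?|xlsx?|pptx?|zip|jpe?g|png|log"

# Hypothetical sketch: a standalone nine-digit run that is not immediately
# followed by ".<extension>" and a word boundary. IGNORECASE is an assumption
# so that names like 123456780.PDF are also skipped.
SSN_NOT_FILENAME = re.compile(
    r"(?<!\d)\d{9}(?!\d)(?!\.(?:" + FILE_EXTENSIONS + r")\b)",
    re.IGNORECASE,
)


def find_ssns_outside_filenames(text: str) -> List[str]:
    """Return nine-digit runs that do not look like part of a file name."""
    return SSN_NOT_FILENAME.findall(text)
```

The `\b` after the extension keeps a sentence-ending period (as in "SSN 123456780.") from being mistaken for a file name.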
Derived from: Comprehensive US SSN (Social Security Number)
Also uses patterns from the "U.S. Social Security Number (SSN) (Nucleuz Inc)" Sensitive Information Type contained in the Microsoft Purview DLP tool.