This section outlines how to create a custom sensitive information type in Microsoft Purview.
Scenario
Sensitive Information Types are used to recognize patterns within documents. They are especially useful when creating custom Data Loss Prevention policies in Microsoft 365, because they let you specify precisely the pattern you want to recognize in your documents. A Sensitive Information Type can describe the structure of a numbering scheme used in your organization, such as Purchase Order Numbers or Contract Numbers, or it can list specific Highly Confidential Client Names.
So how do you create a custom Sensitive Information Type?
Resolution
Step 1: Access Microsoft Purview Home Page
Step 2: Select “Data Classification”
Step 3: Select “Sensitive Info Types”
Step 4: Select “+ Create Sensitive Info Type”
Step 5: Enter a name and a description for your new sensitive info type, then select “Next” located at the bottom of the screen.
Step 6: Select “+ Create Pattern” or “Create one now” to define the settings of your sensitive info type.
Step 7: Select a confidence level for your sensitive info type.
Confidence Level: Reflects the confidence of the match, which typically increases when supporting elements are detected along with the primary element. The more supporting elements you specify, the higher the confidence level you should select to ensure that matched items contain the sensitive info you're looking for.
High Confidence: the pattern requires more supporting elements in proximity to the primary element.
Medium Confidence: the pattern requires a moderate number of supporting elements in proximity to the primary element.
Low Confidence: the pattern requires few or no supporting elements in proximity to the primary element.
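To make the relationship between supporting elements and confidence concrete, here is a minimal Python sketch. It is not how Purview evaluates confidence internally; the function name and the cut-offs (0, 1, 2 or more supporting elements) are hypothetical and chosen only for illustration.

```python
def confidence_level(supporting_matches: int) -> str:
    """Map the number of supporting elements found near a primary match to a label.

    The cut-offs below are assumptions for illustration, not Purview's rules.
    """
    if supporting_matches >= 2:
        return "high"
    if supporting_matches == 1:
        return "medium"
    return "low"


# A primary match with no nearby supporting evidence is the weakest signal.
print(confidence_level(0))
print(confidence_level(3))
```

In practice you pick the level per pattern in the wizard; the sketch only shows why a pattern with more required evidence deserves a higher level.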
Step 8: Select the primary element for your sensitive info type.
Primary Element: This is the main info you want to detect in content. You can define the primary element by using:
Regular Expression: Regular expressions (RegEx) are strings of text that create patterns to help identify and match the info you're looking for. RegEx strings can be formatted in many ways. For example, \d{6} identifies a six-digit number in the content.
Keyword List: Keyword lists identify the words and phrases you want this info type to detect. For example, the keyword list to identify Netherlands VAT numbers is 'VAT number, vat no, vat number, VAT#'.
Keyword Dictionary: Unlike keyword lists, which are limited in size, keyword dictionaries provide easier management of keywords at a much larger scale.
Functions: Functions are used to find text that's formatted in a specific way. For example, 'func_credit_card' looks for 14- to 16-digit credit card numbers that can be formatted or unformatted and that must pass the Luhn test.
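The primary element options above can be approximated outside Purview to test your ideas before entering them in the wizard. The Python sketch below is an illustration under stated assumptions: the helper names are hypothetical, the word boundaries (\b) around \d{6} are an addition not present in the example above, and luhn_valid only implements the standard Luhn checksum, not the full behavior of 'func_credit_card'.

```python
import re

# Six-digit number, as in the \d{6} example; \b is an added assumption so that
# longer digit runs are not partially matched.
SIX_DIGITS = re.compile(r"\b\d{6}\b")

# Keyword list from the Netherlands VAT example.
VAT_KEYWORDS = ["VAT number", "vat no", "vat number", "VAT#"]


def contains_keyword(text: str, keywords: list[str]) -> bool:
    """Case-insensitive check for any keyword in the text."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(SIX_DIGITS.findall("PO 123456 shipped, ref 1234567"))
print(contains_keyword("Our VAT No is on the invoice", VAT_KEYWORDS))
print(luhn_valid("4111 1111 1111 1111"))
```

Trying a pattern against sample text like this can reveal false positives before the sensitive info type is used in a policy.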
Step 9 (Optional): Enter a Character Proximity value or select “Anywhere in the document”.
Character Proximity: When the primary element is matched, any supporting elements will match only when found within this proximity to the primary element. The closer the primary and supporting elements are to each other, the more likely the detected content is going to be what you're looking for.
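As a rough illustration of character proximity, the Python sketch below checks whether a keyword appears within a fixed number of characters on either side of a primary match. The function name is hypothetical, and the exact way Purview measures the window is not described here, so treat the window definition as an assumption.

```python
import re


def supporting_within_proximity(
    text: str, primary_pattern: str, keyword: str, proximity: int
) -> bool:
    """Return True if `keyword` occurs within `proximity` characters of any primary match."""
    for match in re.finditer(primary_pattern, text):
        # Assumed window: `proximity` characters before and after the match.
        start = max(0, match.start() - proximity)
        end = min(len(text), match.end() + proximity)
        if keyword.lower() in text[start:end].lower():
            return True
    return False


near = "purchase order 123456"
far = "Invoice 123456" + " " * 400 + "purchase order"
print(supporting_within_proximity(near, r"\d{6}", "purchase order", 300))
print(supporting_within_proximity(far, r"\d{6}", "purchase order", 300))
```

A smaller proximity value makes the match stricter, while “Anywhere in the document” corresponds to having no window at all.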
Step 10 (Optional): Select “+ Add Supporting Elements or group of elements”.
Supporting Elements: Adding supporting elements increases the likelihood that the detected info is a true match. For example, let's say you want to detect nine-digit employee ID numbers. Not all nine-digit numbers are employee ID numbers, so you can add supporting elements to look for related text near the ID numbers, such as keywords like "employee", "badge", and "ID". When the primary element is matched, any supporting elements will match only when found within the character proximity to the primary element.
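The employee ID example can be sketched in Python as follows. This is an approximation for experimenting with sample text, not Purview's matching engine; the names, the whole-word keyword matching, and the default proximity of 300 characters are assumptions made for the illustration.

```python
import re

EMPLOYEE_ID = re.compile(r"\b\d{9}\b")  # primary element: nine-digit number
SUPPORTING_KEYWORDS = ("employee", "badge", "id")  # supporting elements


def find_employee_ids(text: str, proximity: int = 300) -> list[tuple[str, list[str]]]:
    """Return nine-digit numbers that have at least one supporting keyword nearby."""
    results = []
    for match in EMPLOYEE_ID.finditer(text):
        start = max(0, match.start() - proximity)
        window = text[start : match.end() + proximity].lower()
        # Match keywords as whole words so "id" does not match inside "paid".
        found = [
            keyword
            for keyword in SUPPORTING_KEYWORDS
            if re.search(rf"\b{re.escape(keyword)}\b", window)
        ]
        if found:
            results.append((match.group(), found))
    return results


print(find_employee_ids("Employee badge 123456789 issued today"))
print(find_employee_ids("Order total 123456789 units"))
```

The second call returns an empty list because the nine-digit number has no supporting keyword near it, which is exactly the false positive that supporting elements are meant to filter out.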
Step 11 (Optional): Select “+ Add Additional Checks”.
Additional Checks: To further refine the evaluation and detection of matching items, you can include additional checks that include or exclude specific text and/or patterns. For example, you can exclude specific 16-digit numbers that might incorrectly be identified as a credit card number.
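The exclusion example can be illustrated with a short Python sketch. The excluded value and the function name are hypothetical; the point is only to show how a known false positive is filtered out after the primary pattern has matched.

```python
import re

CARD_LIKE = re.compile(r"\b\d{16}\b")  # unformatted 16-digit candidates only

# Hypothetical 16-digit value known to be a false positive (for example, an internal test number).
EXCLUDED_NUMBERS = {"1234567890123456"}


def candidate_card_numbers(text: str) -> list[str]:
    """Return 16-digit matches, skipping values on the exclusion list."""
    return [
        match.group()
        for match in CARD_LIKE.finditer(text)
        if match.group() not in EXCLUDED_NUMBERS
    ]


print(candidate_card_numbers("test 1234567890123456 and 4111111111111111"))
```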
Step 12: Select “Create” in blue at the bottom of the pane. Once your pattern is created, you will be able to view, copy, edit, or delete it from the pattern screen. When you are done creating your patterns, select “Next” in blue at the bottom of the screen.
Step 13: Select the level of Confidence to show in the compliance policies. This will appear as the recommended confidence level for this info type when it's included in supported compliance policies. Admins will be able to change it as needed. Each level reflects how many supporting elements were detected along with the primary element. The more supporting elements an item contains, the higher the confidence that a matched item contains the sensitive info you're looking for.
Step 14: Finally, review your sensitive info type settings. If you are satisfied with them, select “Create” in blue at the bottom of the screen.
Note: During the review, select the circled “i” next to the Patterns description to review the pattern settings.
About Cadence Solutions
Jordan Uytterhagen founded Cadence Solutions after starting out on the client side of the table. His mandate has been to help organizations struggling with digital transformation implement projects without losing their trust and confidence. Our solutions include automation of human resources, finance, accounts payable, contract management, document capture, drawing and records management, as well as managed services. Cadence Solutions has proven, time and again, that our clients' projects will be successful because we are authentic and bring unmatched experience.