Detailed Steps to Create and Train a Classifier

Collect Seed Content:

  • Gather between 50 and 500 items that strongly represent the positive examples (i.e., the content you want the classifier to recognize).
  • Collect between 150 and 1,500 items that represent the negative examples (i.e., content the classifier should not recognize).
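
Before uploading anything, it can help to confirm that both seed sets fall within these ranges. The sketch below is a hypothetical pre-flight check: it assumes you have staged the files in local folders first, and it only compares counts against the ranges stated above. It does not talk to SharePoint or Purview.

```python
from pathlib import Path

# Seed-count ranges stated in the article (inclusive).
POSITIVE_RANGE = (50, 500)
NEGATIVE_RANGE = (150, 1500)


def count_files(folder: Path) -> int:
    """Count regular files directly inside a local staging folder (subfolders are not traversed)."""
    return sum(1 for p in folder.iterdir() if p.is_file())


def check_seed_counts(positive: int, negative: int) -> list[str]:
    """Return a list of problems; an empty list means both counts are in range."""
    problems = []
    for label, count, (low, high) in (
        ("positive", positive, POSITIVE_RANGE),
        ("negative", negative, NEGATIVE_RANGE),
    ):
        if not low <= count <= high:
            problems.append(f"{label}: {count} items, expected {low}-{high}")
    return problems


if __name__ == "__main__":
    # With real staging folders you would pass count_files(Path("positive")) and so on.
    print(check_seed_counts(positive=40, negative=300))
```

Keeping the range check separate from the file counting makes it easy to reuse the check with counts taken from anywhere, for example a SharePoint library view.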

Organize Seed Content:

  • Place the positive and negative seed content in separate SharePoint folders.
  • Ensure each folder is dedicated solely to holding the respective seed content.
  • Note the URLs of the site, library, and folders; you will need them when you set up the classifier.
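
One way to keep those URLs straight is to record each seed location as site, library, and folder and derive the folder URL from them. This is a sketch under assumptions: the tenant, site, library, and folder names are hypothetical, and the URL shape (site URL, then library path, then folder, with spaces percent-encoded) is the common SharePoint pattern rather than anything Purview requires. The check also enforces the "separate folders" rule above.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class SeedLocation:
    """Site, library, and folder for one set of seed content (example values are hypothetical)."""
    site_url: str  # e.g. https://contoso.sharepoint.com/sites/SeedData
    library: str   # library path segment, e.g. "Shared Documents"
    folder: str    # folder inside the library, e.g. "Positive"

    def folder_url(self) -> str:
        # Percent-encode each path segment (spaces become %20); "/" is kept for nested folders.
        return "/".join([self.site_url.rstrip("/"), quote(self.library), quote(self.folder)])


def check_separate(positive: SeedLocation, negative: SeedLocation) -> None:
    """Raise ValueError if the two folders are the same or one is nested inside the other."""
    pos, neg = positive.folder_url() + "/", negative.folder_url() + "/"
    if pos.startswith(neg) or neg.startswith(pos):
        raise ValueError("positive and negative seed content must live in separate folders")


if __name__ == "__main__":
    site = "https://contoso.sharepoint.com/sites/SeedData"
    pos = SeedLocation(site, "Shared Documents", "Positive")
    neg = SeedLocation(site, "Shared Documents", "Negative")
    check_separate(pos, neg)
    print(pos.folder_url())
```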

Prepare SharePoint:

  • If you create a new SharePoint site and folder for your seed data, allow at least an hour for the location to be indexed before proceeding.
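
The one-hour wait is simple arithmetic, but it is easy to forget when a site was created. The helper below is a hypothetical convenience that computes the earliest time to continue; note that elapsed time is only a floor and does not confirm that indexing has actually finished. It expects timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Minimum wait taken from the article; a floor, not a guarantee that indexing is complete.
INDEXING_WAIT = timedelta(hours=1)


def earliest_setup_time(created_at: datetime) -> datetime:
    """Earliest moment to start configuring the classifier for a newly created location."""
    return created_at + INDEXING_WAIT


def ready_to_configure(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once at least INDEXING_WAIT has elapsed since the site/folder was created."""
    now = now or datetime.now(timezone.utc)
    return now >= earliest_setup_time(created_at)
```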

Portal Navigation and Setup:

  • Sign in to the Microsoft Purview portal or the Microsoft Purview compliance portal with Compliance admin or Security admin role access.
  • Navigate to Data loss prevention > Data classification > Classifiers.
  • Select the Trainable classifiers tab.
  • Choose Create trainable classifier.

Configure the Classifier:

  • Add the source of your positive examples: select the SharePoint site, library, and folder URL that hold the content the classifier should detect.
  • Add the source of your negative examples in the same way.

Training Process:

  • The classifier analyzes the provided samples and learns the patterns and characteristics that distinguish the positive examples from the negative ones.
  • Training can take anywhere from a few hours to a couple of days, depending on the complexity and volume of the data.
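
Purview does not expose the model it trains, so the following is only a conceptual illustration of "learning from positive and negative examples". It swaps in a textbook technique, multinomial naive Bayes with add-one smoothing, on toy strings; it is not how Purview works internally, and the example texts are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple stand-in for real text processing."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyNaiveBayes:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, positives: list[str], negatives: list[str]):
        # Word counts per class: True = positive (match), False = negative.
        self.counts = {True: Counter(), False: Counter()}
        for label, docs in ((True, positives), (False, negatives)):
            for doc in docs:
                self.counts[label].update(tokenize(doc))
        total_docs = len(positives) + len(negatives)
        self.log_prior = {
            True: math.log(len(positives) / total_docs),
            False: math.log(len(negatives) / total_docs),
        }
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}

    def _log_score(self, tokens: list[str], label: bool) -> float:
        # Log prior plus smoothed log likelihood of each known word; unseen words are skipped.
        denom = self.totals[label] + len(self.vocab)
        return self.log_prior[label] + sum(
            math.log((self.counts[label][t] + 1) / denom) for t in tokens if t in self.vocab
        )

    def predict(self, text: str) -> bool:
        """True if the text scores higher as a positive (matching) example."""
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


if __name__ == "__main__":
    model = ToyNaiveBayes(
        positives=["invoice total amount due", "invoice payment due date"],
        negatives=["team lunch on friday", "project kickoff meeting notes"],
    )
    print(model.predict("payment due on this invoice"))    # True
    print(model.predict("notes from the friday meeting"))  # False
```

The point of the toy is the shape of the process: words that are common in the positive seeds and rare in the negative seeds push a document toward a match, which is why clean, representative negative examples matter as much as positive ones.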

Testing and Validation

Initial Testing:

  • After the classifier completes its initial training, test its accuracy by applying it to a new set of data.
  • Check the classifier’s predictions and verify that it correctly distinguishes positive from negative examples.
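
To turn "check the predictions" into numbers, you can tally each test item as a (predicted, actual) pair and compute accuracy, precision, and recall. This is a general evaluation sketch, not a Purview export format: it assumes you record the pairs yourself while reviewing the test results.

```python
from collections.abc import Iterable


def evaluate(results: Iterable[tuple[bool, bool]]) -> dict:
    """Compute accuracy, precision, and recall from (predicted, actual) pairs.

    True means "matches the classifier", i.e. a positive example.
    """
    tp = fp = fn = tn = 0
    for predicted, actual in results:
        if predicted and actual:
            tp += 1  # correctly detected
        elif predicted and not actual:
            fp += 1  # false alarm
        elif not predicted and actual:
            fn += 1  # missed match
        else:
            tn += 1  # correctly ignored
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
    }
```

Precision answers "when it matches, is it right?", while recall answers "how much of the target content does it catch?". Watching both avoids tuning the classifier into one that is accurate only because it rarely matches anything.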

Feedback Loop:

  • Review the classifier’s predictions and provide feedback to improve its accuracy.
  • Correct any misclassifications and retrain the classifier with additional examples if necessary.
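
Misclassifications map naturally onto new seed content: a missed match is a candidate positive example, and a false alarm is a candidate negative example. The sketch below assumes a hypothetical review list of (item name, predicted, actual) triples and simply sorts the mistakes into those two buckets for you to copy into the seed folders.

```python
from typing import Iterable, List, Tuple

Reviewed = Tuple[str, bool, bool]  # (item name, predicted match, actual match)


def seed_corrections(reviewed: Iterable[Reviewed]) -> Tuple[List[str], List[str]]:
    """Split misclassified items into new positive and new negative seed candidates."""
    new_positives: List[str] = []
    new_negatives: List[str] = []
    for name, predicted, actual in reviewed:
        if actual and not predicted:
            new_positives.append(name)  # missed match: show the classifier what it should catch
        elif predicted and not actual:
            new_negatives.append(name)  # false alarm: show it what to ignore
    return new_positives, new_negatives


if __name__ == "__main__":
    review = [
        ("contract-a.docx", True, True),
        ("memo-b.docx", True, False),
        ("contract-c.docx", False, True),
    ]
    print(seed_corrections(review))
```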

Ongoing Evaluation:

  • Continuously monitor the classifier’s performance in real-world scenarios.
  • Make adjustments and retrain as needed to maintain high accuracy and reliability.
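
A simple way to decide when "as needed" has arrived is to track precision over a rolling window of reviewed matches and flag retraining when it drops below a threshold. The window size and threshold below are illustrative defaults, not Purview settings, and because only matched items are reviewed this tracks precision, not recall.

```python
from collections import deque


class PrecisionMonitor:
    """Rolling precision over the most recent reviewed matches, with a retraining flag."""

    def __init__(self, window: int = 100, threshold: float = 0.9):
        self.recent = deque(maxlen=window)  # True = the match was confirmed correct
        self.threshold = threshold

    def record(self, confirmed: bool) -> None:
        self.recent.append(confirmed)  # oldest review drops out once the window is full

    def precision(self) -> float:
        return sum(self.recent) / len(self.recent) if self.recent else 1.0

    def needs_retraining(self) -> bool:
        # Only decide once the window is full, to avoid reacting to a handful of reviews.
        return len(self.recent) == self.recent.maxlen and self.precision() < self.threshold


if __name__ == "__main__":
    monitor = PrecisionMonitor(window=10, threshold=0.8)
    for ok in [True] * 7 + [False] * 3:
        monitor.record(ok)
    print(monitor.precision(), monitor.needs_retraining())
```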

Application of the Classifier

Integration with Policies:

Once the classifier is trained and validated, integrate it with various compliance and security policies such as:

  • Office Sensitivity Labels: Automatically apply sensitivity labels based on the classifier’s predictions.
  • Communication Compliance Policies: Flag communications that contain the content the classifier detects.
  • Retention Label Policies: Manage data retention based on content classification.

Automation and Efficiency:

  • Automate routine tasks such as data classification, labeling, and compliance checks, improving efficiency and reducing manual intervention.

Best Practices and Tips

Quality Over Quantity:

  • Focus on the quality of the seed content (clear, accurate, representative examples) rather than on quantity alone.

Regular Updates:

  • Regularly update the classifier with new examples and feedback to keep it relevant and effective as your data landscape evolves.

Collaborative Effort:

  • Engage various stakeholders, including compliance officers, data managers, and security administrators, to gather diverse and comprehensive sample sets.

Documentation and Reporting:

  • Maintain thorough documentation of the classifier’s training process, data sources, and performance metrics.
  • Generate reports to track the classifier’s impact and effectiveness over time.
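
A lightweight way to keep that documentation consistent is to append one structured record per training run to a log file. The record fields below are a hypothetical layout, chosen to cover what the article lists (training process, data sources, performance metrics); JSON Lines is used because each run stays on its own line and the file is easy to load into reporting tools later.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TrainingRecord:
    """One entry in a hypothetical classifier training log."""
    classifier: str
    trained_on: str       # ISO date of the training run, e.g. "2024-05-01"
    positive_source: str  # SharePoint folder URL for positive seeds
    negative_source: str  # SharePoint folder URL for negative seeds
    positive_count: int
    negative_count: int
    metrics: dict = field(default_factory=dict)  # e.g. accuracy/precision/recall from testing
    notes: str = ""


def to_json_line(record: TrainingRecord) -> str:
    """Serialize a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


def append_record(path: str, record: TrainingRecord) -> None:
    """Append one record to a JSON Lines log file, creating the file if needed."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_json_line(record) + "\n")
```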

Conclusion

Microsoft Purview’s trainable classifiers offer a powerful way to automate and enhance data classification and compliance efforts. By following the detailed steps outlined above, organizations can effectively create, train, and utilize classifiers to maintain data security, comply with regulatory requirements, and streamline operations.

More articles by Gregory H. Hall

社区洞察

其他会员也浏览了