Some thoughts about Data Classification and Labelling in the Cloud

Data and information classification and labelling must be important if the ISO 27001 standard dedicates two controls to these topics. The reason? Not all data is created equal: some can float around freely and leave the organization, whereas other data must be secured at all costs.

Data classification is the act of categorizing or classifying data based on its importance and sensitivity to the organization. The core question is: How much trouble will the organization have if this data is leaked, stolen, or misused? Many organizations follow a four-level classification scheme, like the following one:

  • Public data is either freely available on the internet or enterprise data that can be shared openly without harm (e.g., press releases, public reports).
  • Internal data is meant for internal use without being highly sensitive, e.g., company policies or general training materials.
  • Confidential data is sensitive data that, if leaked, could harm operations, reputation, sales, or financials. Examples are undisclosed financial reports or customer data from a CRM.
  • Secret data is data whose exposure is likely to cause severe damage. Trade secrets, unreleased products, and (essential) cryptographic materials typically fall into this category.
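The four levels above form an ordered scale, which can be sketched in a few lines of Python. This is only an illustration: the names `Classification` and `effective_classification` are hypothetical, and the rule that a combined dataset inherits the highest level of its parts is an assumption, not something the scheme itself prescribes.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Four-level scheme from the list above, ordered by sensitivity."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    SECRET = 3


def effective_classification(*levels: Classification) -> Classification:
    """Return the highest level among the given ones.

    Assumption: a combined dataset is as sensitive as its most
    sensitive part ("high-water mark" rule).
    """
    if not levels:
        raise ValueError("at least one classification is required")
    return max(levels)


# Hypothetical example: a report mixing a press release with a CRM export.
combined = effective_classification(Classification.PUBLIC, Classification.CONFIDENTIAL)
print(combined.name)
```

Using an ordered type makes questions such as "is this at least confidential?" a simple comparison.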

Labelling is the process of assigning a classification to a specific file, email, or dataset, either manually or using automated tools. Azure Information Protection (AIP) is relatively well-known in the Office 365 world. It helps classify and label Office files, especially ones on SharePoint, and supports manual or automatic labelling.

In the cloud world, AWS Macie is a service that manages data security and privacy for data stored in Amazon S3. Server files or database data, for example, are not in scope. Besides reporting whether S3 buckets are publicly exposed and which encryption type is in use, AWS Macie also supports one-time or periodic scans of the data in S3 buckets to help organizations understand whether they store sensitive data there and, if so, which.
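The difference between a one-time and a periodic scan can be sketched as the request for a Macie classification job. The helper `build_macie_job_request`, the account ID, and the bucket name are hypothetical; the field names follow my understanding of the `CreateClassificationJob` request and should be checked against the current API reference before use.

```python
import uuid


def build_macie_job_request(account_id, buckets, name, periodic=False):
    """Build the parameters for a Macie classification job (sketch only)."""
    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "name": name,
        "jobType": "SCHEDULED" if periodic else "ONE_TIME",
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(buckets)}
            ]
        },
    }
    if periodic:
        # Assumption for this sketch: a daily schedule.
        request["scheduleFrequency"] = {"dailySchedule": {}}
    return request


# Hypothetical account ID and bucket name.
request = build_macie_job_request(
    "123456789012", ["example-bucket"], "daily-sensitive-data-scan", periodic=True
)
print(request["jobType"])

# With boto3 installed and credentials configured, the request could be
# submitted like this (not executed here):
#   import boto3
#   boto3.client("macie2").create_classification_job(**request)
```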

Google has a similar service called GCP DLP. The name might remind you of classic DLP solutions such as Symantec or M365 DLP. However, GCP DLP is for identifying and de-identifying sensitive data so that it can be widely shared, e.g., by masking the critical attributes or text fragments. It is important to understand that this is more of an API that developers can use in their application context than a masking service a cloud platform team might centrally manage to improve an organization’s overall data protection level. The good news is that the inspection works for some widely used GCP services, including Cloud SQL, Cloud Storage, and BigQuery. Figure 1 provides an example in the GCP console.

Figure 1: Scanning for sensitive data in GCP - configuration and result.
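To make the identify-then-mask idea concrete, here is a minimal, self-contained Python sketch. It does not call the GCP DLP API; the two regular-expression detectors and the function names `inspect` and `mask` are hypothetical stand-ins, far simpler than what a real service provides.

```python
import re

# Hypothetical detectors; a real service ships many more robust ones.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CARD_NUMBER": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def inspect(text):
    """Identify: return (info_type, matched_text) pairs found in the text."""
    findings = []
    for info_type, pattern in DETECTORS.items():
        findings.extend((info_type, m.group()) for m in pattern.finditer(text))
    return findings


def mask(text, masking_char="#"):
    """De-identify: replace every finding with masking characters of equal length."""
    for pattern in DETECTORS.values():
        text = pattern.sub(lambda m: masking_char * len(m.group()), text)
    return text


sample = "Contact jane.doe@example.com, card 4111 1111 1111 1111."
print(inspect(sample))
print(mask(sample))
```

The masked text keeps its length and structure, so it can be shared more widely while the critical fragments are hidden.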

But what happens after the data classification and labelling? This depends on what is behind the activity. Is security or compliance the driver? When compliance is the driver, organizations focus precisely on what is needed for compliance and avoid investing more effort. The reasoning goes: we need to classify and label, so once everything has a label, the topic is covered for the next assessment, and we are done. If the focus is (also) on security, the work starts after the labelling. Which measures are risk-adequate for each classification level?

These measures relate to tools and instructions. On the instruction level, secret data might be forbidden in public clouds, and confidential data might be forbidden on developer laptops or servers in the engineering zones. Labels can also influence how data loss prevention solutions decide and act. DLP tools might, for example, block all outgoing emails or file transfers with a “secret” label. So, organizations should not forget to clarify internally the impact they want the data labels to have.
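How a label might drive such a decision can be sketched as a small lookup table. The destination names, the table contents, and the function `decide` are hypothetical and only mirror the examples in the paragraph above; blocking unknown labels is an added assumption (fail closed), and real DLP products have their own policy models.

```python
# Destinations that are forbidden per label (illustrative values only).
BLOCKED_DESTINATIONS = {
    "public": set(),
    "internal": set(),
    "confidential": {"developer_laptop", "engineering_zone_server"},
    "secret": {"outgoing_email", "external_file_transfer", "public_cloud"},
}


def decide(label, destination):
    """Return "block" or "allow" for moving labelled data to a destination."""
    blocked = BLOCKED_DESTINATIONS.get(label.lower())
    if blocked is None:
        # Assumption: unlabelled or unknown labels are blocked (fail closed).
        return "block"
    return "block" if destination in blocked else "allow"


print(decide("secret", "outgoing_email"))
print(decide("internal", "outgoing_email"))
```

Writing the table down, in whatever form, forces the internal clarification of what each label is supposed to trigger.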

While protecting data in the cloud is a top priority, security architects should nevertheless carefully examine whether to implement or improve the labelling of M365 Office documents (if not already in place) before tackling more specialized data types like technical files or application logs on VMs in the cloud. Office documents are probably the ones most at risk.
