Some thoughts about Data Classification and Labelling in the Cloud
Data and information classification and labelling must be important if the ISO standard has two dedicated controls for these topics. The reason? Not all data is created equal; some can freely float around and leave the organization, whereas others must be secured at all costs.
Data classification is the act of categorizing or classifying data based on its importance and sensitivity to the organization. The core question is: How much trouble will the organization have if this data is leaked, stolen, or misused? Many organizations follow a four-level classification scheme, like the following one:
Labelling is the process of assigning a classification to a specific file, email, or dataset, either manually or with automated tools. Azure Information Protection (AIP) is relatively well-known in the Office 365 world. It helps classify and label Office files, especially ones on SharePoint, and supports manual or automatic labelling.
In the cloud world, Amazon Macie is an AWS service that manages data security and privacy for data stored in Amazon S3. Server files or database data, for example, are not in scope. Besides reporting whether S3 buckets are publicly exposed and which encryption type is in use, Macie also supports one-time or periodic scans of the data in S3 buckets to help organizations understand whether they store sensitive data there, and which.
Google has a similar service. Its name is GCP DLP. The naming might remind you of classic DLP solutions such as Symantec or M365 DLP. However, GCP DLP is for identifying and de-identifying sensitive data so that it can be widely shared, e.g., by masking the critical attributes or text fragments. It is imperative to understand that this is more of an API that developers can use in their application context than a masking service a cloud platform team might centrally manage to improve an organization’s overall data protection level. The good news is that the inspection works for some widely used GCP services, including Cloud SQL, Cloud Storage, and BigQuery. Figure 1 provides an example in the GCP console.
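To make the "identify, then de-identify by masking" idea concrete, here is a minimal local sketch in Python. It does not call the GCP DLP API; the two regex detectors, the names DETECTORS, inspect, and mask, and the "#" masking character are assumptions chosen for illustration only. A real service ships far more robust detectors.

```python
import re

# Hypothetical detectors for illustration; not the detectors of any real DLP service.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CREDIT_CARD_NUMBER": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def inspect(text):
    """Identify: return a list of (info_type, matched_text) findings."""
    findings = []
    for info_type, pattern in DETECTORS.items():
        for match in pattern.finditer(text):
            findings.append((info_type, match.group()))
    return findings


def mask(text, masking_char="#"):
    """De-identify: overwrite every finding with the masking character."""
    for pattern in DETECTORS.values():
        text = pattern.sub(lambda m: masking_char * len(m.group()), text)
    return text


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com, card 4111 1111 1111 1111."
    print(inspect(sample))
    print(mask(sample))
```

The split into an inspection step and a masking step mirrors the point above: an application developer decides where in the data flow to inspect and what to mask before sharing the result.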
But what happens after the data classification and labelling? That depends on what drives the activity: security or compliance. Compliance-driven organizations focus precisely on what compliance requires and avoid investing more effort. The reasoning goes: we need to classify and label, so once everything has a label, the topic is covered for the next assessment, and we are done. If the focus is (also) on security, the work starts after the labelling: which measures are risk-adequate for each classification level?
These measures relate to tools and instructions. On the instruction level, secret data might be forbidden in public clouds, and confidential data on developer laptops or servers in the engineering zones. Labels can also influence how data loss prevention solutions decide and act. DLP tools might block, for example, all outgoing emails or file transfers with a “secret” label. So, organizations should not forget to clarify internally the impact they want the data labels to have.
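One way to pin down the impact of labels is to write the rules as a small policy table. The Python sketch below is an illustration under stated assumptions: the four level names (public, internal, confidential, secret), the location names, and the is_allowed helper are hypothetical, and the limits only encode the examples mentioned above.

```python
from enum import IntEnum


class Label(IntEnum):
    """Assumed four-level scheme, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    SECRET = 3


# Hypothetical policy: the highest label permitted at each location or channel.
MAX_LABEL = {
    "public_cloud": Label.CONFIDENTIAL,   # secret data is forbidden
    "developer_laptop": Label.INTERNAL,   # confidential data is forbidden
    "outgoing_email": Label.CONFIDENTIAL, # DLP blocks anything labelled secret
}


def is_allowed(label, location):
    """Return True if data with this label may be stored at or sent via the location."""
    limit = MAX_LABEL.get(location)
    # Locations without an explicit rule are denied by default.
    return limit is not None and label <= limit


if __name__ == "__main__":
    print(is_allowed(Label.SECRET, "outgoing_email"))
    print(is_allowed(Label.INTERNAL, "developer_laptop"))
```

Denying unknown locations by default is a design choice of this sketch; an organization might equally decide to allow and log them.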
While protecting data in the cloud is a top priority, security architects should nevertheless carefully examine whether to implement or improve the labelling of M365 Office documents—if not already in place—before tackling more specialized data types like technical files or application logs on VMs in the cloud. Office documents are probably the ones most at risk.