Some thoughts about Data Classification and Labelling in the Cloud

Klaus Haller

Published: March 11, 2025

Data and information classification and labelling must be important if ISO/IEC 27001 dedicates two controls to these topics. The reason? Not all data is created equal; some data can freely float around and leave the organization, whereas other data must be secured at all costs.

Data classification is the act of categorizing or classifying data based on its importance and sensitivity to the organization. The core question is: How much trouble will the organization have if this data is leaked, stolen, or misused? Many organizations follow a four-level classification scheme, like the following one:

Public data is either freely available on the internet or enterprise data that can be shared openly without harm (e.g., press releases, public reports).
Internal data is meant for internal use without being highly sensitive, e.g., company policies or general training materials.
Confidential data comprises sensitive data that, if leaked, could harm operations, reputation, or sales and financials. Examples are undisclosed financial reports or customer data from a CRM.
Secret data. Exposure of such data is likely to cause severe damage. Trade secrets, unreleased products, and (essential) cryptographic materials typically fall into this category.
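
One way to make such a scheme usable in scripts and tooling is to encode it as an ordered type. The following is a minimal sketch, not a standard: the names `Classification` and `strictest` are my own, and the rule that a combined dataset inherits the most sensitive level of its parts is an assumption many organizations make, not something the scheme itself prescribes.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Four-level scheme; a higher value means more sensitive data."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    SECRET = 4


def strictest(levels):
    """Return the most sensitive level among the given ones.

    Assumption: a dataset that combines several sources inherits the
    highest classification of its parts. Raises ValueError if empty.
    """
    return max(levels)


if __name__ == "__main__":
    merged = strictest([Classification.INTERNAL, Classification.SECRET])
    print(merged.name)
```

Because the levels are ordered, comparisons such as `level >= Classification.CONFIDENTIAL` can drive later decisions about protection measures.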

Labelling is the process of assigning a classification to a specific file, email, or dataset manually or using automated tools. Azure Information Protection (AIP) is relatively well-known in the Office 365 world. It helps classify and label Office files, especially ones on SharePoint, and supports manual or automatic labelling.
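
To illustrate the difference between manual and automatic labelling, here is a small, tool-independent sketch. It is not how AIP works internally; the two patterns, the function names, and the default label "internal" are assumptions chosen purely for illustration.

```python
import re

# Hypothetical rules, ordered from most to least sensitive; the first match wins.
RULES = [
    ("secret", re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----")),
    ("confidential", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),  # email address
]


def suggest_label(text, default="internal"):
    """Automatic labelling: propose a label based on content patterns."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return default


def label_document(text, manual_label=None):
    """Manual labelling takes precedence over the automatic suggestion."""
    return manual_label if manual_label is not None else suggest_label(text)


if __name__ == "__main__":
    print(label_document("Contact jane.doe@example.com for the offer"))
```

Real products use far richer detectors than two regular expressions, but the decision structure (rules first, user override on top) is similar.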

In the AWS world, Amazon Macie is a cloud service that manages data security and privacy for data stored in Amazon S3. Server files or database data, for example, are not in scope. Besides providing information on whether S3 buckets are publicly exposed or which encryption type is in use, Macie also supports one-time or periodic scans of the data in S3 buckets to help organizations understand whether they store sensitive data there and, if so, which.
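
For readers who want to trigger such scans programmatically, the sketch below builds the request for the `create_classification_job` operation of the boto3 `macie2` client. The helper names, the default region, and the choice of a weekly schedule for the periodic case are my assumptions; check the parameter details against the current AWS documentation before relying on them.

```python
import uuid


def build_macie_job_request(account_id, buckets, name, weekly_on=None):
    """Build keyword arguments for Macie's create_classification_job call.

    weekly_on=None requests a one-time scan; a weekday such as "MONDAY"
    requests a periodic (weekly) scan.
    """
    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "name": name,
        "jobType": "ONE_TIME" if weekly_on is None else "SCHEDULED",
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(buckets)}
            ]
        },
    }
    if weekly_on is not None:
        request["scheduleFrequency"] = {"weeklySchedule": {"dayOfWeek": weekly_on}}
    return request


def start_scan(request, region="eu-central-1"):
    """Submit the job. Needs boto3, AWS credentials, and Macie enabled."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("macie2", region_name=region)
    return client.create_classification_job(**request)["jobId"]


if __name__ == "__main__":
    # The account ID and bucket name are placeholders.
    print(build_macie_job_request("111122223333", ["example-bucket"], "weekly-scan", "MONDAY"))
```

Separating the request builder from the actual call keeps the scan definition reviewable and testable without touching an AWS account.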

Google has a similar service. Its name is GCP DLP. The naming might remind you of classic DLP solutions such as Symantec or M365 DLP. However, GCP DLP is for identifying and de-identifying sensitive data so that it can be widely shared, e.g., by masking the critical attributes or text fragments. It is imperative to understand that this is more of an API that developers can use in their application context than a masking service a cloud platform team might centrally manage to improve an organization’s overall data protection level. The good news is that the inspection works for some widely used GCP services, including Cloud SQL, Cloud Storage, and BigQuery. Figure 1 provides an example in the GCP console.

Figure 1: Scanning for sensitive data in GCP - configuration and result.
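
The two steps, identifying and de-identifying, can be illustrated with a purely local sketch. It does not call the GCP API; the function names, the single email detector, and the masking character are assumptions that only mimic the inspect-then-mask idea.

```python
import re

# Hypothetical detectors; a real service ships many more than one.
INFO_TYPES = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def inspect(text):
    """Identify: return (info_type, start, end) for every finding."""
    findings = []
    for info_type, pattern in INFO_TYPES.items():
        for match in pattern.finditer(text):
            findings.append((info_type, match.start(), match.end()))
    return findings


def deidentify(text, masking_char="#"):
    """De-identify: overwrite each finding with masking characters."""
    chars = list(text)
    for _, start, end in inspect(text):
        chars[start:end] = masking_char * (end - start)
    return "".join(chars)


if __name__ == "__main__":
    print(deidentify("Mail jane@example.com now"))
```

Masking keeps the length and layout of the text intact, which is one reason masked data often remains usable for testing or analytics.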

But what happens after the data classification and labelling? This depends on what is behind the activity. Is security or compliance the driver? When compliance-driven, organizations focus precisely on what is needed for compliance and avoid investing more effort. The reasoning is: we need to classify and label, so once everything has a label, the topic is covered for the next assessment, and we are done. If the focus is (also) on security, the work starts after the labelling. Which measures are risk-adequate for which classification?

These measures relate to tools and instructions. On the instruction level, secret data might be forbidden in public clouds and confidential data on developer laptops or servers in the engineering zones. Labels can also influence how data loss prevention solutions decide and act. DLP tools might block, for example, all outgoing emails or file transfers with a “secret” label. So, organizations should not forget to clarify internally the impact they want the data labels to have.
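
Writing the intended impact of each label down as a small policy table can help with this clarification. The sketch below is one hypothetical policy: the location names, the "alert" action for confidential data, the extension of the laptop and engineering-zone ban to secret data, and the fail-closed handling of unknown labels are all my assumptions, not recommendations from a standard.

```python
# label -> (locations where such data must not be stored, DLP action on outbound transfer)
POLICY = {
    "public": (set(), "allow"),
    "internal": (set(), "allow"),
    "confidential": ({"developer_laptop", "engineering_zone"}, "alert"),
    "secret": ({"public_cloud", "developer_laptop", "engineering_zone"}, "block"),
}


def may_store(label, location):
    """Instruction level: is data with this label allowed at this location?"""
    forbidden, _ = POLICY[label]
    return location not in forbidden


def dlp_action(label):
    """Tool level: what a DLP solution does with an outgoing email or file.

    Unknown or missing labels are blocked (fail closed); this is an assumption.
    """
    return POLICY.get(label, (set(), "block"))[1]


if __name__ == "__main__":
    print(may_store("secret", "public_cloud"), dlp_action("secret"))
```

Such a table makes gaps visible: if nobody can say which action belongs to a label, the label has no effect on security yet.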

While protecting data in the cloud is a top priority, security architects should nevertheless carefully examine whether to implement or improve the labelling of M365 Office documents—if not already in place—before tackling more specialized data types like technical files or application logs on VMs in the cloud. Office documents are probably the ones most at risk.

The Swiss Cloud Sec Architect
