#1 Data Governance - Data Classification - Introduction
Data classification is a crucial component of data governance. It involves organizing and categorizing data within an organization (or enterprise) to ensure proper management and security. Several classification systems are commonly used, including priority (business impact), data sensitivity, and processing stage.
Priority classification categorizes data based on its importance to decision-making. This is usually done by dividing the data into critical, high-priority, medium-priority, and low-priority categories. Critical data is considered the most important and usually refers to information such as individual healthcare information or regulated financial information. High-priority data, such as employee information or R&D plans, is important to decision-making and would have severe consequences if leaked. Medium-priority data is important to decision-making and includes sales aggregates and strategy documents. Low-priority data is typically day-to-day operational data.
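As a minimal sketch of how these priority categories could be recorded and used, the snippet below tags a few datasets and orders them from most to least important. The dataset names and the numeric ranks are illustrative assumptions, not part of any standard.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Priority categories from the text; the numeric ranks are an assumption."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical datasets tagged with the categories described above.
datasets = {
    "daily_operations_log": Priority.LOW,
    "sales_aggregates": Priority.MEDIUM,
    "employee_records": Priority.HIGH,
    "patient_health_records": Priority.CRITICAL,
}


def by_priority(tagged):
    """Return dataset names ordered from most to least important."""
    return sorted(tagged, key=lambda name: tagged[name], reverse=True)


processing_order = by_priority(datasets)
print(processing_order)
```

Using an ordered enumeration keeps comparisons such as "high or above" explicit instead of relying on string matching.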
Data sensitivity classification categorizes data based on its privacy requirements and the level of protection needed. There is no standard scheme for this system, but common categories include classified, privileged, open or public, and specific sensitivity classifications required by regulations. Classified data is used in military, intelligence, or governmental organizations and has three sensitivity levels, from least to most sensitive: confidential, secret, and top secret. Privileged data is not public and must be protected, with legal consequences if it is not; examples include information held by an attorney or psychiatrist. Open or public data can be shared freely, such as website or social media content.
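The three levels of classified data form an ordered scale, which can be illustrated with a short sketch. The rule that a reader's clearance must be at least as high as the document's level is a simplifying assumption for illustration; real access decisions involve more than the level alone.

```python
from enum import IntEnum


class ClassifiedLevel(IntEnum):
    """Levels of classified data, ordered from least to most sensitive."""
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def can_read(clearance, document_level):
    """Assumed rule: the clearance must be at least the document's level."""
    return clearance >= document_level


print(can_read(ClassifiedLevel.SECRET, ClassifiedLevel.CONFIDENTIAL))
print(can_read(ClassifiedLevel.SECRET, ClassifiedLevel.TOP_SECRET))
```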
Finally, processing stage classification categorizes data based on how far it has moved through processing and helps define permissions and uses for the data. Raw data has been received from a data source but has yet to be analyzed or validated, and only limited access is allowed. Cleaned or parsed data has been processed and includes metadata, so more users are able to access it. Trusted or completed data is fully documented with metadata, including its other classes, and can be used by many users to generate data products.
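One way to picture how the processing stage drives permissions is a lookup from stage to the roles allowed to read the data. The role names below are hypothetical; the only point taken from the text is that access widens as data moves from raw to trusted.

```python
from enum import Enum


class Stage(Enum):
    RAW = "raw"
    CLEANED = "cleaned"
    TRUSTED = "trusted"


# Hypothetical role names; access widens as the data matures.
ALLOWED_ROLES = {
    Stage.RAW: {"data_engineer"},
    Stage.CLEANED: {"data_engineer", "data_analyst"},
    Stage.TRUSTED: {"data_engineer", "data_analyst", "business_user"},
}


def can_access(role, stage):
    """Return True if the role may read data at the given processing stage."""
    return role in ALLOWED_ROLES[stage]


print(can_access("data_analyst", Stage.RAW))
print(can_access("business_user", Stage.TRUSTED))
```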
It is recommended that all three classification systems be used to maximize documentation and governance of data. However, some organizations use only one or two of them. The classifications are also often interrelated. For example, in a heavily regulated company dealing with credit cards, credit card information may be considered critical in terms of priority, have a specific high-sensitivity classification, and need to be trusted or completed before being used for any report. On the other hand, low-priority data, such as daily work hour totals, may not be sensitive and may have less strict cleaning and trust requirements before it is used in reports.
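The two examples above can be sketched as records that carry one class from each system. The label strings and the reporting rule (critical or high-priority data must be trusted, other data only needs to be cleaned) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """One class from each of the three systems; label strings are illustrative."""
    priority: str      # "critical", "high", "medium" or "low"
    sensitivity: str   # e.g. "regulated" or "not_sensitive" (hypothetical labels)
    stage: str         # "raw", "cleaned" or "trusted"


def ready_for_reporting(c):
    """Assumed rule: critical and high-priority data must be trusted first."""
    if c.priority in {"critical", "high"}:
        return c.stage == "trusted"
    return c.stage in {"cleaned", "trusted"}


card_data_cleaned = Classification("critical", "regulated", "cleaned")
card_data_trusted = Classification("critical", "regulated", "trusted")
work_hours = Classification("low", "not_sensitive", "cleaned")

print(ready_for_reporting(card_data_cleaned))
print(ready_for_reporting(card_data_trusted))
print(ready_for_reporting(work_hours))
```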
By classifying data, organizations can also establish policies and apply masking and encryption techniques for data in transit and at rest, thus increasing the security of their data.
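As a rough illustration of a classification-driven policy, the sketch below masks a card number when its sensitivity label is "high" and leaves other values untouched. The function names, the label, and the choice to keep the last four digits visible are assumptions. It covers masking only; encryption in transit and at rest would rely on a vetted cryptography library and is not shown.

```python
def mask_card_number(card_number, visible=4):
    """Replace all but the last `visible` digits with '*', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    to_hide = max(total_digits - visible, 0)
    masked = []
    for ch in card_number:
        if ch.isdigit() and to_hide > 0:
            masked.append("*")
            to_hide -= 1
        else:
            masked.append(ch)
    return "".join(masked)


def apply_policy(value, sensitivity):
    """Hypothetical policy: mask values labelled 'high', pass others through."""
    return mask_card_number(value) if sensitivity == "high" else value


print(apply_policy("4111 1111 1111 1111", "high"))
```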
Some of the benefits of data classification are:
In practice, the Data Stewards of the corresponding department or line of business define the data classes and categories in the organization's metadata hub, so that classes exist for all three systems. The priority categories allow for the prioritization of data processing, the sensitivity categories allow for the definition of security and privacy controls, and the processing stage categories allow for monitoring the amount of governed data ready for reports and monetization.
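The monitoring mentioned above can be approximated by counting catalog entries per processing stage. The catalog below is a hypothetical stand-in for what a metadata hub might export; the entry names and the dictionary layout are assumptions.

```python
from collections import Counter

# Hypothetical export from a metadata hub: one entry per governed dataset.
catalog = [
    {"name": "card_transactions", "stage": "trusted"},
    {"name": "web_clickstream", "stage": "raw"},
    {"name": "sales_aggregates", "stage": "trusted"},
    {"name": "work_hours", "stage": "cleaned"},
]


def stage_share(entries):
    """Return the fraction of datasets at each processing stage."""
    counts = Counter(entry["stage"] for entry in entries)
    total = len(entries)
    return {stage: count / total for stage, count in counts.items()}


shares = stage_share(catalog)
print(shares)
```

Tracking the share of trusted datasets over time gives a simple indicator of how much data is ready for reports.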