Data Retention
Darshika Srivastava
Associate Project Manager @ HuQuo | MBA,Amity Business School
What is Data Retention?
Data retention is the practice of storing, archiving or otherwise retaining data to support internal business processes (e.g. analytics, auditing) and/or comply with external laws/regulations.
Data retention can be understood in the context of the data management lifecycle , a model or roadmap for enterprise data utilization. This model has been described in different ways by various publications, but here’s our simplified version:
?
?
Through the practice of data retention, organizations manage the activities of data archiving and data destruction in accordance with their business needs and objectives, and regulatory requirements. This includes determining which data should be retained, where it should be stored, how long it should be kept, and when/how to delete the data after the data retention period.
?
What is a Data Retention Policy?
A data retention policy is a document that establishes requirements and guidelines within an organization for archiving, retaining, and destroying enterprise data. The policy should clarify questions such as:
?
An effective data retention policy ensures that:
?
To achieve its intended purpose, a corporate data retention policy should account for the people, processes, and technologies required to ensure that enterprise data is archived and destroyed as needed to meet the organization’s business objectives and legal obligations.
Next, we’ll outline our seven-step process for creating and implementing a data retention policy within your organization.
?
?
How to Create a Data Retention Policy
1. Identify Key Data Owners
Data is often siloed in departments, such that the sales team has ownership of sales data, the accounting department owns financial data, the HR department owns staffing data, the IT department owns log data, and so forth.
Creating a data retention policy begins with identifying key data owners within your organization, getting their buy-in, and assembling a project team that represents all data owners. Each department or data owner will be responsible for managing their data in compliance with the data retention policy, so it’s important to get everyone involved in the process and communicate the importance of data retention policy to all internal stakeholders.
?
2. Inventory and Categorize Your Data
The next step in creating your data retention policy is to comprehensively inventory your data. Make a list of all types of data generated by your organization. When your list is complete, sort the data into categories based on where the data is generated or its intended use.
Some common categories of enterprise data could include:
Each category of data you identify may be subject to different data retention laws and business requirements. To account for this, you’ll need to conduct research and implement individualized data retention policies for each type of data your organization collects.
?
3. Investigate Data Retention Laws for Each Data Type
Once you have identified and categorized your enterprise data, the next step is to determine whether your data is subject to any regulatory or legal requirements with respect to data retention and preservation.
For each category of data, you’ll need to ask:
Whether your organization is subject to data retention laws will depend on your industry, the types of data your organization collects, and the jurisdictions where your business collects data. For example:
Enterprises in the United States may also be required to preserve financial and tax records, personnel records, emails, and workplace safety data in compliance with US law.
?
This image indicates legally mandated periods for email retention in the United States. Contractors with the US DOD are subject to a 3-year email retention mandate, while the IRS requires all businesses to retain every record (including emails) related to finance and personnel for 3 years after the tax season.
Enterprises can document the results of this regulatory compliance needs assessment, along with specific data retention requirements, using a simple spreadsheet. Here’s how this might look for a data category we’re familiar with: event log data.
?
Compliance Needs Assessment Template
Data Category
Compliance Requirements
Affected Data Types
Storage Location
Data Retention Period
Data Disposal Policy
Event Log Data
Application, security, and user logs from systems containing ePHI.
Amazon S3 Cloud Storage
6 years
Delete
Event Log Data
Application, security, and user logs from systems containing ePHI.
Amazon S3 Cloud Storage
10 years
Delete
Event Log Data
Security and access logs from systems containing PII.
Amazon S3 Cloud Storage
1 year
领英推荐
Delete
?
"Due to the nature of the data that we work with, student educational records, we need to retain all data and log files for 10 years after their creation to comply with our most stringent client retention policies"
?
4. Conduct a Business Needs Assessment for Each Data Type
Once you have analyzed and documented regulatory requirements for data retention within your enterprise, you can move on to conducting a business needs assessment for each category/type of data you collect.
Here, the goal is to identify which data should be stored or archived because it supports a business use case.
For each data category, you’ll need to ask:
Enterprises can and should retain event log data from applications and cloud services to support long-term log analytics use cases , such as:
As you complete this assessment, you will develop an understanding of how your organization is utilizing its data. You may even uncover some new and valuable use cases for the data you’re already collecting. The results of your business needs assessment, including specific data retention requirements for each data category/type, can be documented in a spreadsheet like the one below.
?
Business Needs Assessment Template
Data Category
Business Use Cases
Affected Data Types
Storage Location
Data Retention Period
Data Disposal Policy
Event Log Data
Security Operations and Threat Hunting
Security Event Logs
Amazon S3 Cloud Storage
1 year
Delete
Event Log Data
Application Performance Monitoring
Application Event Logs
Amazon S3 Cloud Storage
90 days
Delete
Event Log Data
Cloud Service Monitoring
Cloud Service Logs
Amazon S3 Cloud Storage
90 days
Delete
?
?
5. Write a Data Retention Policy for Each Data Type
At this point, you’ve categorized all of your enterprise data and investigated business use cases and regulatory compliance requirements for each data type. Based on the information you found, you’ve determined where the data should be stored, how long it should be retained, and if/when it should be destroyed.
Now you can start finalizing data retention policies for each type of data you collect.
Your data retention policy for each data type should include the following:
?
This data retention policy example/template indicates the data owner, data category and record type, storage location, data retention period, and whether to archive or delete the data after the retention period expires.
?
Your data retention policy should also include general guidelines for things like revision histories and policy exemptions, as well as a communication plan for data retention issues. You may also want to document a plan for enforcing compliance with your data retention policy.
?
6. Establish SOPs for Retaining, Archiving, and Destroying Data
Standard operating procedures (SOPs) describe the processes and technologies that your organization will use to store, retain, archive, and destroy data in compliance with your documented data retention policy. Data retention policies may be executed by human operators or automated using software technology and services.
Public cloud vendors like AWS offer services that help organizations automate their data retention policies in the cloud. Two examples are Amazon S3 Intelligent Tiering , which automatically transfers data between cost-optimized storage tiers based on user access patterns, and Amazon S3 Object Expiration , a feature that makes it easy for data owners to schedule the deletion of data objects in S3 buckets.
New technologies like ChaosSearch can also be used to support a cloud data retention policy . ChaosSearch is a cloud data platform that uses a proprietary indexing technology to greatly reduce the amount of storage required in S3 for a full, searchable representation of data, eliminating the need for additional data movement . ChaosSearch also has its own data retention features that can be used to augment your existing S3 retention policies and automation.
The ability to index, search, and analyze log data at scale with ChaosSearch means that organizations can retain their log data for longer periods while fully realizing its value through applications like security monitoring and cloud log analysis .
?
7. Implement Your Data Retention Policy
At this point, you should have everything in place to successfully implement your data retention policy. As the final step, you’ll need to implement your data retention policy and work to ensure compliance throughout your organization.
You’ll need to communicate the new policies and expectations to department leaders and their teams, ensure that data owners understand any new responsibilities, explain the importance of compliance, implement any new processes or technologies required to support your data retention strategy, and provide additional training as needed.
Depending on the size of your organization, implementing your data retention policy can take years. We recommend focusing on your biggest compliance priorities first and finding quick wins that can help energize stakeholders and sustain momentum as you work through the implementation process.
READ: Why Log Data Retention Windows Fail to discover the importance of adopting data storage and analytics solutions that can fully support your data retention needs.
?
A Final Word on Enterprise Data Retention
Data retention is a growing challenge for organizations facing competitive pressures to maximize the value of their data and regulatory pressures to comply with a growing number of laws surrounding data retention, privacy, and security.
By implementing a comprehensive data retention policy, backed by cost-efficient data storage and analytics solutions, organizations can achieve compliance with local and international data retention laws and regulations, maximize the availability of data to support internal business processes, and reduce data storage/analytics costs.