This newsletter is read by more than 165,000 followers! Thanks a lot for all your support. The idea for a newsletter started two months ago. So far, the support and response have been great.
As the name says, the idea is to focus on the five topics that make up the acronym. Please reach out with your recommendations for the inclusion of suitable posts in this newsletter.
This week, we continue from what we covered in TIDES-006 (Application Rationalization and Modernization) and look at an important aspect that supports it: data archival and retrieval.
I hope you enjoy the article. Stay updated!
"We're entering a new world in which data may be more important than software." - Tim O'Reilly
Why are "Data Archival and Retention Strategies" Important?
Think about these statistics for a minute: some of the key figures on what happens in an Internet minute (from Visualcapitalist.com, domo.com and the following article):
- Amazon customers spend $283,000
- 12 million people send an Apple iMessage
- 6 million people shop online
- Microsoft Teams connects 100,000 users
- YouTube users stream 694,000 videos
- Facebook Live receives 44 million views
- Instagram users share 65,000 photos
- TikTok users watch 167 million videos
- Instacart users spend $67,000
- Slack users send 148,000 messages
The following is the revenue per minute for some of the top technology-oriented firms:
- Amazon ==> $955,517
- Apple ==> $848,090
- Alphabet (Google) ==> $433,014
- Microsoft ==> $327,823
- Facebook ==> $213,628
- Tesla ==> $81,766
- Netflix ==> $50,566
Additionally, the amount of data and information in the digital universe effectively doubles every two years.
- In 2016, Snapchat users shared 527k photos per minute, compared to 2 million in 2021
- In 2017, Twitter saw 452k Tweets per minute, compared to 575k in 2021
- In 2018, $862,823 was spent on online shopping every minute, while 2 million people were shopping per minute in 2021
- In 2019, 4.5 million videos on YouTube were being viewed every minute, while in 2021 users were streaming 694k hours
- In 2020, Netflix users streamed 404k hours per minute, growing to 452k hours in 2021
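To make these growth figures concrete, here is a small Python sketch that turns a few of the before/after pairs into a compound annual growth rate (CAGR) and shows what "doubles every two years" implies per year. The helper `cagr` and the `FIGURES` table are illustrative names, not from any source; the YouTube and online-shopping pairs are left out because their two figures use different units.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values measured `years` apart."""
    return (end / start) ** (1 / years) - 1

# (label, earlier value, later value, earlier year, later year), from the list above
FIGURES = [
    ("Snapchat photos per minute", 527_000, 2_000_000, 2016, 2021),
    ("Twitter tweets per minute", 452_000, 575_000, 2017, 2021),
    ("Netflix hours streamed per minute", 404_000, 452_000, 2020, 2021),
]

# "Doubles every two years" means a growth factor of 2 ** (1/2) per year.
DOUBLING_EVERY_TWO_YEARS = 2 ** (1 / 2) - 1

if __name__ == "__main__":
    print(f"Doubling every two years: {DOUBLING_EVERY_TWO_YEARS:.1%} per year")
    for label, start, end, y0, y1 in FIGURES:
        print(f"{label}: {cagr(start, end, y1 - y0):.1%} per year")
```

Doubling every two years works out to roughly 41% growth per year, which is the kind of curve an archival strategy has to plan for.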
This is true not only for technology-oriented firms but also for large legacy firms sitting on a goldmine of data collected over decades and even centuries. It is important to use, manage and monetize this goldmine of data. And the goldmine is only going to expand: many countries have not yet reached stable, mass-scale data usage across the majority of their population. Among the ten most populous countries in the world, excluding the US and Japan, most have only 40-50% of their population with regular access to the Internet and data. Think about the exponential growth!
Firstly, let us understand the difference between Data Backup and Data Archive.
- Data Archiving, often referred to as Data Tiering, protects older data that is not needed for the everyday operations of an organization
- A data archiving strategy reduces primary storage and allows an organization to maintain data that may be required for regulatory or other needs.
- It is intended to protect older information that is not needed for everyday operations but may have to be accessed occasionally, while still allowing users to retrieve it quickly and easily.
- Backups - The original data remains in place, while a backup copy is stored in another location
- Archival - Archived data is moved from its original location to an archive storage location
- Backups - Backed up data is constantly changing
- Archival - Once you create an archive, you do not modify it
- Backups - You periodically delete or overwrite data backups that are too old to be useful
- Archival - Data archives are designed for long-term storage
- Backups - Hot cloud storage or easily accessible local storage locations
- Archival - Cold cloud storage or tape archives
- Backups - All data, except for unimportant information like temporary files
- Archival - Specific data that needs retention for compliance purposes
- Backups - With a backup, speed is an important attribute. Backups are often made regularly to keep them current; it is essential that they can be completed quickly.
- Archival - Speed matters for an archive as well, but retrieval is driven far more by business logic than it is for a backup
- Backups - A backup is neither designed nor equipped for the sophisticated search functionality that an archive should have
- Archival - Searchability is one of the major KPIs for an archive, putting business and audit requirements at the centre of its data retrieval capability
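As a minimal sketch of the backup/archive distinction above, assuming a toy in-memory store rather than any real product: `backup` copies everything and leaves the originals in place, while `archive` moves old records out of primary storage into a read-only view that can be searched by business fields. The record layout (`year`, `customer`, `amount`) and all function names are hypothetical.

```python
import copy
from types import MappingProxyType

def backup(store: dict) -> dict:
    """Backup: copy every record; the originals stay in primary storage."""
    return copy.deepcopy(store)

def archive(store: dict, before_year: int) -> MappingProxyType:
    """Archive: MOVE records older than `before_year` out of `store`.

    The result is a read-only mapping of read-only records, mirroring the
    "once you create an archive, you do not modify it" rule.
    """
    old_ids = [rid for rid, rec in store.items() if rec["year"] < before_year]
    moved = {rid: store.pop(rid) for rid in old_ids}  # removed from primary storage
    return MappingProxyType({rid: MappingProxyType(rec) for rid, rec in moved.items()})

def search(archived: MappingProxyType, **criteria) -> list:
    """Business-oriented lookup: ids of archived records matching every criterion."""
    return [rid for rid, rec in archived.items()
            if all(rec.get(k) == v for k, v in criteria.items())]

if __name__ == "__main__":
    primary = {  # hypothetical primary storage: record id -> record
        "INV-001": {"year": 2015, "customer": "Acme", "amount": 1200},
        "INV-002": {"year": 2023, "customer": "Globex", "amount": 560},
    }
    snapshot = backup(primary)                 # primary still holds both records
    cold = archive(primary, before_year=2020)  # INV-001 leaves primary storage
    print(sorted(primary), sorted(snapshot), search(cold, customer="Acme"))
```

Any attempt to assign into the archive view (for example `cold["X"] = {}`) raises `TypeError`, which is the in-code equivalent of an immutable archive.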
Now, let us see some of the benefits of data archival.
Data Archival Benefits
- Increased capacity: Archiving moves inactive data off primary storage, freeing up capacity and helping backup and recovery run faster.
- Easier backup: Data archiving techniques can also ensure simpler backup processes because you don’t waste time backing up inactive data.
- Improved ability to meet compliance requirements: Regardless of your industry or vertical, data archiving requirements and best practices can ensure your organization stays in compliance with applicable regulations and the law.
- Enhanced productivity: Spend less time maintaining and managing software and infrastructure for on-site backup storage.
- Higher growth: A scalable, cost-effective cloud data archiving solution allows for a pay-as-you-go growth model without as much waste, even in industries that generate high amounts of data.
- More refined management of locations: Using a virtual data archiving system allows for savings on investments into office intranets and other costly infrastructure.
Let us look at the types of data archiving. Basic data archiving techniques can be categorized into two types:
Structured Data Archiving
- Structured data archiving is moving data from custom-provided or commercially provided applications to an alternate file system or database management system (DBMS) while maintaining data access and referential integrity. Reducing the volume of data in production instances can improve performance and shrink batch windows. It can also reduce storage acquisition costs, facility requirements, environmental footprints and the cost of preserving data for compliance when retiring applications.
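Here is a minimal sketch of structured data archiving using Python's built-in `sqlite3` module. It assumes a hypothetical `orders`/`order_lines` schema, with an attached `archive` database standing in for the alternate DBMS. Parent rows are copied before child rows and children are deleted before parents, all in one transaction, so referential integrity holds in both databases. A real archive would typically live in a separate file or system; in-memory databases are used here only to keep the example self-contained.

```python
import sqlite3

# Hypothetical schema, created identically in the production ("main") and
# archive databases. In SQLite a foreign key's parent table lives in the same
# database as the child, so each database enforces its own integrity.
SCHEMA = """
CREATE TABLE IF NOT EXISTS {db}.orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    order_date TEXT NOT NULL            -- ISO-8601 date, e.g. '2015-03-01'
);
CREATE TABLE IF NOT EXISTS {db}.order_lines (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    sku TEXT NOT NULL
);
"""

def open_db(archive_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")        # stands in for production
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("ATTACH DATABASE ? AS archive", (archive_path,))
    for db in ("main", "archive"):
        conn.executescript(SCHEMA.format(db=db))
    return conn

def archive_orders(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move orders dated before `cutoff`, with their lines, from main to archive.

    Copy parents before children, delete children before parents, and do it all
    in one transaction so a failure leaves both databases unchanged.
    """
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO archive.orders "
                     "SELECT * FROM main.orders WHERE order_date < ?", (cutoff,))
        conn.execute("INSERT INTO archive.order_lines "
                     "SELECT l.* FROM main.order_lines AS l "
                     "JOIN main.orders AS o ON o.id = l.order_id "
                     "WHERE o.order_date < ?", (cutoff,))
        conn.execute("DELETE FROM main.order_lines WHERE order_id IN "
                     "(SELECT id FROM main.orders WHERE order_date < ?)", (cutoff,))
        cur = conn.execute("DELETE FROM main.orders WHERE order_date < ?", (cutoff,))
    return cur.rowcount  # number of orders moved
```

Because the whole move is one transaction, production never shows an order without its lines, and the archive never holds lines without their order.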
Unstructured Data Archiving
- Unstructured data is information, in many different forms, that doesn't follow a conventional data model or schema. It can be textual or non-textual, human-generated or machine-generated. Word documents, emails, messages, PowerPoint presentations, survey responses, transcripts of call centre interactions, posts from blogs and social media sites, images, audio and video files are some of the examples of unstructured data.
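Unstructured data has no schema to query, so archiving usually works at the file level. A minimal sketch, assuming last-modified age is the archiving criterion: move files older than a threshold into an archive directory, keep their relative paths, and record a SHA-256 checksum for each in a JSON-lines manifest so integrity can be verified later. `archive_files`, `sha256_of` and the `manifest.jsonl` name are hypothetical, and the archive directory is assumed to sit outside the source tree.

```python
import hashlib
import json
import shutil
import time
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large media files do not fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def archive_files(source: Path, archive: Path, max_age_days: float) -> list:
    """Move files not modified for `max_age_days` from `source` into `archive`,
    preserving relative paths, and append one manifest entry per moved file."""
    cutoff = time.time() - max_age_days * 86400
    archive.mkdir(parents=True, exist_ok=True)
    entries = []
    for path in sorted(p for p in source.rglob("*") if p.is_file()):
        stat = path.stat()
        if stat.st_mtime >= cutoff:
            continue  # still active: leave it in primary storage
        rel = path.relative_to(source)
        target = archive / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        entries.append({"path": str(rel), "size": stat.st_size,
                        "mtime": stat.st_mtime, "sha256": sha256_of(path)})
        shutil.move(str(path), str(target))
    with (archive / "manifest.jsonl").open("a", encoding="utf-8") as manifest:
        for entry in entries:
            manifest.write(json.dumps(entry) + "\n")
    return entries
```

The manifest doubles as a simple search index (path, size, date) and lets a later audit re-hash archived files to confirm nothing changed.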
Now, how can we form an archival strategy?
Archival Strategy Steps
- Inventorying and determining which data must be archived: Take an inventory and decide which data gets archived, considering which types of data must remain stored in a searchable archive database.
- Developing an all-inclusive archive policy: The policy is a formalized set of procedures and rules that covers 1) the duration of data storage, 2) the benchmark for archiving data, 3) the variety of media used to store data, 4) mechanisms that make the data filing process easier, 5) rules on who should have access to the archived files, and 6) the circumstances under which they are allowed access.
- Proactively protecting the data archive's integrity: Strong security controls for the archived data must remain in place to handle sensitive information.
- Assigning a retention schedule and a Legal Hold flag based on compliance regulations: Industries such as Healthcare, Legal, or Government have strict regulations about the proper storage of specific data. These mandates require organizations to store sensitive data for a duration set by government guidelines, and a legal hold flag keeps data that is subject to litigation or investigation from being deleted even after its retention period ends.
- Finally, choosing a data archive product/software (checklist): Key features to look for include 1) Search and Discovery: an orderly and flexible search engine to recall archived data, 2) Multi-Platform Support: works seamlessly with popular platforms and applications, 3) Data Deduplication Engine: tracks duplicated data and replaces it with a reference to the original information, 4) Automated Backup: automation that guarantees data is archived according to policy and no data is left behind.
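The retention schedule, legal hold and access rules from the steps above can be expressed as data plus a few checks. The sketch below is hypothetical: `RetentionRule`, `ArchivedRecord`, `can_purge` and the field names are made up for illustration, actual retention periods come from the regulations that apply to your industry, and the sketch assumes retention is counted from a per-record `retention_start` date.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class RetentionRule:
    """One line of a hypothetical retention schedule and access policy."""
    record_class: str         # e.g. "invoice", "patient_record"
    retention_days: int       # how long records of this class must be kept
    allowed_roles: frozenset  # who may read archived records of this class

@dataclass
class ArchivedRecord:
    record_id: str
    record_class: str
    retention_start: date     # date the retention clock starts (assumption)
    legal_hold: bool = False  # set while litigation/investigation requires preservation

def can_purge(rec: ArchivedRecord, rule: RetentionRule, today: date) -> bool:
    """Deletable only once the retention period has elapsed AND no legal hold applies."""
    expired = today >= rec.retention_start + timedelta(days=rule.retention_days)
    return expired and not rec.legal_hold

def can_access(role: str, rule: RetentionRule) -> bool:
    """Access check driven by the archive policy's role list."""
    return role in rule.allowed_roles

def purge_candidates(records, rules, today: date) -> list:
    """Ids of records that the schedule allows to be deleted as of `today`."""
    by_class = {rule.record_class: rule for rule in rules}
    return [r.record_id for r in records
            if can_purge(r, by_class[r.record_class], today)]
```

Keeping the legal hold as a separate flag, rather than extending the retention period, means a hold can be placed and lifted without touching the schedule itself.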
Once you have decided on these, run data archival as an important program to reap the benefits. Business functionality, domain and regulatory aspects need to be taken care of as well.
"Specifically, in the software industry, progress is highly sequential: progress is typically made through a large number of small steps, each building on the previous ones"
Eric Maskin - American economist and 2007 Nobel laureate
I hope you enjoyed reading this as much as I enjoyed writing it. I am all ears to hear from you. Caring is sharing. Feel free to like, share or comment on what you think! Please tag me if you forward this, for relevance.
Credits: The header and most of the images are designed using Canva. All other linked quotes and images are available freely on the Public Internet. Respective trademarks owned by corresponding firms. Quotes are freely available on the Internet. Opinions highlighted are from a personal experience standpoint and in no way reflect the views of my current or past employers or clients.
#WhatInspiresMe #TIDES #KRPoints #inspiration #motivation #ApplicationArchival #DataArchival #ApplicationModernization #data #statistics #bigdata #programming #learning