Dealing with Legacy when moving to the cloud
Karl Davies-Barrett
Technology Thought Leader | Keynote speaker | Driving Innovation | Hybrid + Edge AI +5G
The cloud promises so many benefits for business and IT that it's hard to resist its allure. It is increasingly true what people say: "It's now a question of when, not why, to move to the cloud." There is also no doubt the market has matured, with frameworks from all the big cloud vendors readily available on how to build a cloud migration strategy, ready your business for the cloud, and operate and maintain to best practices. However, one area rarely tackled is how to move to the cloud in tangible steps when it comes to dealing with "legacy", and an area where this becomes especially interesting is legacy data and backups. Of course, any move to the cloud for a production system will include new cloud backups via cloud storage going forward, but what about the backups you have accumulated on-prem for those systems, or worse yet, those that belong to systems no longer in use but must be kept for compliance reasons?
What is Legacy:
Without dwelling on this, and keeping our eye on the prize of what to move and how, let's first define legacy. Legacy data is enterprise-essential information stored in an old or obsolete format or computer system, and therefore difficult to access or process. In business terms, legacy can come about through company acquisitions or a change of vendor. In technology terms, it could be because a vendor no longer supports a particular file format or hardware has gone out of its support lifetime. Whichever way you look at it, a strategy for handling it is not only important but necessary. New startups have little to no legacy, big established companies may have a ton, and others have either suffered from it or, hopefully, avoided it through a proactive approach to managing it. This article is not about that proactive approach (although it could certainly be applied); it's about how you can handle the problem here and now, with what you have, while being excited about a move to the cloud.
Legacy Backups:
Last week I was approached several times by a partner for guidance on how to move their customers to the cloud, specifically with respect to the large amounts of backup data those customers had accumulated. With customer requirements and assets a bit vague, we set out to provide some options and practical approaches for dealing with the situation. The backups could range from quick, short-lived incrementals to full long-retention copies that needed to be kept for years. So the issue is clear: how to treat the backups based on several trade-offs, such as retention period vs. recovery time, cost to maintain vs. risk of loss, and searchability vs. security.
The need for Data Governance:
There's no avoiding it: any migration of data will involve a governance exercise. How do you know what to move, where to move it and how, if you don't know the value of the data, the risks and costs of storing or losing it, how quickly it is needed and how long you need to keep it? So the questions that flowed back to the partner were the usual suspects (captured in the catalog record sketched after this list):
· What is the retention period?
· What is the acceptable recovery time?
· What format/media is the backup stored in/on?
· What s/w and/or h/w is required to read and write the backup, and how long will it remain supported?
· Is the data encrypted and should it remain that way?
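To make those answers actionable, it helps to record them against every backup set in one place. Below is a minimal sketch of such a catalog record in Python; the BackupRecord class and its field names are illustrative assumptions, not part of any particular backup product.

from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class BackupRecord:
    """Illustrative catalog entry capturing the governance answers per backup set."""
    backup_id: str
    created: date
    retention_days: int                       # how long must it be kept?
    recovery_time_hours: float                # acceptable time to restore
    media: str                                # e.g. "LTO-6 tape", "disk image", "VHD"
    backup_software: str                      # s/w needed to read/write it
    software_supported_until: Optional[date]  # vendor support end date, if known
    encrypted: bool                           # is it encrypted, and must it stay that way?
    compliance_only: bool                     # kept purely for compliance/archival?

# Example entry for a single tape-based backup set
example = BackupRecord(
    backup_id="TAPE-004711",
    created=date(2016, 3, 1),
    retention_days=7 * 365,
    recovery_time_hours=48,
    media="LTO-6 tape",
    backup_software="LegacyBackup 9.x",
    software_supported_until=date(2024, 12, 31),
    encrypted=True,
    compliance_only=True,
)

Even a spreadsheet with these columns is usually enough to drive the decisions in the next section.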
The devil is always in the details:
The previous questions focus on the overall nature of the backup and can often be answered easily from your backup catalog, but the following questions dig deeper into what you understand about each backup's purpose and content. Depending on the answers, the backup migration can be a simple retain (do nothing), a re-host (lift-and-shift), a re-platform (lift and reshape) or even a retire (drop); a rough decision sketch follows the list below. But even more questions need to be answered to choose the destiny of your legacy backups:
· Are we looking to maintain metadata like retention period and not reset it?
· Do you need to maintain ACLs for granular user-access security?
· How much of that data is redundant?
· Should it be searchable?
· Why is it being stored – for backup/DR or for compliance/archival purposes?
· How often am I likely to retrieve it and under what conditions?
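Taken together, the answers to both sets of questions drive the disposition of each backup set. Here is a rough decision sketch in Python that maps them onto the retain / re-host / re-platform / retire options above; the thresholds mirror the guidance in the next section and are assumptions to adapt, not hard rules.

from datetime import date

def disposition(created: date, retention_days: int, recovery_time_hours: float,
                compliance_only: bool, today: date) -> str:
    """Illustrative mapping from the governance answers to a migration decision."""
    days_remaining = retention_days - (today - created).days
    if days_remaining <= 0:
        return "retire"        # already past retention: drop it (subject to compliance sign-off)
    if retention_days < 90:
        return "retain"        # short-lived: leave it on-prem and let it expire naturally
    if compliance_only:
        return "re-host"       # kept only for compliance: move the image as-is to cloud storage
    if recovery_time_hours <= 48:
        return "re-platform"   # operationally needed: full-service migration, metadata preserved
    return "re-host"

# A compliance-only backup still inside its 10-year retention window:
print(disposition(date(2020, 6, 1), 10 * 365, 48, True, today=date(2025, 1, 1)))  # -> "re-host"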
Some solutions and guidance:
1. Dump what you don’t need
Anything with a short retention period, typically less than 90 days, may not be worth moving to the cloud at all, so one approach is to simply retain it on-prem and let it die naturally. The cost of preparing and moving the data, and the time taken, may not give an acceptable return for, say, short-term compliance data or log files. Analyze whether you have redundant data that you can drop or retire in the process. Not only will you save on storage and transfer costs, but it will make finding your data easier and retrieval much faster.
2. Don’t become a hostage
Any backup with a recovery time of 0-48 hours or a retention time of 91 days to 2 years may warrant a full-service implementation: moving it to the cloud in a format that maintains full metadata, remains searchable and preserves retention times, while eliminating the risk of being held hostage to legacy s/w and h/w. Such a re-platform will typically involve:
· Ingesting – reading backup media headers to build a catalog of which data resides in which backups
· Reporting – identifying redundant data and culling it
· Indexing – making your backups searchable
· Classifying – determining what your long-term retention (LTR) policy looks like and what data can be left behind, e.g.:
o a unique copy of everything
o everything from the last x years
o email and files for particular users
· Migrating – moving data off tape and into the cloud
· Managing – ensuring there is a system in place to search the data, verify its integrity and retrieve it with the correct access privileges.
This approach will typically require a third-party vendor that can read the backup headers, stand up the restored backup files, and ingest and catalog them, ideally while eliminating redundant data and preserving retention periods. Many of the bigger cloud-enabled backup vendors support this, and it may be the quickest and safest way to go, especially if you plan to use them for your future cloud systems' backups anyway; having a single pane of glass across all your backup data will lower operational costs as well as other hidden costs.
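To make the steps concrete, here is a minimal pipeline sketch in Python. The functions are hypothetical stand-ins for whatever your backup vendor or tooling actually provides, not a real API, and indexing and ongoing management are left out for brevity.

from typing import Dict, List

def read_tape_headers(tapes: List[str]) -> List[Dict]:
    """Ingesting: read media headers to build a catalog of which files live in which backups."""
    return [{"tape": t, "files": [f"{t}/file{i}.dat" for i in range(3)]} for t in tapes]

def deduplicate(catalog: List[Dict]) -> List[Dict]:
    """Reporting: identify redundant copies, keeping a single unique instance of each file."""
    seen, unique = set(), []
    for entry in catalog:
        keep = [f for f in entry["files"] if f.split("/")[-1] not in seen]
        seen.update(f.split("/")[-1] for f in keep)
        unique.append({**entry, "files": keep})
    return unique

def classify(catalog: List[Dict]) -> List[Dict]:
    """Classifying: apply the LTR policy (here a placeholder that keeps all unique data)."""
    return catalog

def migrate(catalog: List[Dict]) -> None:
    """Migrating: move the selected data off tape and into cloud storage (stubbed as a print)."""
    for entry in catalog:
        print(f"uploading {len(entry['files'])} unique files from {entry['tape']}")

migrate(classify(deduplicate(read_tape_headers(["TAPE-0001", "TAPE-0002"]))))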
3. The ad-hoc/hybrid approach
Some backups may not warrant a full service because their retention time is long term and they have little real value except for compliance. Longer restoration times may also be acceptable for data that is not operationally critical, so an ad-hoc restore could be considered. This usually takes the form of keeping the data catalogued somehow so that it can be found and retrieved, without it being readily available or retrievable in an automated fashion. One approach is to move your legacy backup images from tape to the cloud and store them in blob storage, reducing tape storage risk and handling costs. The advantage is that it's relatively quick, easy and cheap without the lengthy conversion step, but it will still require you to maintain and mount the legacy s/w and h/w when you need to restore.
Of course, you could also stand up your backups and copy them as VHDs into cloud storage, making them accessible without the on-prem backup software and paying for compute only for the time you mount them for retrieval. This, however, will still require some form of catalog to be built and a system to manage retention periods.
Again, depending on the data, some choose to upload the files extracted from backups directly into blob storage. This can be cheap and effective, especially as the files are tiered into cool and archive storage, but be warned: you will lose the ACLs, retention periods and other metadata. It can, however, be a cost-effective solution for low-value data that is rarely retrieved but must be maintained, such as DB log files.
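As a minimal sketch of that direct-upload path, assuming Azure Blob Storage and the azure-storage-blob Python SDK, the snippet below pushes an extracted file straight to the archive tier. The connection string, container name and the idea of stashing a retention date as blob metadata are illustrative assumptions; such metadata is informational only unless you also enforce it with an immutability or lifecycle policy.

import os
from datetime import date
from azure.storage.blob import BlobServiceClient, StandardBlobTier

# Assumed environment/connection details; replace with your own.
service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
blob = service.get_blob_client(container="legacy-backups", blob="db-logs/2016/march.bak")

with open("march.bak", "rb") as data:
    blob.upload_blob(
        data,
        overwrite=True,
        standard_blob_tier=StandardBlobTier.Archive,  # cheapest tier for rarely retrieved data
        metadata={                                    # note: original ACLs are NOT preserved
            "retain_until": str(date(2026, 3, 1)),    # illustrative retention marker
            "source_system": "LegacyBackup 9.x",
        },
    )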
The economics of legacy migration:
Let's say you have 6,000 LTO tapes holding ~4PB. A first-pass catalogue exercise reveals that we can eliminate 30% by filtering on metadata alone (past retention, incrementals, etc.), which gets us to ~2.8PB. Now we load the tapes into the libraries, index them, remove the roughly 95% of remaining data that is duplicated and filter on unique user files created or accessed in the past 2-3 years, bringing the target migration set down to approximately 200TB. All we need now is the manpower and/or automation to load and migrate the tapes within a 6-9 month window. After that, the tapes can be shredded and the backup h/w and s/w retired.
The annual cost of maintaining this legacy could be close to $300k, half of which is likely to be tape management and retrieval costs. True, the manpower and migration effort could cost just as much, but the running cost of storing the remaining data in the cloud would be trivial by comparison, around $50k per year; over 3 years that is roughly $900k of on-prem cost versus ~$450k for migration plus cloud storage, giving a cost saving of about 50%.
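For transparency, here is the back-of-the-envelope calculation behind those figures as a small Python sketch; every percentage and dollar amount is an illustrative assumption carried over from the scenario above, not a benchmark.

# Data-reduction funnel (figures from the illustrative scenario above)
total_tb = 4000                                 # ~4PB across 6,000 LTO tapes
after_metadata_filter = total_tb * (1 - 0.30)   # drop 30% on metadata alone -> ~2,800 TB
target_tb = 200                                 # after dedup + filtering to unique recent user files
reduction = 1 - target_tb / after_metadata_filter
print(f"after metadata filter: {after_metadata_filter:.0f} TB")
print(f"dedup/filtering removes a further ~{reduction:.0%}, leaving ~{target_tb} TB to migrate")

# 3-year cost comparison (assumed figures)
onprem_annual = 300_000       # tape management, retrieval, h/w and s/w upkeep
migration_one_off = 300_000   # manpower and migration effort
cloud_annual = 50_000         # running cost of the migrated data in the cloud
years = 3
stay = onprem_annual * years
move = migration_one_off + cloud_annual * years
print(f"stay on-prem: ${stay:,}  vs  migrate: ${move:,}  (~{1 - move / stay:.0%} saving over {years} years)")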
Conclusion:
Of course, there is no one-size-fits-all, and you may choose to apply each approach to different parts of your legacy backup estate; a so-called hybrid approach: dumping some data, leaving some to die, moving the majority to the cloud on a new platform/system, and taking the risk with some backups of performing ad-hoc restores if and when the time arises. Some final considerations for any migration:
- Cost – to transfer, to store, to restore and to manage on an ongoing basis
- Security – is my data's integrity protected, and is it accessible only to the right people?
- Speed – how quickly and reliably can I get the data to the cloud?
- Vendor reliability – what is the cost of being held hostage to a vendor's proprietary backup format, and can I reduce it?
- Compliance – should I be storing that data in the first place (e.g. under GDPR), and am I allowed to move it to the cloud?
There is no single right answer, and I don't think there should be. All legacy data has its own characteristics and value, and I hope these pointers help in your next legacy migration to the cloud.