The increasing mismatch between private and public clouds
Introduction
Microsoft announced today that they are removing the storage restrictions on their Cloud, and this follows the trend for public cloud providers to offer large-scale storage of virtually every digital artefact that a user could gather through their lives. Unfortunately, private clouds have a long way to go in matching this, and are often missing an opportunity to map user storage onto public cloud storage (such as mapping onto an S3 storage bucket).
The main barriers have been performance and security, but CloudFront-style edge technology and an encryption layer solve these problems. So one must ask: why do I have to delete my university emails when I reach my 8GB limit, while there is almost infinite storage in the Cloud? The answer is that companies need to re-architect and redesign around hybrid clouds, where data can be backed up and stored in public spaces while still being kept secure.
So my corporate email quota has not increased for about five years, but public cloud storage has almost doubled every year. As someone who has over 1TB in both my Microsoft OneDrive and Dropbox, but only 8GB on my university account, it seems that my corporate storage systems are not keeping up with the scale-up in cloud-based storage. It should be remembered that security in the Cloud is not really an issue, as it is possible to store data in a cloud-based bucket and encrypt it. So while data storage capacity has increased 1,000-fold over the last 10 years, my corporate storage has increased by a factor of eight.
Why don’t corporate systems keep up with the Cloud?
Corporations are still struggling with the Cloud: knowing how they should use public and private clouds, and how to create a single entity which keeps some things locally but can burst into public cloud spaces (Figure 1). The problems with scalability have continually been around performance, resilience, and security.
- Security. Security can be solved by creating an encryption layer for all the data that leaves the corporate infrastructure and is stored in a public cloud (see the sketch after this list).
- Resilience. With resilience, the main cloud providers such as Amazon AWS and Microsoft Azure have shown near 100% up-time over the past few years, with very few problems with outages. The last major outage for Amazon Web Services (AWS) happened in August 2013, lasted nearly an hour, and was due to issues in their North Virginia datacenter. It mainly affected Amazon.com, but it also hit many companies who had built their business in the Cloud, such as Vine and Instagram. It is estimated that Amazon lost as much as $1,100 in net sales per second (to put this into context, a five-minute outage in August 2013 cost Google $545,000).
- Performance. Performance has always been an issue, especially where network connections are slow or become busy over certain time periods. This issue is reducing as high-speed network connections provide fast response rates, especially where the content is placed at the edge of the public cloud.
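As a rough illustration of that encryption layer, the sketch below (Python; it assumes the cryptography and boto3 packages, and the bucket name, file name and key handling are purely hypothetical) encrypts a file before it ever leaves the corporate network, so the public cloud only ever holds ciphertext.

```python
# Minimal sketch: encrypt data before it leaves the corporate network,
# then push only the ciphertext to a public cloud bucket.
# Assumes the "cryptography" and "boto3" packages; the bucket name and
# key handling here are illustrative, not a recommendation.
import boto3
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, held by a corporate key-management service
cipher = Fernet(key)

with open("quarterly-report.docx", "rb") as f:
    ciphertext = cipher.encrypt(f.read())   # data is encrypted on-site

s3 = boto3.client("s3")
s3.put_object(
    Bucket="corp-hybrid-backup",                 # hypothetical public-cloud bucket
    Key="backups/quarterly-report.docx.enc",
    Body=ciphertext,                             # only ciphertext is stored off-site
)
```

The point of the sketch is simply that the encryption key never leaves the corporate infrastructure, so the public cloud provider only ever sees encrypted blobs.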
Figure 1: Public, private and hybrid clouds
What does a hybrid system look like?
If organisations want to use the Cloud for storage, there needs to be a strongly managed encryption layer, which has different policies and encryption requirements for different levels of risk (Figure 1). High-risk data will probably be kept on-site with physical access restrictions, but back-ups could be undertaken with strong policy enforcement, especially for access restrictions, and strong management of the encryption process. Overall there are possibly three main types of data which can be burst into the Cloud. It must be said that although data stays locally within the corporate infrastructure, there is no guarantee that it will actually be secure. For the three main data streams, each with different encryption keys and different policies, the organisation can use a local data store to map to key parts of the data, and then burst into the Cloud, especially for long-term storage of files. Many users only work with a few files every day, so these can be buffered on the local site, and with CloudFront-style edge technology the storage can be placed as near the organisation as possible. So the key aspect of this type of architecture is network connectivity ... it needs to be fast and reliable, especially for external connections.
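As a sketch of how those three data streams might be handled (all the names and classifications below are illustrative assumptions, not a specific product's configuration), a simple policy layer can map each risk level to its own encryption key and storage destination, keeping high-risk data on-site and letting lower-risk data burst into the public cloud:

```python
# Sketch of a policy layer for a hybrid store: each data classification
# gets its own encryption key identifier and storage destination.
# All names here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class StoragePolicy:
    key_id: str            # identifier of the encryption key for this stream
    destination: str       # "on-site" or a public-cloud location
    burst_to_cloud: bool   # may long-term copies leave the corporate network?

POLICIES = {
    "high":   StoragePolicy(key_id="kms-high",   destination="on-site",          burst_to_cloud=False),
    "medium": StoragePolicy(key_id="kms-medium", destination="cloud:eu-archive", burst_to_cloud=True),
    "low":    StoragePolicy(key_id="kms-low",    destination="cloud:eu-archive", burst_to_cloud=True),
}

def place(document_class: str) -> StoragePolicy:
    """Return the storage policy for a document's risk classification."""
    return POLICIES[document_class]

print(place("high"))   # high-risk data stays on-site with its own key
print(place("low"))    # low-risk data can burst into the public cloud
```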
The key questions that each company should ask are:
- Can users access all the files that they require, within the defined access restrictions and quality of service?
- Can we recover all our files in the event of an outage?
- Can we recover the whole IT infrastructure after a major outage?
With the increasing reliance on the Internet, companies might even consider backing up to different cloud providers, in different regions, in order to fully mitigate the full range of risks.
An excellent question for any forward-looking company to ask is:
"If we had to move the whole IT infastructure to another place ... how long would it take?"
If the answer is less than one week, the organisation can probably cope with a major outage, otherwise it is at risk.
Figure 2: Hybrid storage
Life on a postage stamp
As public cloud storage has scaled up, so has memory storage: SanDisk have just created an SD card with 512GB of memory, and it is expected that 2TB will be achieved with the format.
Intel created the first DRAM (dynamic random-access memory) chip in 1970. It was named the 1103 and could hold 1Kbit of data. DRAM chips are made up of small capacitors which are charged (for a binary 1) or discharged (for a binary 0). As they rely on the charging and discharging of capacitors, they tend to be slower than the static version – SRAM (static random-access memory) – which toggles the state of a pair of transistors. Because an SRAM cell needs more space than a DRAM cell, DRAM has often been used to create larger memory storage chips than the equivalent SRAM ones. Both SRAM and DRAM lose their contents when the power is taken away (volatile memory), so storage systems use non-volatile memory to preserve the data when the power is removed.
The IBM PC, released in 1981, only had around 1MB of addressable memory and a 30MB hard disk. Now we are looking at 2TB on a card the size of a postage stamp (where the area is mainly taken up by the physical layout of the card and connector). One of the key growth areas of computing is likely to be in-memory computing, where all the data you need can be stored locally in memory, with no real need for connections to the Internet. With 2TB of data storage, you could probably hold all the data you are ever going to need.
So let’s look at Bob’s footprint over his 80 years on the planet (a quick calculation sketch follows the list):
- Email. Bob sends 300 emails a day and receives 400, each with around 1,000 characters (at one byte per character, about 1KB per email), so that’s 700KB each day and around 255MB a year. Over 80 years this generates around 20GB of emails. Footprint: 1%.
- Photos. Bob takes a photo every day, each around 1.5MB. That is around 547MB each year, and over 80 years it creates around 44GB. Footprint: 2.2%.
- Documents. Bob creates 10 documents each day, with an average of 1MB per document, which creates around 3.6GB over a year and 292GB over 80 years. Footprint: 14.6%.
- Wikipedia. Bob loves Wikipedia, and a local copy takes around 10GB of space. Footprint: 0.5%.
- Social media. Bob wants to save every post that he has made to Facebook, Twitter, and other social media sites. This is an average of 30MB of data each day, which gives around 11GB of data each year and 876GB over 80 years. Footprint: 43.8%.
- Video. Bob takes a video each week, with a size of 100MB. That’s 5.2GB each year, and 416GB over 80 years. Footprint: 20.8%.
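As a quick check on the running totals, the small script below reproduces the footprints (using the figures above, decimal units where 1GB = 1,000MB, and the 2TB card taken as 2,000GB):

```python
# Back-of-the-envelope check of Bob's 80-year digital footprint on a 2TB card,
# using the article's assumed figures and decimal units (1GB = 1000MB).
CARD_GB = 2000          # 2TB SD card
YEARS = 80
DAYS_PER_YEAR = 365
WEEKS_PER_YEAR = 52

streams_gb = {
    "email":     700 * 0.001 * DAYS_PER_YEAR * YEARS / 1000,  # 700 emails/day at ~1KB each
    "photos":    1.5 * DAYS_PER_YEAR * YEARS / 1000,          # one 1.5MB photo per day
    "documents": 10 * DAYS_PER_YEAR * YEARS / 1000,           # ten 1MB documents per day
    "wikipedia": 10,                                          # a local copy of Wikipedia
    "social":    30 * DAYS_PER_YEAR * YEARS / 1000,           # 30MB of social media per day
    "video":     100 * WEEKS_PER_YEAR * YEARS / 1000,         # one 100MB video per week
}

for name, gb in streams_gb.items():
    print(f"{name:10s} {gb:7.1f} GB  {100 * gb / CARD_GB:5.1f}%")

total = sum(streams_gb.values())
print(f"{'total':10s} {total:7.1f} GB  {100 * total / CARD_GB:5.1f}%")
```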
So that is around 83% of the SD card used, and we have stored the whole of Bob’s life … every email, photo and social media post … in fact everything … on something the size of a postage stamp.
Conclusions
For me, 8GB is not really a great amount of data for emails, especially around the times when we submit research grants, when large files move back and forth, but if I needed 100GB to store all my emails over a decade, I would still be using only 5% of my current Dropbox allocation. So, like it or not, we're moving to an age where we store every little thing about our lives, and the concept of deleting files to save space will go.