Data Tiering

Data tiering is a technique for moving less frequently used data, also known as cold data, to cheaper tiers or classes of storage. The term originally referred to moving data between tiers within a single storage system, but it has since expanded to include tiering or archiving data from a storage system to other storage systems and clouds. See also cloud tiering and the choices for cloud data tiering.

Data Tiering Cuts Costs Because 70%+ of Data is Cold

As data grows, storage costs escalate. It is tempting to think the solution is more efficient storage, but the real driver of rising costs is poor data management. Over 70% of data is cold: it has not been accessed in months, yet it sits on expensive storage and consumes the same backup resources as hot data. As a result, storage costs keep rising, backups are slow, recovery is unreliable, and the sheer bulk of this data makes it difficult to take advantage of newer options such as Flash and cloud.

Data Tiering Was Initially Used within a Storage Array

Data Tiering was initially a technique used by storage systems to reduce the cost of data storage by tiering cold data within the storage array to cheaper but less performant options – for example, moving data that has not been touched in a year or more from an expensive Flash tier to a low-cost SATA disk tier.

Typical storage tiers within a storage array include:

  • Flash or SSD: A high-performance storage class but also very expensive. Flash is usually used on smaller data sets that are being actively used and require the highest performance.
  • SAS Disks: Usually the workhorse of a storage system, they are moderately good at performance but more expensive than SATA disks.
  • SATA Disks: Usually the lowest price-point for disks but not as performant as SAS disks.
  • Secondary Storage, often Object Storage: Usually a good choice for capacity storage – to store large volumes of cool data that is not as frequently accessed, at a much lower cost.
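
To make the idea concrete, here is a minimal, hypothetical sketch of an age-based placement policy along the lines of the tiers listed above. The thresholds, tier names, and the /data path are illustrative assumptions, not any vendor's actual defaults, and last-access times can be unreliable on filesystems mounted with noatime.

```python
import os
import time

# Illustrative last-access thresholds (days) mapped to the tiers above.
# These values are assumptions; real arrays use their own policies.
TIER_THRESHOLDS = [
    (30, "flash/ssd"),   # accessed in the last 30 days
    (90, "sas"),         # accessed in the last 90 days
    (365, "sata"),       # accessed in the last year
]
DEFAULT_TIER = "object/secondary"  # colder than a year

def classify_by_last_access(path: str) -> str:
    """Return the tier this simple policy would place a file on."""
    age_days = (time.time() - os.stat(path).st_atime) / 86400
    for max_age, tier in TIER_THRESHOLDS:
        if age_days <= max_age:
            return tier
    return DEFAULT_TIER

if __name__ == "__main__":
    # /data is a placeholder root; point this at the share you want to scan.
    for root, _dirs, files in os.walk("/data"):
        for name in files:
            path = os.path.join(root, name)
            print(path, "->", classify_by_last_access(path))
```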

Cloud Data Tiering is now Popular

Increasingly, customers are looking at another option – tiering or archiving data to a public cloud.

  • Public Cloud Storage: Public clouds offer a mix of object and file storage options. Object storage services such as Amazon S3 and Azure Blob Storage provide tremendous cost efficiency and all the benefits of object storage without the headaches of setup and management.

Tiering and archiving less frequently used (cold) data to public cloud storage classes has become increasingly popular, because customers can keep cold data in the cloud's lower-cost storage classes and promote it to higher-performance classes when needed. For example, data can be archived or tiered from an on-premises NAS to Amazon S3 Infrequent Access or Amazon S3 Glacier for low ongoing costs, and then promoted to Amazon EFS or Amazon FSx when you need to operate on it with higher performance.
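
As a hedged illustration of the cloud side of that workflow, the sketch below uses the AWS SDK for Python (boto3) to upload a cold file directly into the S3 Standard-IA storage class and to add a lifecycle rule that later transitions objects to Glacier. The bucket name and the 180-day transition are placeholder assumptions.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-cold-data-bucket"  # placeholder bucket name

def tier_file_to_s3_ia(local_path: str, key: str) -> None:
    """Upload a cold file straight into the S3 Standard-IA storage class."""
    s3.upload_file(
        local_path,
        BUCKET,
        key,
        ExtraArgs={"StorageClass": "STANDARD_IA"},
    )

def add_glacier_transition(days: int = 180) -> None:
    """Transition objects to Glacier after a configurable number of days."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=BUCKET,
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "cold-to-glacier",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},
                    "Transitions": [
                        {"Days": days, "StorageClass": "GLACIER"}
                    ],
                }
            ]
        },
    )
```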

But in order to get this level of flexibility, and to ensure you’re not treating the cloud as just a cheap storage locker, data that is tiered to the cloud needs to be accessible natively in the cloud without requiring third-party software. This requires file-tiering, not block-tiering.

Block Tiering Creates Unnecessary Costs and Lock-In

Block-level tiering was first introduced as a technique within a storage array to make the storage box more efficient by leveraging a mix of technologies such as more expensive SAS disks as well as cheaper SATA disks.

Block tiering breaks a file into blocks: metadata blocks that contain information about the file, and data blocks that are chunks of the original file. It moves less-used cold blocks to lower, less expensive tiers, while hot blocks and metadata are typically retained on the higher, faster, and more expensive tiers.
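
The sketch below is a purely conceptual illustration (not any vendor's on-disk format) of why this matters: once a file is split into blocks and the cold blocks are shipped off, only the filesystem's private block map can reassemble the file, so the blocks on the cheap tier are unusable on their own.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # illustrative block size in bytes

@dataclass
class BlockMap:
    """Proprietary metadata held by the filesystem. Without this map,
    the blocks sitting on the cheaper tier are just opaque chunks."""
    file_name: str
    locations: list = field(default_factory=list)  # ordered (tier, block_id) pairs

def tier_cold_blocks(file_name: str, data: bytes, cold_from_block: int):
    """Split a file into blocks and push the 'cold' tail to a cheaper tier."""
    hot_tier, cheap_tier = {}, {}
    block_map = BlockMap(file_name=file_name)
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for idx, block in enumerate(blocks):
        is_hot = idx < cold_from_block
        (hot_tier if is_hot else cheap_tier)[idx] = block
        block_map.locations.append(("hot" if is_hot else "cheap", idx))
    # Reassembly needs the map plus *both* tiers; an application that can
    # only see the cheap tier cannot reconstruct the original file.
    return block_map, hot_tier, cheap_tier
```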

Block tiering is a technique used within the storage operating system or filesystem and is proprietary. Storage vendors offer block tiering as a way to reduce the cost of their storage environment. Many storage vendors are now expanding block tiering to move data to the public cloud or on-premises object storage.

But because block tiering (NetApp FabricPool and Dell EMC Isilon CloudPools are examples) is implemented inside the storage operating system as a proprietary solution, it has several limitations when it comes to reuse and storage savings. First, the proprietary storage filesystem must be involved in all data access, because it retains the metadata and holds the “map” needed to reassemble the file from its blocks. This means the cold blocks moved to a lower tier or to the cloud cannot be accessed directly from the new location: the cloud has neither the metadata map nor the remaining data blocks, file context, and attributes needed to put the file together. Block tiering is therefore a proprietary approach that often forces unnecessary rehydration of the data and treats the cloud as a cheap storage locker rather than as a powerful way to use data when needed.

The only way to access that data in the cloud is to run the proprietary storage filesystem there, which adds cost. In addition, many third-party applications, such as backup software that operates at the file level, require the cold blocks to be brought back (rehydrated), which defeats the purpose of tiering to lower-cost storage and erodes the potential savings. For more details, read the white paper: Block vs. File-Level Tiering and Archiving.


File Tiering Maximizes Savings and Eliminates Lock-In

File tiering is a modern technique that uses standard protocols to move an entire file, along with its metadata, to a secondary tier or to the cloud in a non-proprietary format. File tiering is harder to build but better for customers, because it eliminates vendor lock-in and maximizes savings. Whether files carry POSIX-based Access Control Lists (ACLs) or NTFS extended attributes, all of this metadata is tiered or archived together with the file itself and stored in a non-proprietary format, so the complete file can be brought back whenever it is needed. File tiering moves not just the file but also its attributes and security permissions, and it maintains full file fidelity even when the file moves to a different storage architecture such as object storage or the cloud. As a result, applications and users can continue to use the moved file from its original location, and they can also open it natively in the secondary location or cloud without any third-party software or storage operating system.
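
By way of contrast with the block-tiering sketch above, here is a hedged sketch of the file-tiering idea using boto3: the entire file is written to object storage as a single object, and basic attributes (owner, mode, timestamps) travel with it as plain object metadata that any S3-aware tool can read. The bucket name is a placeholder, and real file-tiering products preserve richer metadata (full ACLs, extended attributes) than this sketch shows.

```python
import os
import boto3

s3 = boto3.client("s3")
BUCKET = "example-file-tier-bucket"  # placeholder bucket name

def tier_whole_file(local_path: str, key: str) -> None:
    """Copy an entire file, plus basic attributes, to object storage
    in a non-proprietary form readable by any S3 client."""
    st = os.stat(local_path)
    with open(local_path, "rb") as fh:
        s3.put_object(
            Bucket=BUCKET,
            Key=key,
            Body=fh,
            Metadata={               # stored as ordinary x-amz-meta-* headers
                "uid": str(st.st_uid),
                "gid": str(st.st_gid),
                "mode": oct(st.st_mode),
                "mtime": str(int(st.st_mtime)),
            },
        )
```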

Because file tiering maintains full file fidelity and standards-based native access at every tier, third-party applications can access the moved data without agents or proprietary software. This maximizes savings, since backup software and other third-party applications can work with the moved data without rehydrating it or bringing the file back to its original location. It also means the cloud can be used to run valuable applications, such as compliance search or big data analytics, against the trove of tiered and archived data without third-party software or additional costs.

File-tiering is an advanced technique for archiving and cloud tiering that maximizes savings and breaks vendor lock-in.

Data Tiering Can Cut 70%+ Storage and Backup Costs When Done Right

In summary, data tiering is an efficient solution to cut storage and backup costs because it tiers or archives cold, unused files to a lower-cost storage class, either on-premises or in the cloud. However, to maximize the savings, data tiering needs to be done at the file level, not block level. Block-level tiering creates lock-in and erodes much of the cost savings because it requires unnecessary rehydration of the data. File tiering maximizes savings and preserves flexibility by enabling data to be used directly in the cloud without lock-in.
