The AI-Defined Data Center
As data centers are re-imagined for the cloud, there's a universal need for a data management platform that can orchestrate data everywhere, across private and public clouds. Accordingly, data management is evolving from a "storage-centric" model to a "service-centric" operations model that dynamically adapts resources to each individual application. In the process, data centers are making increasingly intelligent tradeoffs between application/business needs and infrastructure capabilities.
DATERA
Datera converges standard servers with mixed storage media into a single data platform, from which its AI tailors storage and data management individually to each application. Datera's data platform is architected from the ground up to be operated as a service, and can continuously adapt to evolving business needs.
But what good would a smart data platform be if it couldn't handle the data itself efficiently? That's why Datera is also architected from the ground up for very high performance.
As a result, Datera fundamentally innovates along two key dimensions:
- AI-defined: Driven by policies, dynamically composed to application needs.
- Low-latency: Built for NVMe, persistent memory, and Intel Optane and Skylake.
This unique combination makes Datera the foundation for the AI-defined datacenter, and gives customers game-changing operational efficiency and economics, combined with enterprise-class performance.
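To make "AI-defined" a bit more concrete, here is a minimal sketch of what an application-level policy might look like. This is purely illustrative; the field names are assumptions, not Datera's actual API:

```python
# Illustrative sketch of a per-application policy (field names are assumptions,
# not Datera's actual API). The AI maps such intent onto concrete nodes/media.
app_policy = {
    "app": "oltp-db",
    "slo": "performance",                     # e.g. "economy" or "performance"
    "replicas": 3,                            # copies spread for failure isolation
    "media_preference": ["optane", "flash"],  # preferred tiers, best-effort
}
```

The point is that the operator declares intent per application, and the platform continuously re-fits data placement to it.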
We now show Datera's performance, and how Datera's AI smartly delivers it. All benchmarks are 4k random read traffic (we'll get to writes in another blog), driven by rack-local iSCSI. Latencies are storage system service times (not including iSCSI itself).
SINGLE VOLUME
Let's start with a single volume (and application queue depth of 1) to determine the minimal latency for each type of volume/media.
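As a rough way to reproduce this kind of test, a standard tool like fio can drive the same pattern; the sketch below assumes the volume is attached as /dev/sdb (a placeholder path). Note that fio reports end-to-end latency, which includes the iSCSI transport, unlike the service times quoted here.

```python
import subprocess

# 4k random reads at queue depth 1 against one attached iSCSI volume.
# /dev/sdb is a placeholder device path; adjust to the actual attachment.
subprocess.run([
    "fio",
    "--name=randread-qd1",
    "--filename=/dev/sdb",
    "--rw=randread",
    "--bs=4k",
    "--iodepth=1",
    "--ioengine=libaio",
    "--direct=1",            # bypass the page cache to measure the device
    "--runtime=60",
    "--time_based",
], check=True)
```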
Hybrid Flash Nodes (3,100 IOPS/volume at 160us - minimum latency)
Starting with a simple 3-node hybrid flash cluster (each node combining NVMe flash with HDDs) and a single volume, we get 3,100 IOPS/volume at 160us (served from the NVMe tier).
Add an All-Flash Node
Now, let's add an all-flash node to the hybrid cluster. With a single click, we now have a data platform that combines hybrid flash nodes and an all-flash node.
Note that Datera's AI isn't yet moving the data itself, as the associated policy specifies an "economy" service level objective (SLO).
Flash-as-a-Service (2,500 IOPS/volume at 230us - minimum latency)
Now, let's change our policy from "economy" to "performance," and see how Datera's AI adjusts the data platform to deliver 2,500 IOPS/volume at 230us (from the all-flash node).
Datera's AI automatically live-migrates one copy onto the new all-flash node, together with its exports, so that applications can get a different SLO without service disruption. (With more all-flash nodes, it could also place all copies on flash, again depending on the application SLO.)
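As a sketch of what that one-click policy change could look like through a management API (the endpoint, port, and payload below are hypothetical assumptions, not Datera's documented interface):

```python
import json
import urllib.request

# Hypothetical sketch: switch an application's SLO from "economy" to
# "performance". Endpoint, path, and payload shape are assumptions.
req = urllib.request.Request(
    url="https://datera-mgmt.example.com/v2/app_instances/oltp-db",
    data=json.dumps({"slo": "performance"}).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # placement changes then happen asynchronously, via live migration
```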
Why did performance on the all-flash node actually decrease compared to the hybrid node? Because the all-flash node uses SATA SSDs, while the hybrid node uses faster NVMe flash for its performance tier.
Why use all-flash nodes at all, if their performance is lower than hybrid? Because all-flash performance is consistent and predictable, while hybrid flash performance can fluctuate, depending on whether the data is on disk or in flash.
Now, why Datera? Because Datera's AI uniquely places data on the storage nodes that best match each application's SLO and economics, spreads copies across different nodes based on desired failure behavior and cost, and will seamlessly extend these concepts across private and public clouds.
Optane-as-a-Service (4,000 IOPS/volume at 70us - minimum latency)
Let's further expand the price/performance spectrum of our cluster by adding an Intel Optane node (one click). We already specified a "performance" SLO for our workload, so Datera's AI automatically live-migrates one data copy and its export from the all-flash node to the now better-fitting Optane node.
This marks the minimum service latency of approximately 70us (4k reads): about 8us of that is the Optane media itself, about 40us is Datera's software stack, and the remaining ~20us is the overhead of the external iSCSI protocol flow (the rack-local iSCSI pipe itself would add approximately another 20-25us).
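As a quick sanity check, that budget adds up (a back-of-the-envelope tally of the numbers above):

```python
# Back-of-the-envelope latency budget for a 4k Optane read, in microseconds.
optane_media = 8      # Optane media access
datera_stack = 40     # Datera software stack
iscsi_protocol = 20   # external iSCSI protocol flow
print(optane_media + datera_stack + iscsi_protocol)  # ~68us, i.e. the ~70us observed
# A rack-local iSCSI pipe adds roughly another 20-25us end to end.
```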
MANY VOLUMES
Now that we've determined minimal distributed volume/media latencies, let's use a more realistic number of volumes (and an application queue depth of 32).
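The same fio approach extends to the many-volume case; a sketch (device paths again placeholders) that runs one job per volume at queue depth 32:

```python
import subprocess

# 4k random reads at queue depth 32, one fio job per attached iSCSI volume.
# Device paths are placeholders for the actual attachments.
volumes = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]

args = [
    "fio",
    # Global options (listed before the first --name, so they apply to all jobs).
    "--rw=randread", "--bs=4k", "--iodepth=32",
    "--ioengine=libaio", "--direct=1",
    "--runtime=60", "--time_based", "--group_reporting",
]
for dev in volumes:
    args += [f"--name=vol-{dev.rsplit('/', 1)[-1]}", f"--filename={dev}"]

subprocess.run(args, check=True)
```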
Optane-as-a-Service (230,000 IOPS/node at 90us latency)
Let's start with a mixed cluster with three hybrid flash nodes and one all-flash node. With a "performance" SLO, Datera's AI puts the primary copies on the all-flash node, and serves all reads from that node (barring datacenter topology considerations). We're getting 132,000 IOPS/node at 260us latency from the all-flash node.
Now, let's expand the price/performance spectrum of our mixed cluster by adding an Intel Optane node (one click). With the "performance" SLO, Datera's AI automatically live-migrates the corresponding data and exports from the all-flash node to the now better-fitting Optane node. As a result, the related application IOPS accelerate to 230,000 IOPS/node at 90us latency.
Flash-as-a-Service (132,000 IOPS/node at 260us latency)
Nodes may drop out at any time, and for any number of reasons. This may compress the price/performance spectrum of the data platform, and Datera's AI will adapt its data layout correspondingly. Let's decommission the Intel Optane node, and keep the "performance" SLO for the related workloads.
As a result, Datera's AI live-migrates the associated exports from the Intel Optane node to the "next best" all-flash node. Correspondingly, workload performance decreases to 132,000 IOPS/node at 260us latency.
Basic Hybrid Cluster (160,000 IOPS/node at 190us latency)
Finally, let's remove the all-flash node, reverting back to a homogeneous hybrid flash cluster, but still keep our "performance" SLO for the associated workloads.
As a result, Datera's AI disperses the related exports across the hybrid cluster, while bringing the hot data from disk into the NVMe flash tier. During that time, performance can fluctuate significantly, with latencies of up to multiple milliseconds. Once the NVMe tier is fully warmed up, the system settles at 160,000 IOPS/node at 190us latency.
SUMMARY
Perhaps the simplest way to think of Datera is as the Tesla of the data center:
- Tesla replaced the traditional combustion engine with superior technology. Datera replaced monolithic proprietary hardware with intelligent software on commodity hardware. And in contrast to their modest electric and software-defined predecessors, both inspire with impressive performance.
- Tesla's and Datera's raw speed is just the beginning: their AI creates a whole new experience. Tesla's AI creates a self-driving car, and its simplicity will help transform transportation. Datera's AI creates a self-managing data platform that can orchestrate data everywhere, across any servers and media, or across private and public clouds, and its simplicity will help transform how data centers are planned, procured, operated, serviced and scaled.
Welcome to the AI-defined data center. Welcome to the data era.
Please visit us at www.datera.io, or tweet me at @MarcFleischmann.