The AI-Defined Data Center
As data centers are re-imagined for the cloud, there's a universal need for a data management platform that can orchestrate data everywhere, across private and public clouds. Accordingly, data management is evolving from a "storage-centric" model to a "service-centric" operations model that dynamically adapts resources to each individual application. In the process, data centers are making increasingly intelligent tradeoffs between application/business needs and infrastructure capabilities.
DATERA
Datera converges standard servers with mixed storage media into a single data platform, from which its AI tailors storage and data management individually to each application. Datera's data platform is architected from the ground up to be operated as a service, and can continuously adapt to evolving business needs.
But what good would a smart data platform be if it couldn't handle the data itself efficiently? That's why Datera is also architected from the ground up for very high performance.
As a result, Datera fundamentally innovates along two key dimensions:
- AI-defined: Driven by policies, dynamically composed to application needs.
- Low-latency: Built for NVMe, persistent memory, and Intel Optane and Skylake.
This unique combination makes Datera the foundation for the AI-defined datacenter, and gives customers game-changing operational efficiency and economics, combined with enterprise-class performance.
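To make "AI-defined" a bit more concrete, here is a minimal sketch of what an application-level policy might look like. This is purely illustrative; the field names are assumptions, not Datera's actual API:

```python
# Illustrative sketch of a per-application policy (field names are assumptions,
# not Datera's actual API). The AI maps such intent onto concrete nodes/media.
app_policy = {
    "app": "oltp-db",
    "slo": "performance",                     # e.g. "economy" or "performance"
    "replicas": 3,                            # copies spread for failure isolation
    "media_preference": ["optane", "flash"],  # preferred tiers, best-effort
}
```

The point is that the operator declares intent per application, and the platform continuously re-fits data placement to it.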
We now show Datera's performance, and how Datera's AI smartly delivers it. All benchmarks are 4k random read traffic (we'll get to writes in another blog), driven by rack-local iSCSI. Latencies are storage system service times (not including iSCSI itself).
SINGLE VOLUME
Let's start with a single volume (and application queue depth of 1) to determine the minimal latency for each type of volume/media.
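As a rough way to reproduce this kind of test, a standard tool like fio can drive the same pattern; the sketch below assumes the volume is attached as /dev/sdb (a placeholder path). Note that fio reports end-to-end latency, which includes the iSCSI transport, unlike the service times quoted here.

```python
import subprocess

# 4k random reads at queue depth 1 against one attached iSCSI volume.
# /dev/sdb is a placeholder device path; adjust to the actual attachment.
subprocess.run([
    "fio",
    "--name=randread-qd1",
    "--filename=/dev/sdb",
    "--rw=randread",
    "--bs=4k",
    "--iodepth=1",
    "--ioengine=libaio",
    "--direct=1",            # bypass the page cache to measure the device
    "--runtime=60",
    "--time_based",
], check=True)
```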
Hybrid Flash Nodes (3,100 IOPS/volume at 160us - minimum latency)
Starting with a simple 3-node hybrid flash cluster (each node combining NVMe flash with HDDs) and a single volume, we get 3,100 IOPS/volume at 160us (served from the NVMe tier).
Add an All-Flash Node
Now, let's add an all-flash node to the hybrid cluster. With a single click, we now have a data platform that combines hybrid flash nodes and an all-flash node.
Note that Datera's AI isn't yet moving the data itself, as the associated policy specifies an "economy" service level objective (SLO).
Flash-as-a-Service (2,500 IOPS/volume at 230us - minimum latency)
Now, let's change our policy from "economy" to "performance," and see how Datera's AI adjusts the data platform to deliver 2,500 IOPS/volume at 230us (from the all-flash node).
Datera's AI automatically live-migrates one copy onto the new all-flash node, together with its exports, so that applications can get a different SLO without service disruption. (With more all-flash nodes, it could also place all copies on flash, again depending on the application SLO.)
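As a sketch of what that one-click policy change could look like through a management API (the endpoint, port, and payload below are hypothetical assumptions, not Datera's documented interface):

```python
import json
import urllib.request

# Hypothetical sketch: switch an application's SLO from "economy" to
# "performance". Endpoint, path, and payload shape are assumptions.
req = urllib.request.Request(
    url="https://datera-mgmt.example.com/v2/app_instances/oltp-db",
    data=json.dumps({"slo": "performance"}).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # placement changes then happen asynchronously, via live migration
```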
Why did performance on the all-flash node actually decrease compared to the hybrid node? Because the all-flash node uses SATA SSDs, while the hybrid node uses faster NVMe flash for its performance tier.
Why use all-flash nodes at all, if their performance is lower than hybrid? Because all-flash performance is consistent and predictable, while hybrid flash performance can fluctuate, depending on whether the data is on disk or in flash.
Now, why Datera? Because Datera's AI uniquely places data on the storage nodes that best match each application's SLO and economics, spreads copies across different nodes based on desired failure behavior and cost, and will seamlessly extend these concepts across private and public clouds.
Optane-as-a-Service (4,000 IOPS/volume at 70us - minimum latency)
Let's further expand the price/performance spectrum of our cluster by adding an Intel Optane node (one click). We already specified a "performance" SLO for our workload, so Datera's AI automatically live-migrates one data copy and its export from the all-flash node to the now better-fitting Optane node.
This marks the minimum service latency of approximately 70us (4k reads): about 8us of that is the Optane media itself, about 40us is Datera's software stack, and the remaining ~20us is the overhead of the external iSCSI protocol flow (the rack-local iSCSI pipe itself would add approximately another 20-25us).
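As a quick sanity check, that budget adds up (a back-of-the-envelope tally of the numbers above):

```python
# Back-of-the-envelope latency budget for a 4k Optane read, in microseconds.
optane_media = 8      # Optane media access
datera_stack = 40     # Datera software stack
iscsi_protocol = 20   # external iSCSI protocol flow
print(optane_media + datera_stack + iscsi_protocol)  # ~68us, i.e. the ~70us observed
# A rack-local iSCSI pipe adds roughly another 20-25us end to end.
```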
MANY VOLUMES
Now that we've determined minimal distributed volume/media latencies, let's use a more realistic number of volumes (and an application queue depth of 32).
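The same fio approach extends to the many-volume case; a sketch (device paths again placeholders) that runs one job per volume at queue depth 32:

```python
import subprocess

# 4k random reads at queue depth 32, one fio job per attached iSCSI volume.
# Device paths are placeholders for the actual attachments.
volumes = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]

args = [
    "fio",
    # Global options (listed before the first --name, so they apply to all jobs).
    "--rw=randread", "--bs=4k", "--iodepth=32",
    "--ioengine=libaio", "--direct=1",
    "--runtime=60", "--time_based", "--group_reporting",
]
for dev in volumes:
    args += [f"--name=vol-{dev.rsplit('/', 1)[-1]}", f"--filename={dev}"]

subprocess.run(args, check=True)
```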
Optane-as-a-Service (230,000 IOPS/node at 90us latency)
Let's start with a mixed cluster with three hybrid flash nodes and one all-flash node. With a "performance" SLO, Datera's AI puts the primary copies on the all-flash node, and serves all reads from that node (barring datacenter topology considerations). We're getting 132,000 IOPS/node at 260us latency from the all-flash node.
Now, let's expand the price/performance spectrum of our mixed cluster by adding an Intel Optane node (one click). With the "performance" SLO, Datera's AI automatically live-migrates the corresponding data and exports from the all-flash node to the now better-fitting Optane node. As a result, the related application IOPS accelerate to 230,000 IOPS/node at 90us latency.
Flash-as-a-Service (132,000 IOPS/node at 260us latency)
Nodes may drop out at any time, and for any number of reasons. This may compress the price/performance spectrum of the data platform, and Datera's AI will adapt its data layout correspondingly. Let's decommission the Intel Optane node, and keep the "performance" SLO for the related workloads.
As a result, Datera's AI live-migrates the associated exports from the Intel Optane node to the "next best" all-flash node. Correspondingly, workload performance decreases to 132,000 IOPS/node at 260us latency.
Basic Hybrid Cluster (160,000 IOPS/node at 190us latency)
Finally, let's remove the all-flash node, reverting back to a homogeneous hybrid flash cluster, but still keep our "performance" SLO for the associated workloads.
As a result, Datera's AI disperses the related exports across the hybrid cluster, while bringing the hot data from disk into the NVMe flash tier. During that time, performance can fluctuate significantly, with latencies of up to multiple milliseconds. Once the NVMe tier is fully warmed up, the system settles at 160,000 IOPS/node at 190us latency.
SUMMARY
Perhaps the simplest way to think of Datera is as the Tesla of the data center:
- Tesla replaced the traditional combustion engine with superior technology. Datera replaced monolithic proprietary hardware with intelligent software on commodity hardware. And in contrast to their modest electric and software-defined predecessors, both inspire with impressive performance.
- Tesla's and Datera's raw speed is just the beginning: their AI creates a whole new experience. Tesla's AI creates a self-driving car, and its simplicity will help transform transportation. Datera's AI creates a self-managing data platform that can orchestrate data everywhere, across any servers and media, or across private and public clouds, and its simplicity will help transform how data centers are planned, procured, operated, serviced and scaled.
Welcome to the AI-defined data center. Welcome to the data era.
Please visit us at www.datera.io, or tweet me at @MarcFleischmann.