How We Reimagined Data Storage
Before starting Datera in 2013, we had contributed the storage target subsystem to the Linux kernel ("Linux-IO"), which was adopted by the likes of Google, Red Hat (now IBM), Pure Storage, and many other array vendors. Linux-IO eventually became an industry standard, and emerged as an essential ingredient of software-defined IT.
While the industry was quick to take advantage of our open source software to replace proprietary storage hardware with less-proprietary storage hardware (and called it "software-defined"), we thought they missed the point: "Cloud" is not a location; it is an architecture and an operating model, and mapping the rigid architecture of the past into software cannot deliver the flexibility, simplicity and economics of that model.
To provide a data foundation for cloud computing, we envisioned a service-centric model that could orchestrate data anywhere, while scaling continuous availability, predictable performance and management policy - across private and public clouds. If we could harness cloud computing for data to let users change their intent as they go, we could free them from trying to anticipate future storage needs. That is the genesis of the Datera data services platform, a new approach to data for the cloud era.
Why hyperscale?
Swarms are adaptive; monoliths are not. A swarm of starlings is infinitely adaptable and resilient; dinosaurs are the antithesis of adaptive – and extinct. Implement this basic concept correctly, and everything else flows from it.
Google adopted swarm design almost twenty years ago. They build their data centers from thousands of commodity servers, and use autonomous distributed software to orchestrate them into coherent swarms. Now known as “hyperscale,” this architecture is transforming how IT is designed and delivered, putting IT as we know it under existential pressure – the Jurassic IT Era is coming to its end.
The promise of hyperscale is to converge diverse hardware resources into one coherent swarm with incredible adaptability and scalability. Its implementation, however, entails hard challenges like node heterogeneity, data gravity, data consistency, combinatorial reliability and availability, operational complexity, performance assurance, scalability cliffs, and so on, not to mention fundamental physical realities like time, distance and latency.
To pursue our vision, we assembled an interdisciplinary team around our founding architects Nicholas Bellinger (a Linux storage leader), Claudio Fleiner (a hyperscale wizard), Raghu Krishnamurthy (an automation thought leader) and Bill Rozas (a brilliant computer architect). Together, they made hyperscale storage work, and created the first enterprise tier-1 software-defined storage with a cloud operating model.
True software-defined storage
In this post, I'll explain Datera's architecture tenets in three categories - infrastructure model, automation model and hybrid cloud model - and how they combine to transform the traditional IT operating model from system-defined to service-defined, thereby capturing significantly more value across the entire data and system life cycle.
1. Infrastructure model
Our key infrastructure tenet was to make hyperscale frictionless. What good is hyperscale if it can’t adapt rapidly and continuously, deliver predictable high performance, scale seamlessly, rebalance quickly among a rich spectrum of endpoints, and so on?
So, at the inception of Datera, we spent significant time rethinking the innate friction points of hyperscale. As a result, we created an infrastructure model that delivers:
- Enterprise performance: transparent data mobility and lockless distributed coherence combine into frictionless I/O, driving predictable enterprise tier-1 performance, low latency and seamless scalability
- Continuous availability: built for change, transparent data mobility across generations of endpoints driven by current and future application intent, no forklift upgrades
- Eternal clusters: endpoint choice and heterogeneous scalability, no hardware lock-in, no technical debt, the hardware you start with is no longer the hardware you're stuck with - always the best economics
- Fully programmable: all data infrastructure is accessible through a REST API, making it browsable, searchable and meshable - data infrastructure-as-code (see the sketch after this list)
- Ecosystem integration: standard protocols in extensible containers, forward agility, easily surfacing the benefits of new tech to all applications
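As a rough illustration of data infrastructure-as-code, here is a minimal sketch of what working against such a REST API could look like. The base URL, endpoint paths, field names and authentication scheme below are hypothetical placeholders, not Datera's actual API.

```python
import requests

# Hypothetical API endpoint and token; Datera's actual paths and
# authentication scheme may differ.
BASE = "https://datera.example.com/v2"
HEADERS = {"Authorization": "Bearer <token>"}

# Browse: enumerate the storage endpoints (nodes) in the cluster.
nodes = requests.get(f"{BASE}/storage_nodes", headers=HEADERS).json()

# Search: find the application instances of one tenant.
apps = requests.get(
    f"{BASE}/app_instances",
    params={"tenant": "engineering"},
    headers=HEADERS,
).json()

# Compose: provision a new volume declaratively, as code.
requests.post(
    f"{BASE}/app_instances",
    json={"name": "ci-scratch", "size_gb": 500, "replicas": 3},
    headers=HEADERS,
)
```

Because everything sits behind one API, the same calls can be scripted, version-controlled and embedded in CI/CD pipelines - which is what makes the infrastructure "meshable."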
The result is a uniquely flexible data services platform that can independently orchestrate data services and data across all of its endpoints. Live data services mobility and live data mobility lay the foundation to achieve data freedom across private and public clouds.
2. Automation model
Having created an infrastructure model that makes hyperscale frictionless and truly scalable - effectively extending long, invisible arms across the data center - we turned to our next key tenet: automation. What good is frictionless hyperscale if it requires slow humans to work it?
So our hyperscale automation model is driven by applications - effortless, instant, invisible. Applications know best what they want:
- Application-driven automation: data infrastructure is continuously composed and delivered as a service, driven by applications and not by humans, based on application profiles (or intent) that you can set-and-forget (see the sketch after this list). Intent is invariant and, together with transparent data mobility, makes data portable and scalable - across heterogeneous endpoints, technology and innovation. Most importantly, Datera allows you to change your mind as you go - it always adapts and gives you an operational out
- Data orchestration: AI-driven continuous self-optimization and self-healing, based on application intent, infrastructure feedback (or insights) and live data mobility, 24x7 lights-out operations
- Role-based multi-tenancy: application intent can be mapped dynamically, depending on how it is instantiated - so tenants get a degree of self-service, while operating within pre-defined business envelopes
- Lifecycle management: manage full lifecycle of data and endpoints, from sunrise (e.g., test) to sunset (e.g., archive), always best data economics
- Global service management: global ops portal with visual insights and cloud-based machine learning across the installed base, driving anomaly detection and predictive operations
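To make set-and-forget intent more concrete, here is a minimal sketch of what an application profile and its lifecycle could look like. The profile schema, endpoint paths and field names are illustrative assumptions, not Datera's actual interface.

```python
import requests

BASE = "https://datera.example.com/v2"   # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}

# Hypothetical application profile ("intent"): it declares what the
# application needs, never which hardware should provide it, so the
# same profile stays valid as endpoints come and go.
oltp_intent = {
    "name": "oltp-gold",
    "performance": {"min_iops": 50_000, "max_latency_ms": 1},
    "protection": {"replicas": 3, "snapshot_every": "15m"},
    "placement": {"media": "all-flash"},
}

# Register the profile once (set-and-forget); the platform then
# continuously reconciles live data placement against it.
requests.post(f"{BASE}/app_templates", json=oltp_intent, headers=HEADERS)

# Changing your mind later is just an update to the intent, not a
# manual data migration - the operational out mentioned above.
oltp_intent["protection"]["replicas"] = 5
requests.put(
    f"{BASE}/app_templates/oltp-gold", json=oltp_intent, headers=HEADERS
)
```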
Containers further escalate the speed, scale and elasticity pressure on infrastructure - they consume infrastructure as a service that is continuously delivered. In that model, Kubernetes, Mesos and their brethren replace manual operations with application-driven automation of compute swarms - and Datera is to data as Kubernetes is to compute. The impact of this new automation model is hard to imagine without experiencing it.
Remember when the iPhone overnight made flip phones feel so 1990s? Experiencing Datera is a similar watershed moment - it makes traditional storage feel antiquated. What made the key difference for the iPhone? Multi-touch and apps. The iPhone is driven by apps, not by humans using a dialpad, just like Datera is driven by applications, not by humans using a keyboard.
3. Hybrid cloud model
Now that we have achieved data freedom across the data center, we can parlay it into a hybrid cloud model that scales it across private and public clouds.
Our application-driven automation model allows describing the behavior of data in invariant application profiles, or storage blueprints, that we automatically adapt to tenancy and roles. Storage blueprints seamlessly expand the behavior of data beyond the box - they allow a “data broker” to make data portable, scalable and hybridizable.
Our complementary frictionless infrastructure model provides scalable live data mobility - effectively implementing a “data exchange” across private and public clouds.
- Data center automation: data center topology awareness to co-orchestrate data with the data center, e.g., optional L3 network virtualization to participate in flat L3 networks (one flat IP address space) instead of using L2 overlays, which streamlines network configuration and management, and makes data services continuously available by letting them float across the data center (behind fixed virtual IP addresses) with practically instant session failovers
- Storage blueprints: data packaged with behavior, active/passive data portability and scalability across a wide spectrum of diverse endpoints in private and public clouds (see the sketch after this list)
- Edge to cloud data freedom: data portability and scalability (storage blueprints), live data mobility (synchronous stretch clusters) and scale (lockless distributed coherence) allow active/active synchronous replication - essentially creating a single multi-cloud data continuum that allows applications to float from edge to cloud
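As a loose illustration of the data broker and storage blueprint ideas, the toy sketch below packages data behavior with placement and re-resolves placement when sites change, while the behavior itself stays invariant. The blueprint schema and site names are hypothetical.

```python
# Hypothetical storage blueprint: data packaged with its behavior, so
# the same declaration can be instantiated on-premises or in a public
# cloud without rewriting policy per location.
blueprint = {
    "name": "orders-db",
    "behavior": {
        "consistency": "synchronous",  # active/active stretch cluster
        "replicas": 3,
        "encryption": "at-rest",
    },
    "placements": [
        {"site": "dc-west", "role": "active"},
        {"site": "aws-us-west-2", "role": "active"},
    ],
}

def broker_placement(blueprint, available_sites):
    """Toy "data broker": keep the placements whose sites are currently
    reachable; the declared behavior applies everywhere, unchanged."""
    return [p for p in blueprint["placements"] if p["site"] in available_sites]

# If one site drops out, the broker re-resolves placement while
# consistency, replica count and encryption stay invariant.
print(broker_placement(blueprint, {"dc-west"}))
```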
As workloads move to the edge, data centers evolve into meta data centers, and clouds are abstracted behind brokerage layers, we can provide the scaled data broker and data exchange to achieve data freedom from the intelligent edge to cloud.
4. Operating model
Our storage infrastructure and automation models allow decoupling storage consumption from deployment, continuously brokering between them, and independently scaling them. Now we can rethink the storage operating model from rigid point-in-time systems to continuously composable and scalable data services that empower both consumers and operators to optimize for their own needs:
- Application owners: can self-service by simply defining composable and scalable data services as application intent – and change it as they go
- Storage operators: can independently scale performance and capacity by simply adding the best servers from the spot market – and vary them as they go.
We transform the storage operating model from system-defined to service-defined, which allows us to refactor the entire IT value creation chain, from planning through obsolescence, and deliver transformational simplicity and efficiency.
This brings the cloud experience to enterprise data storage, and lets you free your mind (and data!) to focus on creating business value:
- Planning: plan your services, not your systems, and change your plans as you go - Datera always adapts and gives you an operational out
- Procurement: free your mind from trying to plan ahead and getting it wrong, from fallible point-in-time tech commitments that decay into technical/operational debt, from expensive under- or overprovisioning, and from trying to anticipate future IT needs in the face of an unprecedented rate of innovation
- Operations: zero-touch continuous composability and delivery of data services. Free your mind from Sisyphean repetitive manual configuration, aggravated by fast-churning modern environments like containers or Kubernetes
- Scaling: an extensible data continuum that transparently scales across space and time - across new technologies, consumption models and hardware capabilities, and across accelerating innovation and obsolescence. Free your mind from hardware boundaries that create rigid silos with a never-ending treadmill of forklift upgrades
- Maintenance: continuous availability with a regular, planned maintenance cadence. Free your mind from unpredictable failure “cliffs” and stressful emergency incident responses
- Obsolescence: "eternal" clusters with live software upgrades on rolling endpoints. Free your mind from the remaining operational risk, including planning long obsolescence cycles, tech forklifts and data migration sprees
The result is a comprehensive data foundation for the modern software-defined data center. Together with enterprise partners that have a global brand, reach and support, efficient supply chains and equipment financing, we can deliver game-changing operational value to enterprise customers - not just for hyperscale, but for any scale.
Scale different
Customers are looking to replatform their IT to cloud in order to increase business agility and reduce technology risk. Public clouds have enormous OpEx elasticity, which makes failure cheap and success expensive, and they lock customers in with captive data services. So there is a universal need for data services that converge public cloud simplicity and elasticity with private cloud control and efficiency, to create multi-cloud optionality.
To meet this demand, we reimagined storage from being system-defined to service-defined. We rethought storage to orchestrate data across application and technology churn, and across private and public clouds – driven by current and future application intent. We envisioned an "eternal" data services continuum that combines software-defined simplicity with enterprise '-abilities'.
As a result, we created mission-critical software-defined storage that is future-proof for the demands of digital transformation - a 24x7 lights-out data continuum that scales data freedom from the intelligent edge to cloud. Because the people who are crazy enough to think they can reimagine storage at scale are the ones who do.
Please visit us at www.datera.io, or tweet me at @MarcFleischmann.