How We Reimagined Data Storage
Before starting Datera in 2013, we had contributed the storage target subsystem to the Linux kernel ("Linux-IO"), which was adopted by the likes of Google, Red Hat (now IBM), Pure Storage, and many other array vendors. Linux-IO eventually became an industry standard, and emerged as an essential ingredient of software-defined IT.
While the industry was quick to take advantage of our open source software to replace proprietary storage hardware with less-proprietary storage hardware (and called it "software-defined"), we thought they missed the point: "Cloud" is not a location; it is an architecture and an operating model, and mapping the rigid architecture of the past into software cannot deliver the flexibility, simplicity and economics of that model.
To provide a data foundation for cloud computing, we envisioned a service-centric model that could orchestrate data anywhere, while scaling continuous availability, predictable performance and management policy - across private and public clouds. If we could harness cloud computing for data to let users change their intent as they go, we could free them from trying to anticipate future storage needs. That is the genesis of the Datera data services platform, a new approach to data for the cloud era.
Why hyperscale?
Swarms are adaptive; monoliths are not. A swarm of starlings is infinitely adaptable and resilient; dinosaurs are the antithesis of adaptive – and extinct. Implement this basic concept correctly, and everything else flows from it.
Google adopted swarm design almost twenty years ago. They build their data centers from thousands of commodity servers, and use autonomous distributed software to orchestrate them into coherent swarms. Now known as “hyperscale,” this architecture is transforming how IT is designed and delivered, putting IT as we know it under existential pressure – the Jurassic IT Era is coming to its end.
The promise of hyperscale is to converge diverse hardware resources into one coherent swarm with incredible adaptability and scalability. Its implementation, however, entails hard challenges like node heterogeneity, data gravity, data consistency, combinatorial reliability and availability, operational complexity, performance assurance, scalability cliffs, and so on, not to mention fundamental physical realities like time, distance and latency.
To pursue our vision, we assembled an interdisciplinary team around our founding architects Nicholas Bellinger (a Linux storage leader), Claudio Fleiner (a hyperscale wizard), Raghu Krishnamurthy (an automation thought leader) and Bill Rozas (a brilliant computer architect). Together, they made hyperscale storage work, and created the first enterprise tier-1 software-defined storage with a cloud operating model.
True software-defined storage
In this post, I'll explain Datera's architecture tenets in three categories - infrastructure model, automation model and hybrid cloud model - and how they combine to transform the traditional IT operating model from system-defined to service-defined, thereby capturing significantly more value across the entire data and system life cycle.
1. Infrastructure model
Our key infrastructure tenet was to make hyperscale frictionless. What good is hyperscale if it can’t adapt rapidly and continuously, deliver predictable high performance, scale seamlessly, rebalance quickly among a rich spectrum of endpoints, and so on?
So, at the inception of Datera, we spent significant time rethinking the innate friction points of hyperscale. As a result, we created an infrastructure model that delivers:
- Enterprise performance: transparent data mobility and lockless distributed coherence combine into frictionless I/O, driving predictable enterprise tier-1 performance, low latency and seamless scalability
- Continuous availability: built for change, transparent data mobility across generations of endpoints driven by current and future application intent, no forklift upgrades
- Eternal clusters: endpoint choice and heterogeneous scalability, no hardware lock-in, no technical debt, the hardware you start with is no longer the hardware you're stuck with - always the best economics
- Fully programmable: all data infrastructure is accessible through a REST API, making it browsable, searchable and meshable - data infrastructure-as-code (see the sketch after this list)
- Ecosystem integration: standard protocols in extensible containers, forward agility, easily surfacing the benefits of new tech to all applications
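As a rough illustration of data infrastructure-as-code, here is a minimal sketch of what working against such a REST API could look like. The base URL, endpoint paths, field names and authentication scheme below are hypothetical placeholders, not Datera's actual API.

```python
import requests

# Hypothetical API endpoint and token; Datera's actual paths and
# authentication scheme may differ.
BASE = "https://datera.example.com/v2"
HEADERS = {"Authorization": "Bearer <token>"}

# Browse: enumerate the storage endpoints (nodes) in the cluster.
nodes = requests.get(f"{BASE}/storage_nodes", headers=HEADERS).json()

# Search: find the application instances of one tenant.
apps = requests.get(
    f"{BASE}/app_instances",
    params={"tenant": "engineering"},
    headers=HEADERS,
).json()

# Compose: provision a new volume declaratively, as code.
requests.post(
    f"{BASE}/app_instances",
    json={"name": "ci-scratch", "size_gb": 500, "replicas": 3},
    headers=HEADERS,
)
```

Because everything sits behind one API, the same calls can be scripted, version-controlled and embedded in CI/CD pipelines - which is what makes the infrastructure "meshable."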
The result is a uniquely flexible data services platform that can independently orchestrate data services and data across all of its endpoints. Live data services mobility and live data mobility lay the foundation to achieve data freedom across private and public clouds.
2. Automation model
Having created an infrastructure model that makes hyperscale frictionless and truly scalable - effectively extending long, invisible arms across the data center - we turned to our next key tenet: automation. What good is frictionless hyperscale if it requires slow humans to work it?
So our hyperscale automation model is driven by applications - effortless, instant, invisible. Applications know best what they want:
- Application-driven automation: data infrastructure is continuously composed and delivered as a service, driven by applications and not by humans, based on application profiles (or intent) that you can set-and-forget (see the sketch after this list). Intent is invariant and, together with transparent data mobility, makes data portable and scalable - across heterogeneous endpoints, technology and innovation. Most importantly, Datera allows you to change your mind as you go - it always adapts and gives you an operational out
- Data orchestration: AI-driven continuous self-optimization and self-healing, based on application intent, infrastructure feedback (or insights) and live data mobility, 24x7 lights-out operations
- Role-based multi-tenancy: application intent can be mapped dynamically, depending on how it is instantiated - so tenants get a degree of self-service, while operating within pre-defined business envelopes
- Lifecycle management: manage full lifecycle of data and endpoints, from sunrise (e.g., test) to sunset (e.g., archive), always best data economics
- Global service management: global ops portal with visual insights and cloud-based machine learning across the installed base, driving anomaly detection and predictive operations
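To make set-and-forget intent more concrete, here is a minimal sketch of what an application profile and its lifecycle could look like. The profile schema, endpoint paths and field names are illustrative assumptions, not Datera's actual interface.

```python
import requests

BASE = "https://datera.example.com/v2"   # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}

# Hypothetical application profile ("intent"): it declares what the
# application needs, never which hardware should provide it, so the
# same profile stays valid as endpoints come and go.
oltp_intent = {
    "name": "oltp-gold",
    "performance": {"min_iops": 50_000, "max_latency_ms": 1},
    "protection": {"replicas": 3, "snapshot_every": "15m"},
    "placement": {"media": "all-flash"},
}

# Register the profile once (set-and-forget); the platform then
# continuously reconciles live data placement against it.
requests.post(f"{BASE}/app_templates", json=oltp_intent, headers=HEADERS)

# Changing your mind later is just an update to the intent, not a
# manual data migration - the operational out mentioned above.
oltp_intent["protection"]["replicas"] = 5
requests.put(
    f"{BASE}/app_templates/oltp-gold", json=oltp_intent, headers=HEADERS
)
```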
Containers further escalate the speed, scale and elasticity pressure on infrastructure - they consume infrastructure as a service that is continuously delivered. In that model, Kubernetes, Mesos and their brethren replace manual operations with application-driven automation of compute swarms - and Datera is to data as Kubernetes is to compute. The impact of this new automation model is hard to imagine without experiencing it.
Remember when the iPhone overnight made flip phones feel so 1990s? Experiencing Datera is a similar watershed moment - it makes traditional storage feel antiquated. What made the key difference for the iPhone? Multi-touch and apps. The iPhone is driven by apps, not by humans using a dialpad, just like Datera is driven by applications, not by humans using a keyboard.
3. Hybrid cloud model
Now that we have achieved data freedom across the data center, we can parlay it into a hybrid cloud model that scales it across private and public clouds.
Our application-driven automation model allows describing the behavior of data in invariant application profiles, or storage blueprints, that we automatically adapt to tenancy and roles. Storage blueprints seamlessly expand the behavior of data beyond the box - they allow a “data broker” to make data portable, scalable and hybridizable.
Our complementary frictionless infrastructure model provides scalable live data mobility - effectively implementing a “data exchange” across private and public clouds.
- Data center automation: data center topology awareness to co-orchestrate data with the data center, e.g., optional L3 network virtualization to participate in flat L3 networks (one flat IP address space) instead of using L2 overlays, which streamlines network configuration and management, and makes data services continuously available by letting them float across the data center (behind fixed virtual IP addresses) with practically instant session failovers
- Storage blueprints: data packaged with behavior, active/passive data portability and scalability across a wide spectrum of diverse endpoints in private and public clouds (see the sketch after this list)
- Edge to cloud data freedom: data portability and scalability (storage blueprints), live data mobility (synchronous stretch clusters) and scale (lockless distributed coherence) allow active/active synchronous replication - essentially creating a single multi-cloud data continuum that allows applications to float from edge to cloud
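As a loose illustration of the data broker and storage blueprint ideas, the toy sketch below packages data behavior with placement and re-resolves placement when sites change, while the behavior itself stays invariant. The blueprint schema and site names are hypothetical.

```python
# Hypothetical storage blueprint: data packaged with its behavior, so
# the same declaration can be instantiated on-premises or in a public
# cloud without rewriting policy per location.
blueprint = {
    "name": "orders-db",
    "behavior": {
        "consistency": "synchronous",  # active/active stretch cluster
        "replicas": 3,
        "encryption": "at-rest",
    },
    "placements": [
        {"site": "dc-west", "role": "active"},
        {"site": "aws-us-west-2", "role": "active"},
    ],
}

def broker_placement(blueprint, available_sites):
    """Toy "data broker": keep the placements whose sites are currently
    reachable; the declared behavior applies everywhere, unchanged."""
    return [p for p in blueprint["placements"] if p["site"] in available_sites]

# If one site drops out, the broker re-resolves placement while
# consistency, replica count and encryption stay invariant.
print(broker_placement(blueprint, {"dc-west"}))
```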
As workloads move to the edge, data centers evolve into meta data centers, and clouds are abstracted behind brokerage layers, we can provide the scaled data broker and data exchange to achieve data freedom from the intelligent edge to cloud.
4. Operating model
Our storage infrastructure and automation models allow decoupling storage consumption from deployment, continuously brokering between them, and independently scaling them. Now we can rethink the storage operating model from rigid point-in-time systems to continuously composable and scalable data services that empower both consumers and operators to optimize for their own needs:
- Application owners: can self-service by simply defining composable and scalable data services as application intent – and change it as they go
- Storage operators: can independently scale performance and capacity by simply adding the best servers from the spot market – and vary them as they go.
We transform the storage operating model from system-defined to service-defined, which allows us to refactor the entire IT value creation chain, from planning through obsolescence, and deliver transformational simplicity and efficiency.
This brings the cloud experience to enterprise data storage, and lets you free your mind (and data!) to focus on creating business value:
- Planning: plan your services, not your systems, and change your plans as you go - Datera always adapts and gives you an operational out
- Procurement: free your mind from trying to plan ahead and getting it wrong, from fallible point-in-time tech commitments that decay into technical/operational debt, from expensive under- or overprovisioning, and from trying to anticipate future IT needs in the face of an unprecedented rate of innovation
- Operations: zero-touch continuous composability and delivery of data services. Free your mind from Sisyphean repetitive manual configuration, aggravated by fast-churning modern environments like containers or Kubernetes
- Scaling: an extensible data continuum that transparently scales across space and time - across new technologies, consumption models and hardware capabilities, and across accelerating innovation and obsolescence. Free your mind from hardware boundaries that create rigid silos with a never-ending treadmill of forklift upgrades
- Maintenance: continuous availability with a regular, planned maintenance cadence. Free your mind from unpredictable failure “cliffs” and stressful emergency incident responses
- Obsolescence: "eternal" clusters with live software upgrades on rolling endpoints. Free your mind from the remaining operational risk, including planning long obsolescence cycles, tech forklifts and data migration sprees
The result is a comprehensive data foundation for the modern software-defined data center. Together with enterprise partners that have a global brand, reach and support, efficient supply chains and equipment financing, we can deliver game-changing operational value to enterprise customers - not just for hyperscale, but for any scale.
Scale different
Customers are looking to replatform their IT to cloud in order to increase business agility and reduce technology risk. Public clouds have enormous OpEx elasticity, which makes failure cheap and success expensive, and they lock customers in with captive data services. So there is a universal need for data services that converge public cloud simplicity and elasticity with private cloud control and efficiency, to create multi-cloud optionality.
To meet this demand, we reimagined storage from being system-defined to service-defined. We rethought storage to orchestrate data across application and technology churn, and across private and public clouds – driven by current and future application intent. We envisioned an "eternal" data services continuum that combines software-defined simplicity with enterprise '-abilities'.
As a result, we created mission-critical software-defined storage that is future-proof for the demands of digital transformation - a 24x7 lights-out data continuum that scales data freedom from the intelligent edge to cloud. Because the people who are crazy enough to think they can reimagine storage at scale are the ones who do.
Please visit us at www.datera.io, or tweet me at @MarcFleischmann.