The Rise of Data Stores & the Role of K8s
In line with SMAC (social, mobile, analytics and cloud), the prominent digital strategy promoted by many research firms, many organizations have moved or are moving to the cloud and have benefited from lower admin and development overhead, lower cost and accelerated time to market (TTM). Cloud PaaS comes with out-of-the-box (OOTB) functionality and services that significantly reduce cost and development effort. The most prominent cloud vendors are AWS, Azure and GCP, each a well-established platform. So far so good.
Along with Big Data, the landscape of Big Data technologies, tools and Data Stores has evolved. With this explosion of technologies, we are bombarded with numerous choices. In my earlier post, I covered the essential principles for a robust Data Architecture. In any Data Architecture, the choice of a Data Store is of paramount importance, and it has become an unprecedented challenge. Terms such as DB, DW, Data Mart, Data Hub, Data Lake etc. are cliché now and need to be redefined in the Big Data world. Please see the figure below, depicting how the Data Stores landscape has evolved with every growing need:
Please don't let the above diagram make you think that this is the end of it. It is only the beginning, as the technologies and tools ecosystem is poised to evolve further with requirements and data complexity. And what is shown above covers only Data Store technologies. There are separate, evolving landscapes of technologies and tools for Analytics, Visualization, Compute etc.
With growing and increasingly complex requirements, the choice of Data Store is critical, as it is the backbone of a Data Architecture. The criteria for choosing a Data Store now go beyond storage types: Columnar, Key-Value, Document, Graph, Relational etc. For example, many open source DBs have evolved to offer polymorphic storage: both row and columnar (SingleStore and Greenplum), or both graph and document oriented (ArangoDB). In summary, Data Store technologies and tools are evolving along many dimensions: in-memory, graphs, storage types, portability, administration, workload types, semantic layer, metadata management, knowledge graphs, ML & AI support, SQL support, governance, document, multi-model, real time support, separation of compute & storage, data types, query federation, data structures, consistency, replication, reference data, automation, HA, compression, cache etc., to cater to different types of use cases and requirements.
Many open source and other third party DBs are great choices and can even have an edge over the Data Stores provided as native services by the cloud vendors. Yet if you are moving to the cloud, you are forced either to choose a Data Store offered by the cloud vendor as PaaS or to run open source DBs/DWs as IaaS. The disadvantages of the IaaS approach are high infra costs, maintenance overhead and the risk of integration with other cloud native services. That is not a level playing field: because you are on a particular cloud vendor's platform, you do not have the freedom to choose the Data Store that exactly matches your requirements. So, what's the solution?
Kubernetes (K8s), an unsung hero, comes to the rescue. So far Kubernetes has been perceived as a compute orchestration framework, for example for microservices and Spark jobs. However, with the evolution of the Big Data landscape, K8s has been gaining significant traction for data workloads too. A typical Big Data architecture has both compute and storage, so orchestrating compute alone might offset the overall advantages because of the overhead of managing storage separately. With Kubernetes, you can use an existing framework or build your own custom framework on top of any open source or 3rd party DB/DW to make it a fully managed service: DBaaS. Not only do you get the PaaS advantages of reduced admin overhead and TTM, but it can also bring infrastructure costs down and make your application portable (the application can run on premises or on any cloud, with no vendor lock-in). No wonder that some vendors, such as SingleStore and ArangoDB, have built their own custom frameworks using K8s on top of their data stores to offer DBaaS on any cloud or on premises. Spot on: this is the way forward to provide vendor agnostic flexibility in choosing tools, technologies and hosting platform. Not surprisingly, vendors such as Robin, Portworx, Magalix, Jovian etc. have also sprung up to offer open source Data Stores as DBaaS offerings by using K8s based custom frameworks. Please see the diagram below, depicting the K8s architecture:
Many frameworks and tools are being built on top of Kubernetes, and are still evolving, to offer custom managed services (DBaaS): Argo for Workflow, OpenEBS for Container Attached Storage, Tremolo for Authentication, Orchestra for DevOps, Kops, OpenShift & Rancher for Orchestration, Fluent Bit for Logs, GlusterFS for File System, Ceph for software defined storage, Velero for Backups, Sysdig for Monitoring. These custom frameworks are built on 2 core pillars of Kubernetes: StatefulSets & Operators.
StatefulSets: StatefulSet is the workload API object used to manage stateful applications. It manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods. With it, each pod is guaranteed the same network identity and disk across restarts, even if it is rescheduled to a different physical machine. The primary feature that enables StatefulSets to run a replicated database within Kubernetes is that each pod gets a unique ID that persists, even as the pod is rescheduled to other machines. The persistence of this ID then lets you attach a particular volume to the pod, retaining its state even as Kubernetes shifts it around your datacenter.
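To make the stable identity idea concrete, here is a minimal sketch in Python that builds a StatefulSet manifest as a plain dictionary and lists the pod and volume claim names Kubernetes would derive from it. The database name `mydb`, the image `example/mydb:1.0`, the mount path and the helper function names are all hypothetical, chosen only for illustration; the manifest fields themselves follow the `apps/v1` StatefulSet API.

```python
import json
from typing import List, Tuple


def build_statefulset(name: str, image: str, replicas: int, storage: str) -> dict:
    """Return an apps/v1 StatefulSet manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service that gives each pod a stable DNS name.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,  # hypothetical image name
                        "volumeMounts": [
                            # hypothetical mount path
                            {"name": "data", "mountPath": "/var/lib/data"}
                        ],
                    }]
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def stable_identities(manifest: dict) -> List[Tuple[str, str]]:
    """List (pod name, volume claim name) for every ordinal.

    Pods are named <statefulset>-<ordinal>; their claims are named
    <claim template>-<statefulset>-<ordinal>. The names survive rescheduling.
    """
    name = manifest["metadata"]["name"]
    claim = manifest["spec"]["volumeClaimTemplates"][0]["metadata"]["name"]
    replicas = manifest["spec"]["replicas"]
    return [(f"{name}-{i}", f"{claim}-{name}-{i}") for i in range(replicas)]


if __name__ == "__main__":
    manifest = build_statefulset("mydb", "example/mydb:1.0", 3, "10Gi")
    print(json.dumps(manifest, indent=2))
    for pod, pvc in stable_identities(manifest):
        print(pod, "->", pvc)
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML. A real database would also need a matching headless Service, probes, resource limits and so on, which this sketch leaves out.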
Operators: Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators help automate deployment, running and scaling of workloads, backups, integrity checks and other maintenance tasks. Kubernetes Operators make extensive use of Custom Resource Definitions (CRDs) to create context-specific entities and objects that can be accessed like any other Kubernetes API resource.
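At the heart of an Operator sits a reconcile loop: compare the desired state declared in a custom resource with what is actually running, and act on the difference. The sketch below is a toy, in-memory illustration of that idea in Python. It does not talk to a real cluster, and `DatabaseSpec`, `Cluster` and `reconcile` are hypothetical names made up for this example rather than part of any Kubernetes library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatabaseSpec:
    """Desired state, as a user might declare it in a custom resource."""
    name: str
    replicas: int


@dataclass
class Cluster:
    """In-memory stand-in for the cluster: database name -> running pod names."""
    pods: Dict[str, List[str]] = field(default_factory=dict)


def reconcile(spec: DatabaseSpec, cluster: Cluster) -> List[str]:
    """Drive the observed state to the desired state; return the actions taken."""
    running = cluster.pods.setdefault(spec.name, [])
    actions: List[str] = []

    # Scale up: create missing pods in ordinal order.
    while len(running) < spec.replicas:
        pod = f"{spec.name}-{len(running)}"
        running.append(pod)
        actions.append(f"create {pod}")

    # Scale down: remove the highest ordinal first.
    while len(running) > spec.replicas:
        pod = running.pop()
        actions.append(f"delete {pod}")

    return actions


if __name__ == "__main__":
    cluster = Cluster()
    print(reconcile(DatabaseSpec("mydb", 3), cluster))  # creates three pods
    print(reconcile(DatabaseSpec("mydb", 3), cluster))  # nothing left to do
    print(reconcile(DatabaseSpec("mydb", 1), cluster))  # removes two pods
```

Running `reconcile` twice with the same spec does nothing the second time. That idempotence is what lets a real Operator safely re-run its loop whenever the custom resource or the cluster changes. A production Operator would add the database-specific steps mentioned above (backups, integrity checks, upgrades) inside the same loop.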
In summary, K8s based frameworks are evolving at a rapid pace to provide an end to end stack for application development and management. Yet different technologies and tools might require different customization, as they have different underlying architectures. Please see the generic logical architecture of DBaaS using K8s:
Buckle up, Datizens; the Big Data GAME is changing, & I am betting on K8s! What about you?