The Rise of Data Stores & the Role of K8

In line with SMAC (social, mobile, analytics and cloud), the prominent digital strategy promoted by many research firms, many organizations have moved or are moving to the cloud and have benefited from lower admin & development overhead, lower cost and accelerated time to market (TTM). Cloud PaaS comes with out-of-the-box (OOTB) functionality & services that significantly reduce cost & development effort. The most prominent cloud vendors are AWS, Azure & GCP, each a well-established platform. So far so good.

Along with Big Data, a whole landscape of Big Data technologies, tools & Data Stores has evolved. With this explosion in the technology landscape, we are bombarded with choices. In my earlier post, I covered the essential principles for a robust Data Architecture. In any Data Architecture, the choice of Data Store is of paramount importance, and it has become an unprecedented challenge. Terms such as DB, DW, Data Mart, Data Hub, Data Lake etc. are clichés now and need to be redefined in the Big Data world. Please see the figure below, depicting how the Data Stores landscape has evolved with ever-growing needs:

[Figure: evolution of the Data Stores landscape]

Please don’t let the above diagram make you think that this is the end of it. It is only the beginning, as the ecosystem of technologies & tools is poised to evolve further with requirements and data complexity. And what is shown above covers only Data Store technologies. There are separate, evolving landscapes of technologies & tools for Analytics, Visualization, Compute etc.

With growing & increasingly complex requirements, the choice of Data Store is critical, as it is the backbone of the Data Architecture. The criteria for choosing a Data Store now go beyond storage types: Columnar, Key-Value, Document, Graph, Relational etc. For example, many open-source DBs have evolved to offer polymorphic storage, either both row & columnar (SingleStore & Greenplum) or both graph & document oriented (ArangoDB). In summary, Data Store technologies & tools are evolving along many dimensions: in-memory, graphs, storage types, portability, administration, workload types, semantic layer, metadata management, knowledge graphs, ML & AI support, SQL support, governance, document, multi-model, real-time support, separation of compute & storage, data types, query federation, data structures, consistency, replication, reference data, automation, HA, compression, cache etc., to cater to different types of use cases & requirements.

Many open-source and other third-party DBs are great choices indeed, and some even have an edge over the Data Stores provided as native services by the cloud vendors. In such cases, if you are moving to the cloud, you are forced either to choose the Data Store offered by the cloud vendor as PaaS or to run open-source DBs/DWs as IaaS. With the IaaS approach, the disadvantages are high infrastructure costs, maintenance overhead and the risk of integration problems with other cloud-native services. That’s not a level playing field: because you are on a particular cloud vendor’s platform, you don’t have the freedom to choose the Data Store that exactly matches your requirements. So, what’s the solution?

KUBERNETES/K8 comes to the rescue, an unsung Hero. So far Kubernetes has been perceived as a compute orchestration framework, for example for micro-services & Spark jobs. However, with the evolution of the Big Data landscape, K8 has been gaining significant traction. A typical Big Data architecture has both compute and storage, so orchestrating compute alone can offset the overall advantages because of the overhead of managing storage separately. With Kubernetes, you can use an existing framework or build your own custom framework on top of any open-source or third-party DB/DW to turn it into a fully managed service (DBaaS). Not only do you get PaaS features & the advantages of reduced admin overhead & TTM, but it also brings infrastructure costs down significantly and makes your application portable (the application can run anywhere, on-premises or on any cloud, with no vendor lock-in). No wonder some vendors such as SingleStore & ArangoDB have built their own custom K8 frameworks on top of their Data Stores to offer DBaaS on any cloud or on-premises. Spot on: this is the way forward to provide vendor-agnostic flexibility in choosing tools, technologies and hosting platform. Not surprisingly, many vendors such as Robin, Portworx, Magalix, Jovian etc. have also sprung up to offer open-source Data Stores as DBaaS using K8-based custom frameworks. Please see the diagram below, depicting the K8 architecture:

[Figure: K8 architecture]

Many frameworks & tools have been built, and are still evolving, on top of Kubernetes to offer custom managed services (DBaaS): Argo for Workflow, OpenEBS for Container Attached Storage, Tremolo for Authentication, Orchestra for DevOps, Kops, OpenShift & Rancher for Orchestration, Fluent Bit for Logs, GlusterFS for File System, Ceph for software-defined storage, Velero for Backups, Sysdig for Monitoring. These custom frameworks are built on 2 core pillars of Kubernetes: StatefulSets & Operators.

StatefulSets: A StatefulSet is the workload API object used to manage stateful applications. It manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods. With it, each of your pods is guaranteed the same network identity and disk across restarts, even if it is rescheduled to a different physical machine. The primary feature that enables StatefulSets to run a replicated database within Kubernetes is that each pod gets a unique ID that persists even as the pod is rescheduled to other machines. The persistence of this ID then lets you attach a particular volume to the pod, retaining its state even as Kubernetes shifts it around your datacenter.
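To make the persistent ID concrete, here is a minimal Python sketch. It builds a bare-bones StatefulSet manifest as a plain dictionary and derives the names Kubernetes would assign to each replica: the pod name (`<name>-<ordinal>`), its DNS name behind a headless Service, and the PersistentVolumeClaim created from the volume claim template. The names `demo-db`, `demo-db-headless`, `example/db:1.0`, the mount path and the helper functions are all hypothetical, and the DNS suffix assumes the default `cluster.local` cluster domain. Nothing here talks to a cluster.

```python
from typing import Dict, List


def build_statefulset(name: str, service: str, replicas: int,
                      image: str, claim: str, storage: str) -> Dict:
    """Return a minimal apps/v1 StatefulSet manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service that gives every pod a stable DNS name.
            "serviceName": service,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,  # hypothetical image name
                        "volumeMounts": [
                            {"name": claim, "mountPath": "/var/lib/data"},
                        ],
                    }],
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": claim},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def stable_identities(manifest: Dict,
                      namespace: str = "default") -> List[Dict[str, str]]:
    """Derive the per-replica names that survive rescheduling."""
    spec = manifest["spec"]
    name = manifest["metadata"]["name"]
    service = spec["serviceName"]
    claim = spec["volumeClaimTemplates"][0]["metadata"]["name"]
    identities = []
    for ordinal in range(spec["replicas"]):
        pod = f"{name}-{ordinal}"
        identities.append({
            "pod": pod,
            # Assumes the default cluster domain "cluster.local".
            "dns": f"{pod}.{service}.{namespace}.svc.cluster.local",
            "pvc": f"{claim}-{pod}",
        })
    return identities


if __name__ == "__main__":
    sts = build_statefulset("demo-db", "demo-db-headless", 3,
                            "example/db:1.0", "data", "10Gi")
    for identity in stable_identities(sts):
        print(identity)
```

Because the pod name and the claim name are derived only from the StatefulSet name and the ordinal, a rescheduled `demo-db-1` comes back as `demo-db-1` and is bound to the same claim, which is what lets a database replica keep its data.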

Operators: Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators help automate deployment, running and scaling of workloads, backups, integrity checks & other maintenance tasks. Kubernetes Operators make extensive use of Custom Resource Definitions (CRDs) to create context-specific entities and objects that are accessed like any other Kubernetes API resource.
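The core idea behind an Operator is a reconcile loop: read the desired state from a custom resource, compare it with what is actually running, and act on the difference. The Python sketch below illustrates only that comparison step, with no Kubernetes client involved. The `Database` kind, the `example.com/v1` API group, the spec fields (`replicas`, `version`, `backupIntervalHours`) and the `ObservedState` class are hypothetical names invented for illustration, not the API of any real operator.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ObservedState:
    """What the operator sees running in the cluster (hypothetical fields)."""
    replicas: int
    version: str
    last_backup_age_hours: float


def reconcile(custom_resource: Dict, observed: ObservedState) -> List[str]:
    """Compare desired state (the custom resource spec) with observed state
    and return the actions needed to converge. A real operator would carry
    out these actions through the Kubernetes API instead of returning text."""
    spec = custom_resource["spec"]
    actions: List[str] = []
    if observed.version != spec["version"]:
        actions.append(f"upgrade {observed.version} -> {spec['version']}")
    if observed.replicas != spec["replicas"]:
        actions.append(f"scale {observed.replicas} -> {spec['replicas']} replicas")
    if observed.last_backup_age_hours >= spec["backupIntervalHours"]:
        actions.append("run backup")
    return actions


# A hypothetical custom resource, shaped like any other Kubernetes object.
database_cr = {
    "apiVersion": "example.com/v1",
    "kind": "Database",
    "metadata": {"name": "orders"},
    "spec": {"replicas": 3, "version": "1.1", "backupIntervalHours": 24},
}

if __name__ == "__main__":
    drifted = ObservedState(replicas=2, version="1.0", last_backup_age_hours=30)
    for action in reconcile(database_cr, drifted):
        print(action)
```

Once the observed state matches the spec, `reconcile` returns an empty list, so running the loop repeatedly is safe. That property is what allows an operator to keep watching a database and only step in when something has drifted.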

In summary, K8-based frameworks are evolving at a rapid pace to provide an end-to-end stack for application development & management. Still, different technologies/tools may need different customization, as their underlying architectures differ. Please see the generic logical architecture of DBaaS using K8:

[Figure: generic logical architecture of DBaaS using K8]

Buckle up Datizens; the Big Data GAME is changing, & I am betting on K8! What about you?


