The Rise of Data Stores & the Role of K8

Abhishek Mittal

Sr. Director - Big Data, Analytics & Cloud

Published: September 3, 2021

In line with the SMAC (social, mobile, analytics and cloud) digital strategy promoted by many research firms, many organizations have moved or are moving to the cloud and have benefited from lower admin & development overhead, lower cost and faster time to market (TTM). Cloud PaaS comes with out-of-the-box (OOTB) functionality & services that significantly reduce cost & development effort. The most prominent cloud vendors are AWS, Azure & GCP, each a well-established platform. So far so good.

Along with Big Data, the landscape of big technologies, tools & Data Stores has evolved. With this explosion, we are bombarded with choices. In my earlier post, I covered the essential principles for a robust Data Architecture. In any Data Architecture, the choice of a Data Store is of paramount importance, and it has become an unprecedented challenge. Terms such as DB, DW, Data Mart, Data Hub and Data Lake are clichés now and need to be redefined in the Big Data world. Please see the figure below, depicting how the Data Stores landscape has evolved with ever-growing needs:

Please don't let the above diagram make you think that this is the end of it. It is only the beginning, as the technologies & tools ecosystem is poised to evolve further with requirements and data complexity. And what is shown above covers only Data Store technologies. There are separate, equally fast-evolving landscapes for Analytics, Visualization, Compute etc.

With growing & complex requirements, the choice of Data Store is critical, as it is the backbone of a Data Architecture. The criteria for choosing a Data Store now go beyond storage type (Columnar, Key-Value, Document, Graph, Relational etc.). For example, several DBs have evolved to offer polymorphic storage: both row & columnar (SingleStore & Greenplum) or both graph & document oriented (ArangoDB). In summary, Data Store technologies & tools are evolving along many dimensions to cater to different use cases & requirements: in-memory, graphs, storage types, portability, administration, workload types, semantic layer, metadata management, knowledge graphs, ML & AI support, SQL support, governance, document, multi-model, real-time support, separation of compute & storage, data types, query federation, data structures, consistency, replication, reference data, automation, HA, compression, cache etc.

Many open-source and other third-party DBs are great choices and even have an edge over the Data Stores that cloud vendors provide as native services. Yet if you are moving to the cloud, you are forced either to choose a Data Store offered by the cloud vendor as PaaS or to run an open-source DB/DW yourself on IaaS. With the IaaS approach, the disadvantages are high infra costs, maintenance overhead and the risk of integrating with other cloud-native services. That's not a level playing field: because you are on a particular cloud vendor's platform, you don't have the freedom to choose the Data Store that exactly matches your requirements. So, what's the solution?

Kubernetes (K8), an unsung hero, comes to the rescue. So far Kubernetes has been perceived as a compute orchestration framework, for example for micro-services & Spark jobs. However, with the evolution of the Big Data landscape, K8 has been gaining significant traction for data storage as well. A typical Big Data architecture has both compute and storage, so orchestrating compute alone can offset the overall advantages, because the overhead of managing storage remains. With Kubernetes, you can use an existing framework or build your own custom framework on top of any open-source or 3rd-party DB/DW to turn it into a fully managed service (DBaaS). Not only do you get the PaaS advantages of reduced admin overhead & TTM, but it also brings infrastructure costs down significantly and makes your application portable (it can run anywhere, on premises or on any cloud, with no vendor lock-in). No wonder that some vendors such as SingleStore & ArangoDB have built their own custom K8 frameworks on top of their data stores to offer DBaaS on any cloud or on premises. Spot on: this is the way forward to provide vendor-agnostic flexibility in choosing tools, technologies and hosting platform. Not surprisingly, many vendors such as Robin, Portworx, Magalix, Jovian etc. have also sprung up to offer open-source Data Stores as DBaaS using K8-based custom frameworks. Please see the diagram below, depicting the K8 architecture:

Many frameworks & tools are being built, and keep evolving, on top of Kubernetes to offer custom managed services (DBaaS): Argo for workflow, OpenEBS for container attached storage, Tremolo for authentication, Orchestra for DevOps, Kops, OpenShift & Rancher for orchestration, Fluent Bit for logs, GlusterFS for file systems, Ceph for software-defined storage, Velero for backups and Sysdig for monitoring. These custom frameworks are built on two core pillars of Kubernetes: StatefulSets & Operators.

StatefulSets: A StatefulSet is the workload API object used to manage stateful applications. It manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods. With it, each pod keeps the same network identity and disk across restarts, even if it is rescheduled to a different physical machine. The primary feature that enables StatefulSets to run a replicated database within Kubernetes is that each pod gets a unique ID that persists even as the pod is rescheduled to other machines. The persistence of this ID then lets you attach a particular volume to the pod, retaining its state even as Kubernetes shifts it around your datacenter.
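
The stable identity described above can be illustrated with a small Python sketch. It builds a minimal StatefulSet manifest as a plain dictionary and lists the pod, DNS and volume claim names that Kubernetes derives for each replica. The names `demo-db`, `demo-db-headless` and the image `example/demo-db:1.0` are hypothetical, the helper functions are not part of any library, and the DNS names assume the default `cluster.local` cluster domain.

```python
import json


def build_statefulset(name, service, replicas, image, claim="data", size="10Gi"):
    """Return a minimal apps/v1 StatefulSet manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service that gives every pod its own DNS entry.
            "serviceName": service,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,  # hypothetical image name
                            "volumeMounts": [
                                {"name": claim, "mountPath": "/var/lib/data"}
                            ],
                        }
                    ]
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": claim},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": size}},
                    },
                }
            ],
        },
    }


def stable_identities(manifest, namespace="default"):
    """List the pod, DNS and PVC names derived for each replica ordinal."""
    name = manifest["metadata"]["name"]
    spec = manifest["spec"]
    claim = spec["volumeClaimTemplates"][0]["metadata"]["name"]
    identities = []
    for ordinal in range(spec["replicas"]):
        pod = f"{name}-{ordinal}"  # <statefulset-name>-<ordinal>
        identities.append(
            {
                "pod": pod,
                # Assumes the default cluster domain "cluster.local".
                "dns": f"{pod}.{spec['serviceName']}.{namespace}.svc.cluster.local",
                "pvc": f"{claim}-{pod}",  # <claim-template-name>-<pod-name>
            }
        )
    return identities


if __name__ == "__main__":
    manifest = build_statefulset(
        "demo-db", "demo-db-headless", 3, "example/demo-db:1.0"
    )
    print(json.dumps(manifest, indent=2))
    for identity in stable_identities(manifest):
        print(identity)
```

Because the pod name and the claim name depend only on the ordinal, a rescheduled `demo-db-1` reattaches to the claim `data-demo-db-1` wherever it lands. The printed JSON could be saved to a file and passed to `kubectl apply -f`, given a real image and a matching headless Service.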

Operators: Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators help automate deploying, running and scaling workloads, as well as backups, integrity checks & other maintenance tasks. Kubernetes Operators make extensive use of Custom Resource Definitions (CRDs) to create context-specific entities and objects that can be accessed like any other Kubernetes API resource.
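
The core idea behind an Operator is a reconcile loop: compare the desired state declared in a custom resource with what is actually running, then act on the difference. Below is a minimal Python sketch of that idea with no cluster involved. The `ManagedDatabase` kind, the `example.com/v1alpha1` API group, the `backupIntervalHours` field and the action strings are all hypothetical, chosen only for illustration; a real Operator would watch the Kubernetes API and call it to carry out these actions.

```python
from dataclasses import dataclass


@dataclass
class Observed:
    """What the operator currently sees in the cluster (hypothetical fields)."""
    ready_replicas: int
    hours_since_backup: float


def reconcile(custom_resource, observed):
    """Compare desired state (the custom resource) with observed state.

    Returns the list of actions needed to converge. An empty list means
    the cluster already matches the declared spec.
    """
    spec = custom_resource["spec"]
    actions = []

    # Scale the database towards the declared replica count.
    diff = spec["replicas"] - observed.ready_replicas
    if diff > 0:
        actions.append(f"scale-up:{diff}")
    elif diff < 0:
        actions.append(f"scale-down:{-diff}")

    # Trigger a backup once the declared interval has elapsed.
    if observed.hours_since_backup >= spec["backupIntervalHours"]:
        actions.append("backup")

    return actions


# Hypothetical custom resource, shaped like any other Kubernetes API object.
managed_db = {
    "apiVersion": "example.com/v1alpha1",
    "kind": "ManagedDatabase",
    "metadata": {"name": "demo-db"},
    "spec": {"replicas": 3, "backupIntervalHours": 24},
}

if __name__ == "__main__":
    # One replica is ready and the last backup is 30 hours old.
    print(reconcile(managed_db, Observed(ready_replicas=1, hours_since_backup=30)))
    # Everything already matches the spec, so nothing needs to happen.
    print(reconcile(managed_db, Observed(ready_replicas=3, hours_since_backup=2)))
```

The first call returns `['scale-up:2', 'backup']` and the second returns `[]`. Running the same function repeatedly is safe, which is what lets an Operator keep a database converged on its declared spec without manual intervention.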

In summary, K8-based frameworks are evolving at a rapid pace to provide an end-to-end stack for application development & management. Still, different technologies and tools may need different customization, as their underlying architectures differ. Please see the generic logical architecture of DBaaS using K8:

Buckle up, Datizens: the Big Data GAME is changing, & I am betting on K8! What about you?

Avanish kumar

Business and IT Transformation Advisor | Enterprise Architect | SAP Center of Excellence | Integrate Sustainability into ERP Programs

3 years ago

Thanks for sharing Abhishek on latest developments in this area.
