Data Lake & Data Mesh

Raja Saurabh Tiwari

Vice President @ Citi | Java , Cloud, ML Solutions | Gen AI enthusiast | Wildlife Photography

Published: January 21, 2022

Global data creation is projected to exceed 180 zettabytes in the next five years.

Creating a single source of truth for analyzing data has always been a struggle. Having the data in one central location may help us answer business questions quickly and easily. Business Intelligence can give you deep insight into the data, but to get there you need a unified and standardized view of it. This is where the Data Warehouse comes to the rescue.

A Data Warehouse can store huge amounts of data from different sources, and it solves the problem as long as the structure of the data is well defined.

As data grows, a variety of sources generate enterprise data. This data does not have a well-defined schema; it can be structured, semi-structured or unstructured. This poses a problem for the existing solutions we discussed above. This is where the Data Lake comes in.

Data Lake

A Data Lake is a huge data store holding a variety of data from different sources, such as Salesforce, IoT devices, the web or REST endpoints, in any format: pictures, videos, XML, CSV, JSON or, for that matter, any sort of data. The Data Lake works on the concept of 'store first and think later', which makes it different from a Data Warehouse. Another way to see this is that a Data Lake is ELT (extract, load, transform) and a Data Warehouse is ETL (extract, transform, load). In a Data Lake you store the data first, without thinking too much about format and transformation, and transform it later based on business needs.
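The 'store first and think later' idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `raw_zone`, `ingest` and `orders_view` are hypothetical names, and an in-memory list stands in for real lake storage.

```python
import json

# Hypothetical raw zone: payloads are kept exactly as they arrived.
raw_zone = []

def ingest(source, payload):
    """Store first: append the raw payload with no parsing or validation."""
    raw_zone.append({"source": source, "payload": payload})

def orders_view():
    """Think later: apply a schema only when a business question needs it."""
    rows = []
    for record in raw_zone:
        if record["source"] != "web":
            continue  # other formats stay untouched until someone needs them
        data = json.loads(record["payload"])
        rows.append({"order_id": data["id"], "amount": float(data["amount"])})
    return rows

# Load step: JSON and XML land side by side, with no agreed schema.
ingest("web", '{"id": 1, "amount": "19.99"}')
ingest("iot", "<reading sensor='t1' value='21.5'/>")
ingest("web", '{"id": 2, "amount": "5.00", "coupon": "NEW"}')

# Transform step, done afterwards and only for the web orders.
total = sum(row["amount"] for row in orders_view())
```

In an ETL flow the parsing inside `orders_view` would run before the data is stored, so the XML reading would have to be given a schema or rejected at load time.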

Since a Data Lake does not enforce a standard schema, the quality of its data is generally lower than in a Data Warehouse. A Data Lake is built around quantity, whereas a Data Warehouse is centered on quality.

With Data Lakes we create pipelines that bring all the data to the central Data Lake location. This can be combined with a "Delta Lake" architecture, which organizes the data into layers and helps address the problem of rewinding the data after a failure.
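To make the idea of rewinding after a failure concrete, here is a toy sketch in plain Python. It is not the Delta Lake API; `VersionedTable` and its methods are hypothetical, and the sketch keeps a full snapshot per commit purely for simplicity.

```python
class VersionedTable:
    """Toy append-only table: every commit keeps a full snapshot."""

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty table

    def commit(self, rows):
        """Append rows as a new version and return its version number."""
        self._versions.append(self._versions[-1] + list(rows))
        return len(self._versions) - 1

    def read(self, version=None):
        """Read the latest version, or an older one by number."""
        if version is None:
            version = len(self._versions) - 1
        return list(self._versions[version])

    def restore(self, version):
        """Rewind by committing an old snapshot as the newest version."""
        self._versions.append(list(self._versions[version]))
        return len(self._versions) - 1

table = VersionedTable()
good = table.commit([{"id": 1, "amount": 10.0}])
bad = table.commit([{"id": 2, "amount": None}])  # a faulty load
table.restore(good)  # the latest version matches the last good load again
```

The faulty version is still readable by number, so the failure can be inspected after the rewind.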

So we have solved the problem of storing huge volumes of multi-structured and unstructured data. But that raises another problem :)

The Data Lake approach brings a few other major challenges:

#1: To centralize the data, you need to bring it from various sources and store it in one large storage location. Moving all this data to a central location is itself a big and expensive task.
#2: As the number of sources increases, querying the central data store becomes slow, and it fails to scale.
#3: Moving data across regions or countries can have implications for data privacy.

Data Mesh

Global data creation is projected to exceed 180 zettabytes in the next five years. It is very difficult to imagine storing all of that data in one location: it would be hard to process quickly when needed and very costly to store. Data Mesh, a term coined by @Zhamak, comes to the rescue. Data Mesh is a modern, distributed approach to storing data. It makes data more accessible, secure, discoverable and interoperable.

@Zhamak defines the four principles of the Data Mesh:

Domain-driven ownership: The first principle gives ownership of the data to the domain teams. They are responsible for data governance, including who can access the data and how it should be accessed.
Data as a product: The domain teams are also responsible for the products and views created from the data, and for maintaining and updating those data products.
Self-service infrastructure: The third principle is about making data products easy to use and maintain. For domain teams, the infrastructure should be easy to use and maintain, built on common tools and platforms.
Federated governance: Last but not least, there needs to be a defined policy around the accessibility and privacy of the data. This covers who can access the data and what they can access, from the schema and table level down to the column and property level. You can define different privileges, permissions and roles to achieve this.
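The column-level permissions mentioned under federated governance could look roughly like the sketch below. The roles, the `POLICY` mapping and the `read_columns` helper are all hypothetical; a real platform would enforce such rules in its query engine or catalog rather than in application code.

```python
# Hypothetical policy: role -> table -> set of readable columns.
POLICY = {
    "analyst": {"customers": {"customer_id", "country"}},
    "domain_owner": {"customers": {"customer_id", "country", "email"}},
}

def read_columns(role, table, rows, columns):
    """Return the requested columns, or raise if the role may not see one."""
    allowed = POLICY.get(role, {}).get(table, set())
    denied = [c for c in columns if c not in allowed]
    if denied:
        raise PermissionError(f"{role} cannot read {denied} on {table}")
    return [{c: row[c] for c in columns} for row in rows]

customers = [{"customer_id": 1, "country": "IN", "email": "a@example.com"}]

# An analyst may read non-sensitive columns only.
visible = read_columns("analyst", "customers", customers, ["customer_id", "country"])
```

Requesting `email` as `analyst` raises `PermissionError`, while `domain_owner` can read all three columns.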

With the principles explained above, we can address the issues posed by the Data Lake architecture.

#1: Data Mesh defines a distributed approach to data architecture. Ownership of the data is distributed and decentralized, which lets the respective teams access their data quickly and easily.

#2: With decentralized ownership, the data can scale and respond to business needs.

#3: With decentralized data ownership, the individual domains are responsible for data security and quality.
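As a rough sketch of what decentralized ownership can mean in practice, the snippet below keeps each data product with its domain team and shares only a lightweight catalog for discovery. `DataProduct`, `Catalog` and the example domains are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class DataProduct:
    name: str
    owner: str  # the owning domain team
    read: Callable[[], List[dict]]  # the data itself stays with the domain

class Catalog:
    """Shared registry: holds descriptions of data products, not the data."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def discover(self, owner: Optional[str] = None) -> List[str]:
        """List product names, optionally filtered by owning domain."""
        return sorted(
            p.name for p in self._products.values()
            if owner is None or p.owner == owner
        )

    def read(self, name: str) -> List[dict]:
        """Delegate the read to the owning domain's own accessor."""
        return self._products[name].read()

catalog = Catalog()
catalog.register(DataProduct("orders", "sales", lambda: [{"order_id": 1}]))
catalog.register(DataProduct("shipments", "logistics", lambda: [{"shipment_id": 7}]))

sales_products = catalog.discover(owner="sales")
```

Nothing is copied into a central store here: the catalog only knows who owns each product and how to ask that domain for it.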

As data grows exponentially, we need a modern way of addressing data storage, governance and security, and of getting meaningful insights from data quickly and easily. Data Mesh is a great step towards achieving that.

Thanks,

Raja Saurabh Tiwari
