Data Lake & Data Mesh
Raja Saurabh Tiwari
Vice President @ Citi | Java, Cloud, ML Solutions | Gen AI enthusiast | Wildlife Photography
Global data creation is projected to exceed 180 zettabytes in the next five years.
Creating a single source of truth for analysis has always been a struggle. Keeping the data centrally in one location can help us answer business questions quickly and easily. Business Intelligence can give you deep insights into the data, but to get there you need a unified and standardized view of it. This is where the data warehouse comes to the rescue.
A data warehouse can store huge amounts of data from different sources, and it solves the problem as long as the structure of the data is well defined.
As data grows, a variety of sources generate enterprise data. This data does not always have a well-defined schema; it can be structured, semi-structured or unstructured. That poses a problem for the warehouse approach described above. This is where the data lake comes in.
Data Lake
A data lake is a huge data store that holds a variety of data from different sources, such as Salesforce, IoT devices, the web or REST endpoints, in any format: pictures, videos, XML, CSV, JSON or, for that matter, any other sort of data. The data lake works on the concept of 'store first and think later', which is what sets it apart from the data warehouse. Another way to see this is that a data lake is ELT and a data warehouse is ETL. In a data lake you store the data first, without thinking too much about format and transformation, and you transform it later based on business needs.
Since a data lake does not enforce a standard schema, the quality of its data is generally lower than in a data warehouse. A data lake is built around quantity, whereas a data warehouse is centered on quality.
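The difference between the two loading styles can be sketched in a few lines of Python. This is only an illustration under assumptions: the schema, the function names and the sample payloads are all hypothetical, and plain lists stand in for the warehouse table and the lake storage.

```python
import json

# Hypothetical warehouse schema: column name -> required Python type.
ORDER_SCHEMA = {"order_id": int, "amount": float}


def load_into_warehouse(table, record):
    """ETL style: validate against the schema before anything is stored."""
    for column, expected_type in ORDER_SCHEMA.items():
        if not isinstance(record.get(column), expected_type):
            raise ValueError(f"bad or missing column: {column}")
    table.append({column: record[column] for column in ORDER_SCHEMA})


def land_in_lake(lake, raw_payload):
    """ELT style: store the raw payload exactly as it arrived."""
    lake.append(raw_payload)


def read_orders_from_lake(lake):
    """Transform later: parse on read and skip payloads that do not fit."""
    orders = []
    for raw_payload in lake:
        try:
            record = json.loads(raw_payload)
            orders.append({
                "order_id": int(record["order_id"]),
                "amount": float(record["amount"]),
            })
        except (ValueError, KeyError, TypeError):
            continue  # not usable as an order; the raw copy stays in the lake
    return orders


lake, warehouse = [], []
payloads = [
    '{"order_id": 1, "amount": 9.5}',      # clean record
    '{"order_id": "2", "amount": "20"}',   # right fields, wrong types
    '<order id="3"/>',                     # not JSON at all
]
for payload in payloads:
    land_in_lake(lake, payload)            # the lake accepts everything
    try:
        load_into_warehouse(warehouse, json.loads(payload))
    except ValueError:
        pass                               # the warehouse rejects bad input
```

The lake ends up holding all three payloads while the warehouse keeps only the record that matched its schema, which is the quantity versus quality trade-off in miniature.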
With data lakes we create pipelines that bring all the data to a central data lake location. This can be combined with a "Delta Lake" architecture, which organizes the data into different layers and addresses the problem of rewinding the data after a failure.
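To make the idea of rewinding concrete, here is a toy, in-memory sketch of a versioned table. It is not the Delta Lake API; the class name `VersionedTable`, its methods and the `bronze` variable are hypothetical and only illustrate the general idea of keeping a snapshot per write so that an earlier state can be restored.

```python
import copy


class VersionedTable:
    """Toy append-only table that keeps one snapshot per committed write."""

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty table

    @property
    def version(self):
        return len(self._versions) - 1

    def read(self, version=None):
        """Read the latest snapshot, or an older one by version number."""
        index = self.version if version is None else version
        return copy.deepcopy(self._versions[index])

    def append(self, rows):
        """Commit a new snapshot containing the old rows plus the new ones."""
        self._versions.append(self.read() + list(rows))
        return self.version

    def restore(self, version):
        """Rewind by committing an old snapshot as the newest version."""
        self._versions.append(self.read(version))
        return self.version


bronze = VersionedTable()
good = bronze.append([{"id": 1, "value": 10}])
bronze.append([{"id": 2, "value": None}])  # a faulty pipeline run
bronze.restore(good)                       # rewind to the last good state
```

Restoring adds a new version instead of deleting history, so the faulty snapshot is still available for investigation.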
So we solved the problem of storing huge volumes of structured, semi-structured and unstructured data. But that raises another problem :)
The centralized data lake approach brings a few major challenges of its own.
Data Mesh
With global data creation projected to exceed 180 zettabytes in the next five years, it is very difficult to imagine storing all the data at one location. It would be hard to process quickly when needed and very costly to store. Data Mesh, a term coined by @Zhamak, comes to the rescue. Data Mesh is a modern, distributed way of storing and managing data. It makes data more accessible, secure, discoverable and interoperable.
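@Zhamak defines the four principles of the data mesh: domain-oriented decentralized data ownership, data as a product, self-serve data infrastructure as a platform, and federated computational governance.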
With the principles explained above, we can address the issues posed by Data Lake architecture.
#1: The data mesh defines a distributed approach to data architecture. This means the ownership of the data is distributed and decentralized, which lets the respective teams access their data quickly and easily.
#2: With decentralized ownership, the data can scale and respond to business needs.
#3: With decentralized data ownership, the individual domains are responsible for data security and quality.
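The three points above can be sketched as a small Python model. Everything here is a hypothetical illustration, not a data mesh product or standard API: `DataProduct`, `Catalog`, the domain names and the field names are assumptions made for the example. Each domain owns and validates its own data product, while a shared catalog keeps the products discoverable.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A dataset owned, validated and published by a single domain team."""
    name: str
    owner_domain: str
    schema: dict
    rows: list = field(default_factory=list)

    def publish(self, row):
        """The owning domain checks quality before exposing a row."""
        missing = set(self.schema) - set(row)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        self.rows.append(row)


class Catalog:
    """Shared registry so products stay discoverable across domains."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        self._products[product.name] = product

    def find_by_domain(self, domain):
        return [p.name for p in self._products.values()
                if p.owner_domain == domain]

    def get(self, name):
        return self._products[name]


catalog = Catalog()
orders = DataProduct("orders", "sales", {"order_id": int, "amount": float})
shipments = DataProduct("shipments", "logistics", {"shipment_id": int})
catalog.register(orders)
catalog.register(shipments)

# The sales domain publishes into its own product; no central team is involved.
orders.publish({"order_id": 1, "amount": 9.5})
```

The catalog holds only references to the products, so the data itself stays with the owning domain, in contrast to copying everything into one central lake.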
As data grows exponentially, we need a modern way of addressing data storage, governance and security, and of getting meaningful insights from the data quickly and easily. Data Mesh is a great step towards achieving that.
Thanks,
Raja Saurabh Tiwari