How to Create a Data Mesh Using Databricks Lakehouse Architecture
For years, online data storage followed essentially the same centralized approach, first with data warehouses and then with data lakes. In recent years, however, that approach has changed significantly, with marked improvements over the previous state of the industry.
One such change is the introduction of Lakehouse architecture by our partner, Databricks. Lakehouse architecture aims to revolutionize the way data is stored for analytics. It can be used to create a data mesh, which helps ensure that stakeholders throughout an organization can access high-quality data and are responsible for the data they create.
First, let's explore what some of these terms mean and the differences between data lakes, lakehouses, and data warehouses, and then we'll look at how to use Databricks' groundbreaking Lakehouse architecture to simplify data analysis.
What Is a Data Mesh?
A data mesh is a data storage framework that treats each of a company’s teams or departments (e.g., physical sales, online sales, legal, accounts receivable, accounts payable) as the owner of its own data product. This decentralized approach makes each “owner” responsible for the quality and accessibility of its data for its “customers”: those on other teams or in other departments who need to use this data.
What Are Data Lakes?
Data lakes emerged as an alternative to data warehouses: information is stored in its raw format, which eliminates the need to transform the data first. Although this is an improvement, the lake approach still has a few flaws. For instance, data lakes are generally designed to be used by data scientists and other data professionals, which can make the information inaccessible to non-experts and puts the approach at odds with the goal of data democratization. Additionally, the quality of the data available for analysis is generally lower, and data lakes are subject to poor performance.
What Is Databricks’ Lakehouse Architecture?
Databricks’ Lakehouse combines the strengths of a data warehouse and a data lake in a single information storage infrastructure. It is a cloud-based analytics, AI, and data platform that can be used to create a data mesh architecture, making it easier for organizations to modernize their data storage approach.
The Four Principles
There are four guiding principles behind the idea of a data mesh: domain ownership, data as a product, self-service infrastructure, and federated governance.
Domain ownership means that the data producers take ownership of their data product, ensuring its quality and accessibility to the other domains.
Data as a product means that the information collected by each domain is treated as the product of that domain, and the teams that consume it are treated as its customers; the domain owners remain responsible for the quality and accessibility of their data.
Self-service infrastructure means that data analytics tools are standardized across domains so that they are accessible by those within and outside of each domain.
Federated governance means that all data conforms to the rules and requirements of the organization and the industry.
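To make the four principles a little more concrete, here is a minimal Python sketch of how they might map onto code. Everything in it is an assumption made for illustration: the `DataProduct` and `Catalog` classes, their fields, and the specific governance rules are hypothetical and are not part of any Databricks API.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for one domain's data product."""
    name: str
    owner_domain: str              # domain ownership: who is accountable
    description: str = ""          # data as a product: documented for consumers
    columns: dict = field(default_factory=dict)  # column name -> type name
    contains_pii: bool = False
    pii_approved: bool = False


def governance_violations(product: DataProduct) -> list:
    """Federated governance: the same organization-wide rules for every domain.

    The rules below are illustrative assumptions, not a real policy set.
    """
    problems = []
    if not product.owner_domain:
        problems.append("missing owner domain")
    if not product.description:
        problems.append("missing description")
    if not product.columns:
        problems.append("no published schema")
    if product.contains_pii and not product.pii_approved:
        problems.append("PII not approved for sharing")
    return problems


class Catalog:
    """Self-service infrastructure: one shared place to publish and discover."""

    def __init__(self):
        self._products = {}

    def publish(self, product: DataProduct) -> None:
        # Reject products that break the shared rules before they are listed.
        problems = governance_violations(product)
        if problems:
            raise ValueError(f"{product.name}: " + "; ".join(problems))
        self._products[product.name] = product

    def find(self, name: str) -> DataProduct:
        return self._products[name]


# Usage: the online sales domain publishes a product that other teams can find.
catalog = Catalog()
catalog.publish(DataProduct(
    name="online_orders",
    owner_domain="online_sales",
    description="One row per completed online order.",
    columns={"order_id": "string", "total": "decimal"},
))
print(catalog.find("online_orders").owner_domain)
```

In a real Lakehouse deployment these responsibilities would be handled by the platform's own cataloging and policy services rather than by hand-written classes; the sketch only shows how the principles divide the work between domains and the shared platform.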
What Is a Domain?
A domain is the basic building block of a data mesh, and the Lakehouse architecture serves as a platform for all of an organization’s domains. The platform must be self-serve, allowing employees from other domain teams within the organization to easily access the data they need.
How to Use Lakehouse Architecture
The first step in building a data mesh with Lakehouse is to decide which type of mesh to use. If the data domains are separate and autonomy is desired, a harmonized mesh will be the best option. Databricks explains that harmonization can be achieved via “platform blueprints, ensuring security and compliance” and “self-serve platform services (domain provisioning automation, data cataloging, metadata publishing, policies on data and compute resources).”
Alternatively, a data mesh can take a "Hub and Spokes" form. This is particularly useful if there is data that does not belong to one specific domain, as it can be organized as part of the hub, which can function as a domain of its own. In certain special cases, it may even be advantageous to use elements of both types of data mesh.
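The hub-and-spokes idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `assign_home` function, the `HUB` label, and the rule "exactly one producing domain means that domain owns the dataset, otherwise it goes to the hub" are hypothetical simplifications, not a Databricks feature.

```python
HUB = "hub"  # hypothetical name for the shared hub domain


def assign_home(producers_by_dataset: dict) -> dict:
    """Decide where each dataset lives in a hub-and-spokes mesh.

    A dataset produced by exactly one domain stays with that domain (a spoke).
    A dataset with no single owning domain is organized as part of the hub.
    """
    homes = {}
    for dataset, producers in producers_by_dataset.items():
        unique = sorted(set(producers))
        homes[dataset] = unique[0] if len(unique) == 1 else HUB
    return homes


# Usage with made-up dataset and domain names.
homes = assign_home({
    "orders": ["online_sales"],
    "customer_master": ["online_sales", "physical_sales"],
})
print(homes)
```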
Next, determine whether the mesh will need to span multiple companies or systems. This matters most when a company uses multiple cloud-based storage systems (especially from different providers) or when separate legal entities are involved, such as an entrepreneur who owns multiple independent startups but wishes to combine their data for analytics purposes. With Databricks' Delta Sharing, this information can be combined across platforms and cloud providers while still maintaining security.
Finally, you are ready to put the Lakehouse platform in place and create your data mesh. At this point you may want to consult external experts, if you have not been working with them throughout the process. The implementation will involve a significant change to your current methods of storing and processing data, so you are likely to need assistance to set it up correctly.
Once the system is set up and the mesh is in place, companies can begin to store and analyze their data more efficiently, saving both time and money even before the analysis takes place. With Delta Sharing, CEOs in charge of multiple companies can have all of their business information accessible via a Lakehouse-facilitated data mesh, even if the companies use different cloud-based storage systems.
To achieve the results you want from establishing a data mesh architecture in your organization, invest in a high-quality platform and in the expertise to implement it correctly.
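As a rough sketch of what reading shared data can look like on the consumer side, the example below uses the open-source `delta-sharing` Python connector, which addresses a shared table with a URL of the form `<profile-file>#<share>.<schema>.<table>`. The profile file is supplied by the data provider; the file name, share, schema, and table names here are made up for illustration, and the helper functions are hypothetical.

```python
def shared_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the table URL expected by the delta-sharing connector."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame.

    Requires `pip install delta-sharing` and a valid profile file
    issued by the data provider.
    """
    import delta_sharing  # imported lazily so the URL helper works without it

    return delta_sharing.load_as_pandas(
        shared_table_url(profile_path, share, schema, table)
    )


# Hypothetical names: one startup reading a table shared by a sister company.
url = shared_table_url("startup_a.share", "sales_share", "online", "orders")
print(url)
```

Keeping the URL construction separate from the network call makes it easy to check the addressing logic without credentials, while the actual read stays a one-line call to the connector.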
How to Take a Lakehouse Approach to Your Data Architecture
Implementing a new data architecture in your organization is a worthwhile endeavor but involves significant time and effort. You will need stakeholder buy-in at all levels, training for all domain owners so they understand how to take ownership of their data products, and the infrastructure to ensure the transition goes smoothly. To keep everything on track, it's best to work with experts. Square Peg Technologies’ team of consultants partners with Databricks to bring you a data mesh solution tailored to your organization. Contact us today to learn more.