7 'data' words used on a daily basis defined:
Source: Data Warehouse and Data Vault Adoption Trends, Modeling, Modernization, and Automation - Authors: Kevin Petrie and Herbert Stauffer, Publication: April, 2023


In our daily lives, we often use words that are so ingrained in our vocabulary that we rarely stop to consider their true meaning. These words, simple as they may seem, carry profound significance and can be the subject of extensive study and interpretation. They form the foundation of our communication and understanding, and yet, we might struggle to provide a comprehensive definition if asked.

In this exploration, we will delve into seven such words. Each of these terms is a universe unto itself, rich in nuance and depth. They are words that we may take for granted, but each could easily fill the pages of numerous books...

  • Data warehouse: The data warehouse is a central data repository that stores data in a predefined model for business intelligence. Analysts and managers use the data warehouse to gain a business view of data that supports their decision making.
  • Data lake: The data lake is a repository that stores structured, semi-structured, and unstructured data in its native format. Data lakes originated as on-premises repositories running on Apache Hadoop, then evolved to run in the cloud as object stores.
  • Data lakehouse: The data lakehouse combines elements of a data lake and a data warehouse in a hybrid repository. It applies SQL queries to cloud object stores to support business intelligence, data science, and self-service analytics.
  • Data vault: The data vault is an approach to data modeling, architecture, and methodology that builds on elements of Ralph Kimball’s star schema model and Bill Inmon’s third normal form framework. Dan Linstedt and his team at Lockheed Martin created the data vault as a hybrid approach that stores all data, tracks history, and accommodates changing schemas and data containers.
  • Data mesh: The data mesh is a distributed data architecture in which business units own, manage, and publish data as a product for others to consume. Analysts and other data consumers use a self-service platform in a federated governance model.
  • Data fabric: The data fabric unifies data integration, preparation, cataloging, security, and discovery into a cohesive and automated process. It uses metadata, machine learning, and automation to combine data across formats and locations.
  • Data vault 2.0 solution: The data vault 2.0 solution incorporates people, process, and technology. It includes prescriptive methodologies and reference architectures for technologies such as the data warehouse, data lake, data lakehouse, virtualization, data fabric, and data mesh. The 2.0 methodology was founded on SEI’s Capability Maturity Model and derives from Six Sigma, total quality management, disciplined agile delivery, and lean.
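
To make one of these definitions more concrete, here is a minimal sketch of the "stores all data, tracks history" idea from the data vault entry. It is not taken from the report: it assumes the usual data vault split into a hub (business keys) and a satellite (descriptive attributes over time), and the table names, the load_customer helper, and the JSON attribute column are all hypothetical simplifications chosen for illustration.

```python
import json
import sqlite3


def build_vault(conn):
    """Create one hub and one satellite table (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE hub_customer (
            customer_key  TEXT PRIMARY KEY,
            load_date     TEXT NOT NULL,
            record_source TEXT NOT NULL
        );
        CREATE TABLE sat_customer (
            customer_key TEXT NOT NULL REFERENCES hub_customer(customer_key),
            load_date    TEXT NOT NULL,
            attributes   TEXT NOT NULL,  -- JSON, so new fields need no schema change
            PRIMARY KEY (customer_key, load_date)
        );
    """)


def load_customer(conn, key, load_date, source, attributes):
    """Insert-only load: nothing is ever updated or deleted."""
    # The hub keeps the business key once, with its first load date and source.
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?)",
        (key, load_date, source),
    )
    # The satellite gets a new row only when the attributes have changed.
    latest = conn.execute(
        "SELECT attributes FROM sat_customer "
        "WHERE customer_key = ? ORDER BY load_date DESC LIMIT 1",
        (key,),
    ).fetchone()
    payload = json.dumps(attributes, sort_keys=True)
    if latest is None or latest[0] != payload:
        conn.execute(
            "INSERT INTO sat_customer VALUES (?, ?, ?)",
            (key, load_date, payload),
        )


def history(conn, key):
    """Return every recorded version of a customer, oldest first."""
    rows = conn.execute(
        "SELECT load_date, attributes FROM sat_customer "
        "WHERE customer_key = ? ORDER BY load_date",
        (key,),
    ).fetchall()
    return [(load_date, json.loads(attrs)) for load_date, attrs in rows]


conn = sqlite3.connect(":memory:")
build_vault(conn)
load_customer(conn, "C-1", "2023-04-01", "crm", {"city": "Brussels"})
load_customer(conn, "C-1", "2023-04-02", "crm", {"city": "Brussels"})  # unchanged
load_customer(conn, "C-1", "2023-04-03", "crm", {"city": "Ghent", "segment": "B2B"})
for version in history(conn, "C-1"):
    print(version)
```

The earlier version of the customer stays queryable after the change, and the new "segment" field arrives without altering any table. A real data vault would more likely add a new satellite than rely on a JSON column; the JSON column only keeps the sketch short.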

What would you change in these definitions while keeping to the four-line limit? Please comment.

#DataWarehouse #DataVault #DataVault2 #DataLake #DataLakehouse #DataFabric #DataMesh #DataModeling #DataArchitecture #BusinessIntelligence