Cloud Data Warehousing—So What Is New?
I guess you’ve got the message by now! There are lots of aspects of cloud data warehousing that carry over directly and without change from “traditional” data warehousing: the purpose and principles, the conceptual architecture, and the identification of the three+ domains of data and information. So, what is new that merited a book on the topic? As seen in “Cloud Data Warehousing—Volume I: Architecting Data Warehouse, Lakehouse, Mesh, and Fabric” (available now), the differences appear at the level of the logical architecture and technology. As you might expect.
A logical architecture is a high-level, functional view of what IT must design and build to meet the business needs expressed in the conceptual architecture, taking into account the limitations of current technologies and the expectations of what they might deliver in the short to medium term. And what the logical architecture for cloud data warehousing must consider is, well, that fuzzy word preceding data warehousing: yes, cloud, and the distributed nature of the concept, as well as the emerging technologies seen in cloud computing.
In the “traditional” data warehousing logical architecture, the three data/information domains shown in my previous article become three separate pillars (representing the different technological bases needed) united by shared context-setting information (CSI).
In comparison, the logical information architecture for cloud pictured above shows two key differences. The first is the introduction of planes, suggesting that a set of pillars can exist independently in multiple cloud environments, as well as on premises. CSI does, of course, need to virtually span these planes, as it does the pillars within any environment. The significance is not that there are multiple planes, but rather that the pillars have substantially the same meanings in the cloud as on premises.
The second difference is more impactful. The lower parts of the pillars are now conjoined. This arises from a significant change in technology. Object stores and open-source componentry are used in cloud data warehousing as the underlying storage substrate beneath different data management technologies, such as relational databases and other tools. The consequence is that the selfsame data can be used and reused by different types of processing technologies, a key foundation for the data lakehouse pattern.
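As a minimal sketch of that idea, the Python below writes one dataset once to a shared location and then processes it with two different styles of engine. Everything here is an illustrative assumption: a temporary directory stands in for a cloud object store, the `sales.csv` file and the function names are hypothetical, and SQLite stands in for a relational engine. Note that SQLite copies the rows into memory, whereas lakehouse engines typically query the stored files in place.

```python
import csv
import sqlite3
import tempfile
from collections import defaultdict
from pathlib import Path


def write_shared_object(store: Path) -> Path:
    """Write one dataset, once, to the stand-in object store."""
    obj = store / "sales.csv"  # hypothetical object name
    with obj.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "amount"])
        writer.writerows([["north", 120], ["south", 80], ["north", 30]])
    return obj


def totals_relational(obj: Path) -> dict:
    """Relational-style processing: load the object and aggregate with SQL."""
    with obj.open(newline="") as f:
        rows = [(r["region"], int(r["amount"])) for r in csv.DictReader(f)]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    conn.close()
    return dict(result)


def totals_procedural(obj: Path) -> dict:
    """Non-relational processing: stream the same object in plain Python."""
    totals = defaultdict(int)
    with obj.open(newline="") as f:
        for r in csv.DictReader(f):
            totals[r["region"]] += int(r["amount"])
    return dict(totals)


def demo():
    """Run both engines against the single stored copy of the data."""
    with tempfile.TemporaryDirectory() as tmp:
        obj = write_shared_object(Path(tmp))
        return totals_relational(obj), totals_procedural(obj)
```

Both functions read the same stored object, so neither engine needs its own private copy of the source data. That is the point of the conjoined lower pillars.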
The fact that the pillars remain separate higher up reflects my belief that different technologies will continue to have pros and cons for varied types of processing for the foreseeable future. A good example of this is graph databases. Despite the relational adjective, relational databases don’t manage relationships very well. The “relation” in relational databases comes from the mathematical concept of a relation, that is, a set of tuples. The relationships represented in graph theory and graph databases are conceptually different and are fundamental to building complex networks of related nodes of all sorts. Of particular interest in cloud data warehousing are the ontologies and inter-relationships between people, process, and information in CSI. I suggest this is an area where we’ll be seeing significant advances in the coming years.
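To make the distinction concrete, here is a small Python sketch under stated assumptions. The `reports_to` data and all function names are hypothetical. It holds the same facts first as a relation (a set of tuples) and then as a graph of nodes and edges. Following a chain of relationships in the relational style takes one join-like step per hop, while the graph style simply follows edges until none are left.

```python
from collections import deque

# A relation in the mathematical sense: a set of (employee, manager) tuples.
reports_to = {("ann", "bob"), ("bob", "cat"), ("cat", "dan")}


def managers_via_joins(person, relation, hops):
    """Relational style: one self-join-like step per hop up the chain."""
    current = {person}
    for _ in range(hops):
        current = {mgr for (emp, mgr) in relation if emp in current}
    return current


def build_graph(relation):
    """Reshape the same tuples into an adjacency map of nodes and edges."""
    graph = {}
    for src, dst in relation:
        graph.setdefault(src, set()).add(dst)
    return graph


def reachable(graph, start):
    """Graph style: breadth-first traversal follows edges to any depth."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

For example, `managers_via_joins("ann", reports_to, 2)` finds the manager two levels up, whereas `reachable(build_graph(reports_to), "ann")` returns everyone above Ann without the caller specifying a depth.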
In this article and the last, I’ve slipped into architectural considerations perhaps a little too deeply for some folks, for which I apologize! In the next post of this series, I will return to the lakehouse, mesh, and fabric patterns, and discuss how they differ at an architectural level, based on the concepts discussed above.