Sustainable Data Architectures Through Data Architecture Automation
Countless organizations are developing new data platforms with modern architectures. Some are based on data warehouses, others on data lakes, data hubs, or data lakehouses. Regardless of the type of architecture, sustainability has become the key requirement for every new data architecture: organizations demand data platforms with sustainable data architectures.
A sustainable data architecture does not merely support current data requirements and newly identified ones; it is also adaptable and extensible enough to support yet unknown future requirements. The latter is important for dealing with completely unexpected and unplanned changes that lead to new requirements. For example, an organization may acquire a company that operates in a different industry with totally different data requirements. Likewise, urgent market developments resulting from business opportunities, disasters, or aggressive new competitors can lead to new data requirements.
A sustainable data architecture is one that can survive for a long time, because it is easy to adapt and extend. A sustainable data architecture enables an organization to quickly implement current, known upcoming and yet unknown future data requirements. When data consumption requirements change, the data architecture adapts accordingly without the need for major reengineering and redevelopment exercises. Sustainable data architectures are nimble.
But this is all easier said than done. Data warehouse automation tools can come to the rescue. Basically, data warehouse automation tools are generators: they transform higher-level specifications into lower-level specifications that are then executed by specific runtime technologies, such as database servers, messaging products, or ETL engines. Compilers are a familiar example of generators.
In the world of data warehouses and other data architectures, we have been using generators for a long time. However, most of those generators produce one component of an entire data platform, such as an application or database. For example, an ETL tool generates ETL programs, BI tools generate SQL statements, and data modeling tools generate data structures.
This means that multiple, independent generators are required to generate an entire data platform. Since these generators require similar specifications, the same specifications are defined multiple times; in other words, they are duplicated. The challenge is to keep all those specifications consistent, to make sure they work together optimally, and to guarantee that when one specification changes, all its duplicates are changed accordingly.
Many of the tasks involved in designing and developing data platforms are quite repetitive and formalizable, which makes them well suited to generators. For example, when an enterprise data warehouse uses a data vault design technique and the physical data marts use star schemas, both can be generated from one central data model, including the ETL code that copies the data from the warehouse to the data marts.
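To make the idea concrete, here is a minimal sketch of model-driven generation. All names (`hub_customer`, `sat_customer`, the `MODEL` spec format) are invented for illustration; no real automation tool's API or templates are implied. One central model drives the data-vault hub in the warehouse, the star-schema dimension in the mart, and the ETL statement between them:

```python
# Hypothetical central model specification for one business entity.
MODEL = {
    "entity": "customer",
    "business_key": "customer_id",
    "attributes": ["name", "city", "segment"],
}

def generate_hub(model):
    """Data-vault hub: the business key plus load metadata."""
    return (
        f"CREATE TABLE hub_{model['entity']} (\n"
        f"  {model['business_key']} VARCHAR PRIMARY KEY,\n"
        f"  load_ts TIMESTAMP,\n"
        f"  record_source VARCHAR\n"
        f");"
    )

def generate_dimension(model):
    """Star-schema dimension for the data mart, from the same model."""
    cols = ",\n".join(f"  {a} VARCHAR" for a in model["attributes"])
    return (
        f"CREATE TABLE dim_{model['entity']} (\n"
        f"  {model['entity']}_key INT PRIMARY KEY,\n"
        f"  {model['business_key']} VARCHAR,\n"
        f"{cols}\n);"
    )

def generate_etl(model):
    """ETL statement copying warehouse data into the mart dimension.
    Assumes a satellite table sat_<entity> holds the attributes."""
    cols = ", ".join(model["attributes"])
    return (
        f"INSERT INTO dim_{model['entity']} ({model['business_key']}, {cols})\n"
        f"SELECT h.{model['business_key']}, {cols}\n"
        f"FROM hub_{model['entity']} h\n"
        f"JOIN sat_{model['entity']} s\n"
        f"  ON h.{model['business_key']} = s.{model['business_key']};"
    )

if __name__ == "__main__":
    for part in (generate_hub(MODEL), generate_dimension(MODEL), generate_etl(MODEL)):
        print(part, "\n")
```

The point of the sketch is not the SQL itself but the single source of truth: change the model once, regenerate, and the hub, the dimension, and the ETL stay consistent by construction.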
The principles that apply to generators of individual platform components can also be applied to generators of entire data architectures. This category of generators operating on the architectural level is called data warehouse automation tools. They do not generate code for one component of the architecture, but for several and sometimes for the entire architecture. Traditional data warehouse automation tools generate, for example, staging areas, enterprise data warehouses, physical data marts, the ETL solutions that copy data from one database to another, and metadata. Several of these tools have been on the market for many years and have proven their worth.
A limitation of various data warehouse automation tools is that they only generate data platforms with traditional data warehouse architectures. Such platforms are only suitable for a limited set of data consumption forms. In other words, they are single-purpose data platforms. That does not make them very sustainable.
To develop sustainable data architectures, generators are required that can generate other data architectures in addition to the more traditional data warehouse architectures, such as data lake and data hub architectures. Such generators can produce data platforms for forms of data consumption beyond those supported by data warehouse architectures. If architects want to replace physical data marts developed with SQL databases by virtual data marts implemented with SQL views, the generator should support this. Or, if the central data warehouse needs to be replaced with a more data hub-like solution, or the ETL solution with a streaming solution, the generator should make this possible by simply regenerating the platform.
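As an illustration of that regeneration idea, the following hypothetical sketch derives either a physical data mart (a table plus its load statement) or a virtual data mart (a SQL view) from the same specification. The spec format and all names are invented for the example; switching the target is a regeneration flag, not a redevelopment project:

```python
# Hypothetical mart specification, independent of the physical target.
SPEC = {
    "mart": "sales_by_region",
    "source": "dw.fact_sales",
    "select": "region, SUM(amount) AS total_amount",
    "group_by": "region",
}

def generate_mart(spec, virtual=False):
    """Generate DDL for the same mart as a table or as a view."""
    query = (
        f"SELECT {spec['select']}\n"
        f"FROM {spec['source']}\n"
        f"GROUP BY {spec['group_by']}"
    )
    if virtual:
        # Virtual data mart: a view that always reflects the warehouse.
        return f"CREATE VIEW {spec['mart']} AS\n{query};"
    # Physical data mart: a table plus the ETL statement that loads it.
    return (
        f"CREATE TABLE {spec['mart']} (region VARCHAR, total_amount DECIMAL);\n"
        f"INSERT INTO {spec['mart']}\n{query};"
    )

if __name__ == "__main__":
    print(generate_mart(SPEC))                # physical mart
    print(generate_mart(SPEC, virtual=True))  # virtual mart
```

Because the consumption logic lives in the specification rather than in hand-written DDL and ETL, moving from physical to virtual marts is a matter of regenerating with a different flag.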
These generators exist. For them, the term data warehouse automation is probably a misnomer; data architecture automation tool is more appropriate.
The whitepaper ‘Sustainable Data Architectures Using Data Warehouse Automation’ describes the need for such automation tools in more detail and explains how WhereScape qualifies as a data architecture automation tool: https://www.wherescape.com/resources/whitepaper-sustainable-data-architectures-using-data-warehouse-automation/
Reader comment (author of 'Enterprise Architecture Fundamentals', founder and owner of Caminao, 3 years ago): For enterprises immersed in competitive digital environments, "Sustainable Data Architectures" can be compared to dry docks. https://caminao.blog/enterprise-architecture-fundamentals-the-book/book-pick-data-information-knowledge/