Sustainable Data Architectures Through Data Architecture Automation

Countless organizations are developing new data platforms with modern architectures. Some are based on data warehouses, others on data lakes, data hubs, or data lakehouses. Regardless of the type of architecture, sustainability has become the key requirement for every new one: organizations demand data platforms with sustainable data architectures.

A sustainable data architecture is not one that only supports current data requirements and newly identified ones, but one that is adaptable and extensible enough to support future requirements that are still unknown. The latter is important for dealing with completely unexpected and unplanned changes that lead to new requirements. For example, an organization may acquire a company that operates in a different industry with totally different data requirements, or urgent market developments, driven by business opportunities, disasters, or aggressive new competitors, may lead to new data requirements.

A sustainable data architecture is one that can survive for a long time, because it is easy to adapt and extend. It enables an organization to quickly implement current, known upcoming, and yet unknown future data requirements. When data consumption requirements change, the data architecture adapts accordingly without major reengineering and redevelopment exercises. Sustainable data architectures are nimble.

But this is all easier said than done. Data warehouse automation tools can come to the rescue. Basically, these tools are generators: they transform higher-level specifications into lower-level specifications that are executed by specific runtime technologies, such as compilers, database servers, messaging products, or ETL tools.
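To make the idea concrete, here is a minimal sketch of such a generator, written in Python with a hypothetical specification format: a higher-level table specification is transformed into the lower-level SQL DDL that a database server executes. The specification layout and the logical-to-physical type mapping are illustrative assumptions, not the input format of any particular tool.

```python
# Minimal sketch of a generator: a higher-level specification (here a plain
# Python dict in a hypothetical format) is transformed into a lower-level
# SQL DDL statement that a database server can execute.

# Hypothetical mapping from logical data types to physical SQL types.
TYPE_MAP = {"string": "VARCHAR(100)", "integer": "INTEGER", "date": "DATE"}

def generate_create_table(spec: dict) -> str:
    """Generate a CREATE TABLE statement from a higher-level table spec."""
    columns = [f"  {name} {TYPE_MAP[logical_type]}"
               for name, logical_type in spec["columns"].items()]
    return f"CREATE TABLE {spec['name']} (\n" + ",\n".join(columns) + "\n);"

customer_spec = {
    "name": "customer",
    "columns": {"customer_id": "integer", "name": "string", "birth_date": "date"},
}

print(generate_create_table(customer_spec))
```

The point is not the generated SQL itself, but that the higher-level specification is the single place where the design decision lives; the lower-level artifact can be regenerated at any time.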

In the world of data warehouses and other data architectures, we have been using generators for a long time. However, most of those generators produce one component of an entire data platform, such as an application or database. For example, an ETL tool generates ETL programs, BI tools generate SQL statements, and data modeling tools generate data structures.

This means that multiple, independent generators are required to generate an entire data platform. Since these generators require similar specifications, those specifications are defined multiple times; in other words, they are duplicated. The challenge is to keep all those specifications consistent, to make sure they work together optimally, and to guarantee that if one specification is changed, all its duplicates are changed accordingly.

Many of the tasks involved in designing and developing data platforms are quite repetitive and formalizable, which makes them suitable for generators. For example, when an enterprise data warehouse uses a data vault design technique and the physical data marts use star schemas, both can be generated from a central data model, including the ETL code to copy the data from the warehouse to the data marts.
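As a hedged illustration of this example, the sketch below continues in Python with a hypothetical central model: from one specification of a business entity it derives a data vault hub and satellite, a star-schema dimension table, and the ETL statement that copies the data from the vault to the data mart. The naming conventions and data types are assumptions chosen for readability; real data warehouse automation tools work from far richer models and also handle historization.

```python
# Sketch: one entry in a central data model drives the generation of a data
# vault hub and satellite, a star-schema dimension, and the ETL statement
# that loads the dimension from the vault. All naming conventions and data
# types are illustrative assumptions.

entity = {
    "name": "customer",
    "business_key": "customer_number",
    "attributes": ["name", "city", "segment"],
}

def generate_hub(e: dict) -> str:
    return (f"CREATE TABLE hub_{e['name']} (\n"
            f"  {e['name']}_hkey CHAR(32) PRIMARY KEY,\n"
            f"  {e['business_key']} VARCHAR(50),\n"
            f"  load_dts TIMESTAMP,\n"
            f"  record_source VARCHAR(50)\n);")

def generate_satellite(e: dict) -> str:
    attribute_cols = ",\n".join(f"  {a} VARCHAR(100)" for a in e["attributes"])
    return (f"CREATE TABLE sat_{e['name']} (\n"
            f"  {e['name']}_hkey CHAR(32),\n"
            f"{attribute_cols},\n"
            f"  load_dts TIMESTAMP,\n"
            f"  PRIMARY KEY ({e['name']}_hkey, load_dts)\n);")

def generate_dimension(e: dict) -> str:
    attribute_cols = ",\n".join(f"  {a} VARCHAR(100)" for a in e["attributes"])
    return (f"CREATE TABLE dim_{e['name']} (\n"
            f"  {e['name']}_key INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n"
            f"  {e['business_key']} VARCHAR(50),\n"
            f"{attribute_cols}\n);")

def generate_mart_etl(e: dict) -> str:
    # A real tool would also select only the current satellite version here.
    insert_cols = ", ".join([e["business_key"]] + e["attributes"])
    select_cols = ", ".join([f"h.{e['business_key']}"] +
                            [f"s.{a}" for a in e["attributes"]])
    return (f"INSERT INTO dim_{e['name']} ({insert_cols})\n"
            f"SELECT {select_cols}\n"
            f"FROM hub_{e['name']} h\n"
            f"JOIN sat_{e['name']} s ON h.{e['name']}_hkey = s.{e['name']}_hkey;")

for statement in (generate_hub(entity), generate_satellite(entity),
                  generate_dimension(entity), generate_mart_etl(entity)):
    print(statement)
    print()
```

Because the warehouse structures, the mart structures, and the ETL code all come from the same model entry, a change to that entry propagates to every generated artifact, which is exactly the consistency problem described above.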

The principles that apply to generators of individual platform components can also be applied to generators of entire data architectures. This category of generators operating on the architectural level is called data warehouse automation tools. They do not generate code for one component of the architecture, but for several and sometimes for the entire architecture. Traditional data warehouse automation tools generate, for example, staging areas, enterprise data warehouses, physical data marts, the ETL solutions that copy data from one database to another, and metadata. Several of these tools have been on the market for many years and have proven their worth.

A limitation of various data warehouse automation tools is that they only generate data platforms with traditional data warehouse architectures. Such platforms are only suitable for a limited set of data consumption forms. In other words, they are single-purpose data platforms. That does not make them very sustainable.

To develop sustainable data architectures, generators are required that can generate other data architectures, such as data lake and data hub architectures, in addition to the more traditional data warehouse architectures. Such generators can be used to generate data platforms for other forms of data consumption than those supported by data warehouse architectures. If architects want to replace physical data marts developed with SQL databases by virtual data marts implemented with SQL views, the generator should support this. Or, if the central data warehouse needs to be replaced with a more data hub-like solution, or the ETL solution with a streaming solution, the generator should make this all possible by simply regenerating the platform.
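A sketch of that flexibility, continuing the same hypothetical model format as above: the generator below produces either a physical data mart (a table plus the ETL statement that loads it) or a virtual data mart (a SQL view on the warehouse tables) from one and the same specification, simply by switching the generation target. Again, the model format and naming conventions are illustrative assumptions, not the behavior of any specific product.

```python
# Sketch: the same central specification regenerated for two targets.
# "physical" yields a dimension table plus the ETL statement that loads it;
# "virtual" yields a SQL view on the warehouse tables, so no data is copied.
# The model format and naming conventions are illustrative assumptions.

entity = {
    "name": "customer",
    "business_key": "customer_number",
    "attributes": ["name", "city", "segment"],
}

def generate_data_mart(e: dict, target: str) -> str:
    select_cols = ", ".join([f"h.{e['business_key']}"] +
                            [f"s.{a}" for a in e["attributes"]])
    select_stmt = (f"SELECT {select_cols}\n"
                   f"FROM hub_{e['name']} h\n"
                   f"JOIN sat_{e['name']} s ON h.{e['name']}_hkey = s.{e['name']}_hkey")
    if target == "physical":
        attribute_cols = ",\n".join(f"  {a} VARCHAR(100)" for a in e["attributes"])
        table_ddl = (f"CREATE TABLE dim_{e['name']} (\n"
                     f"  {e['business_key']} VARCHAR(50),\n"
                     f"{attribute_cols}\n);")
        insert_cols = ", ".join([e["business_key"]] + e["attributes"])
        etl_stmt = f"INSERT INTO dim_{e['name']} ({insert_cols})\n{select_stmt};"
        return table_ddl + "\n\n" + etl_stmt
    if target == "virtual":
        return f"CREATE VIEW dim_{e['name']} AS\n{select_stmt};"
    raise ValueError(f"Unknown generation target: {target}")

# Switching from a physical to a virtual data mart is a regeneration, not a rewrite.
print(generate_data_mart(entity, "physical"))
print()
print(generate_data_mart(entity, "virtual"))
```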

These generators exist. For them, the term data warehouse automation is probably a misnomer; data architecture automation tool is more appropriate.

The whitepaper ‘Sustainable Data Architectures Using Data Warehouse Automation’ describes the need for such automation tools in more detail and explains how WhereScape qualifies as a data architecture automation tool: https://www.wherescape.com/resources/whitepaper-sustainable-data-architectures-using-data-warehouse-automation/

Rémy Fannader

Author of 'Enterprise Architecture Fundamentals', Founder & Owner of Caminao

For enterprises immersed in competitive digital environments, "Sustainable Data Architectures" can be compared to dry docks. https://caminao.blog/enterprise-architecture-fundamentals-the-book/book-pick-data-information-knowledge/
