Data Architecture-as-a-Service: Liberation for Data Users
ABSTRACT: Data architecture-as-a-service (DaaS) is a new self-service paradigm that empowers local data owners to create architecturally compliant data repositories.
We are poised to take a giant leap forward with self-service data and analytics. We’ve developed self-service tools for reporting, analysis, dashboarding, data set preparation, and even data science (e.g., autoML). But what we haven’t delivered yet are no-code tools that enable business users to create their own data repositories without IT assistance.
Versus data silos. Of course, we’ve always had data silos. Business users create them all the time in Excel, easy-to-use databases like Microsoft Access or SQL Server, or data preparation tools like Alteryx or Tableau. What we need are “modern data silos” that enforce architectural integrity and data consistency using common dimensions, definitions, and logic. These self-service structures provide the speed and agility of data silos without the harmful consequences. These “non-siloed data silos” are the essence of what I call “data architecture-as-a-service” or DaaS.
Business-built data domains. Data architecture-as-a-service enables business users to build local data domains or repositories without undermining enterprise data consistency and trustworthiness. It is the culmination of self-service, where business units liberate themselves almost entirely from enterprise IT. If done right, DaaS reduces data bottlenecks, eases the burden on enterprise data teams, and empowers local domains to service their own data needs. It’s also a key ingredient in the data mesh, an emerging distributed architecture for data ownership and management.
Approaches
How is this possible? Here’s the challenge: we can’t expect data analysts to do the work of data architects or data engineers. They don’t know how to design, model, and implement robust, scalable data environments or build data pipelines that reuse standard data flows and naming conventions. We’ve seen what happens when they try: they create brittle, high-risk data silos and pipelines that don’t scale or perform well. But with DaaS, we bake architectural requirements into self-service data engineering tools so business users can create their own repositories without undermining data consistency and trustworthiness.
Software building blocks. In our consulting practice, we’ve seen enterprise data architects create data “building blocks” that departmental analysts use to create extensions to an enterprise data warehouse. The blocks contain governance guardrails that enable analysts to create their own data marts without deep knowledge of SQL, data structures, query logic, or schemas.
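To make the building-block idea concrete, here is a minimal sketch of how architect-approved blocks might be composed by an analyst. Everything here is illustrative, not drawn from any real product: the block names, SQL fragments, and table names are assumptions. The guardrail is simply that the analyst can only reference blocks the enterprise data team has registered.

```python
# Hypothetical sketch: an enterprise data architect registers approved
# "building blocks" (conformed dimension joins, standard metrics), and an
# analyst composes a data mart query from them without writing SQL.
# All names and SQL fragments below are illustrative assumptions.

APPROVED_BLOCKS = {
    # block name -> SQL fragment maintained by the enterprise data team
    "dim_customer": "JOIN dim_customer c ON f.customer_key = c.customer_key",
    "dim_date": "JOIN dim_date d ON f.date_key = d.date_key",
    "net_revenue": "SUM(f.gross_amount - f.discount_amount) AS net_revenue",
}

def build_mart_sql(fact_table: str, metrics: list[str], dimensions: list[str]) -> str:
    """Compose a data mart query from approved blocks only."""
    for block in metrics + dimensions:
        if block not in APPROVED_BLOCKS:  # the governance guardrail
            raise ValueError(f"'{block}' is not an approved building block")
    select = ", ".join(APPROVED_BLOCKS[m] for m in metrics)
    joins = "\n".join(APPROVED_BLOCKS[d] for d in dimensions)
    return f"SELECT {select}\nFROM {fact_table} f\n{joins}"

sql = build_mart_sql("fact_sales", ["net_revenue"], ["dim_customer"])
```

A request for an unregistered block fails loudly, which is the point: the analyst gets self-service speed, but only inside the architect’s guardrails.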
Unfortunately, it’s a heavy lift for most enterprise data teams to create a self-service data infrastructure given competing demands for their time. Fortunately, some vendors have recognized an opportunity and now offer data architecture-as-a-service tools. These products come in a variety of shapes and forms.
DaaS Products
Extensible data models. For instance, cloud data analytics vendors, such as Domo and Infor Birst, provide multi-tenant data environments with extensible data models. This enables primary tenants to propagate a global model to sub-tenants, who can extend that model by adding new columns and tables to support local requirements. The global model rolls down to sub-tenants, while local data and model extensions stay local. This hub-and-spoke approach is ideal for supporting retail and manufacturing distribution networks but can be applied in almost any data environment.
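The roll-down behavior can be sketched in a few lines. This is a hedged illustration of the pattern, not how Domo or Infor Birst actually implement it; the table names, columns, and `tenant_model` function are all hypothetical.

```python
# Minimal sketch of an extensible multi-tenant data model: the global
# model propagates to every sub-tenant, while each sub-tenant's own
# extensions stay local. Table and column names are illustrative.

GLOBAL_MODEL = {
    "orders": ["order_id", "customer_id", "order_date", "amount"],
}

def tenant_model(local_extensions: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return the effective model for one sub-tenant: global columns
    first, then local additions. The global model is never mutated."""
    model = {table: list(cols) for table, cols in GLOBAL_MODEL.items()}
    for table, extra_cols in local_extensions.items():
        model.setdefault(table, [])
        model[table] += [c for c in extra_cols if c not in model[table]]
    return model

# An EMEA sub-tenant adds a column and a purely local table.
emea = tenant_model({"orders": ["vat_rate"], "local_promos": ["promo_id"]})
```

The key property is asymmetry: changes to `GLOBAL_MODEL` reach every tenant on the next roll-down, but `vat_rate` and `local_promos` never flow back up to the hub.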
Self-service data engineering. More recently, data engineering vendors, such as Coalesce and Fivetran, offer multi-code, template-driven toolkits that make it easy for data analysts or domain data owners to create repositories that align with enterprise governance and schema requirements. Most of these tools are cloud-based variants of data integration, data transformation, or data warehouse automation tools.
For example, Coalesce, which launched last month, is a data transformation vendor that offers a more modern version of dbt, a popular, open source data transformation toolkit. Founded by ex-WhereScape employees, Coalesce offers both GUI- and code-based development environments, a column-aware architecture that supports full data lineage, and built-in automation functions. However, what I like best about this new product is that it allows data architects to build architectural guardrails into the GUI-based development environment via templates and other techniques so that business analysts can build architecturally compliant data repositories and pipelines.
Similarly, Fivetran is a data integration vendor that offers a more automated approach to centralizing cloud application data. This makes it possible for a data analyst, rather than a data engineer, to build architecturally compliant data pipelines that move data from a single cloud application into a target database and run pre-built transformation processes to harmonize that data into a common schema. Both Coalesce and Fivetran are harbingers of a booming market for DaaS tools.
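The extract-then-harmonize pattern described above can be sketched as follows. This is not Fivetran's API; the common schema, the field mapping, and the `harmonize` function are assumptions made for illustration. The point is that the mapping is pre-built and architect-maintained, so the analyst only chooses the source and target.

```python
# Hedged sketch of the pattern above: rows extracted from a cloud
# application are harmonized into a common target schema by a pre-built,
# architect-maintained field mapping. All names are hypothetical.

COMMON_SCHEMA = ["customer_id", "email", "signup_date"]

# Pre-built mapping: source application field -> common schema field
FIELD_MAP = {"id": "customer_id", "email_address": "email", "created": "signup_date"}

def harmonize(raw_rows: list[dict]) -> list[dict]:
    """Rename mapped source fields to the common schema, drop the rest,
    and guarantee every common-schema column exists in every row."""
    out = []
    for row in raw_rows:
        mapped = {FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP}
        out.append({col: mapped.get(col) for col in COMMON_SCHEMA})
    return out

rows = harmonize(
    [{"id": 7, "email_address": "a@b.co", "created": "2024-01-02", "internal": "x"}]
)
```

Because unmapped fields like `internal` are dropped and missing columns default to `None`, every source lands in the same shape, which is what makes downstream consolidation trustworthy.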
In transition. Today, a highly motivated data analyst might use any number of GUI-based data engineering tools to build a data pipeline, but there is little chance the result will comply with architectural guidelines or governance standards. That takes a trained data engineer whose work is reviewed by an enterprise data architect. In a data architecture-as-a-service paradigm, by contrast, a data architect configures a DaaS-ready tool to adhere to enterprise data standards and structures, so that a data analyst, rather than a data engineer, can build compliant data pipelines.
Conclusion. Data architecture-as-a-service is a verbal twist on cloud computing monikers such as software-as-a-service and platform-as-a-service. The name conveys that it’s possible to abstract architecture and build it into easy-to-use, customer-facing tools. When we abstract data architecture, we solve the most enduring data pain point: the proliferation of data silos that wreak havoc on data consistency and trustworthiness.