Monoliths are bad

Let’s get this out in the open, even though it’s hardly news: monolithic apps are bad. No surprises so far; we’ve known this for years. The design flaws are myriad and well documented: they’re easy to break and hard to maintain. The size of the codebase and the usually complex internal architecture require significant engineering skill to run and scale. The same can be said for data architectures.


"Any line which cannot be justified by its function, does not deserve to be called 'beautiful'". Voisin referred to design in architecture and engineering. Equally, this argument is applicable to digital architecture design.


Domain orientation

Let’s consider, in the context of data, the advantages afforded by a microservice architecture such as Uber’s domain-oriented architecture.

Domain-oriented architectures use microservices to abstract over functional domains - think of these as data silos that need to be joined in some way for activation. These domains are separated by functionality, purpose, input, and output - ads, CDP, analytics, product analytics, and so on. They may have some relationships governed by an overarching strategy (make more money!) but predominantly function independently of each other.

We can learn from domain-oriented architectures to address common failure modes.

Typically, ingest, process, and serve are all tightly coupled within silos. They depend on each other, and if one breaks, they all break. They’re a large meeting that should have been an email - as an email, the participants can function asynchronously without needing to show up on time; they consume the email instead.

From “The Data Dichotomy”:

Decentralise the freedom to act, adapt, and change

Decomposing monolithic apps into microservices may start with the user-facing components to nail some quick wins - understandable. But arguments have been made for decentralising the data components first: without decentralised data, the architecture can’t properly support microservices.

With this in mind, how do we approach the ideal data architecture - and what are its user-facing parts?


GA4 is a real-world monolith

Let’s put this into a real-world scenario. GA4 data as it lands in BigQuery is basically a monolith - it’s often mistakenly treated as a single entity from ingest (GTM and GA4), through processing (GA4), to serving (slap Looker on top of BQ). The end result is terribly compromised. At best we chip away at the monolith to get at our data. Mostly we just grab the whole lump in one go and attempt to balance it on the head of a pin - “what’s my bounce rate?”.

The source domain of GA4 data in BigQuery is…well, GA4. It’s designed for collection at an event level. In its raw state it doesn’t map onto the domains that actually consume the data - personalisation, audience creation, MMM, experimentation.
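To make that concrete, here’s a minimal sketch of what it takes just to read one parameter out of the raw export (project and dataset names are placeholders) - every event parameter lives in a repeated struct that has to be unnested:

```sql
-- Raw GA4 export: daily-sharded events_YYYYMMDD tables, with event
-- parameters packed into a repeated STRUCT column.
-- Project and dataset names are placeholders.
SELECT
  user_pseudo_id,
  event_timestamp,
  event_name,
  -- Pulling out a single parameter already requires an UNNEST
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location
FROM `my-project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20240101'
  AND event_name = 'page_view';
```

Nothing about that shape says “audience” or “experiment” - it says “event log”.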


Promote change

We need to avoid running workloads directly on source-oriented domain data. It’s a poor match, and inefficient. Remember the motivation for choosing microservices over a monolith? Data collection and activation change at a rate that a monolith can’t support. Where you need to make modifications frequently, the framework should promote, rather than hinder, change.

Guess why a racing car (generally) has one nut per wheel, rather than four or more like a road car: wheels are changed frequently, and need to be changed quickly without compromising safety. Focus on the parts of the system that you expect to change, and that need to change easily, quickly, and safely.


Don’t fetishise complexity

Be mindful of crafting complexity. I violently agree with my friend and industry behemoth Matt Gershoff when he warns against fetishising complexity. Honestly, the simplest way of doing something is preferable when the end result is the same. Don’t write a brand-new JavaScript tag for your tag management system when there’s an out-of-the-box template that does the job just as well - probably better than most of the shitty JavaScript you’ll experience on a day-to-day basis. Aim to have less to maintain: it’s easier to repeat at scale, and you know it’s going to work, even on a wet Wednesday afternoon…


Services for the consuming domains

Consider the needs of the consuming domains. Cast your mind back half a dozen paragraphs and remember that we “use microservices to abstract over functional domains”. I’ve already alluded to Google Tag Manager as a form of microservice. It’s more correctly a decoupled system in its own right - but what else can serve as a service to abstract over domains for our purposes?

Where we have multiple domains - or data sets - what might we use to abstract over the source domain data? The data resides in BigQuery across multiple tables, potentially across platforms, and we expect these tables to change column types, widen, and always grow in volume. How do we handle this in a CI/CD environment that scales and is not monolithic?
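One way to start - a sketch, not a prescription, and the names are placeholders - is a thin staging view that hides the daily sharding and absorbs type drift behind a stable contract:

```sql
-- A thin staging layer over the sharded export: consumers see one
-- stable relation, not hundreds of daily tables. Names are placeholders.
CREATE OR REPLACE VIEW `my-project.staging.ga4_events` AS
SELECT
  PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS event_date,
  user_pseudo_id,
  event_name,
  -- SAFE_CAST absorbs upstream type changes instead of breaking consumers
  SAFE_CAST(event_timestamp AS INT64) AS event_timestamp,
  event_params
FROM `my-project.analytics_123456789.events_*`;
```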

If we consider customer domains and their specific requirements, we can abstract over source data silos to present the required data - and only the required data - for the function to happen (“decentralise the freedom to act, adapt, and change”).


dbt as a service

It so happens that the amazing team I work for (Monks) are advocates of dbt, and it also happens that dbt fits very well into the idea of microservices abstracting over domains. A quick recap in case you’re still hiding in a cave: dbt is a transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation.

Where services are used to protect consumers from underlying structures and technicalities, the dbt model is the service that provides the data - regardless of the underlying complexity, as far as the consumer is concerned.

Each model reflects an aspect of the customer domain’s business logic. There is no need to write complex boilerplate code to create the different views. The models are deliberately specific in their function - avoiding monolithic models aids scalability, maintenance, and reuse. This modular approach is key. We can use Jinja to loop over repetitive operations: write once, iterate execution for efficiency. The result is simpler modules.
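As a sketch of that write-once pattern (the parameter list, source name, and file path are my assumptions, not a prescription), a single dbt model can loop over the event parameters it needs to pivot out:

```sql
-- models/staging/stg_ga4__page_views.sql (hypothetical model)
-- One deliberately narrow job: pivot a few event params into columns.
{% set params = ['page_location', 'page_referrer', 'page_title'] %}

select
    user_pseudo_id,
    event_timestamp,
    {% for param in params %}
    (select value.string_value
     from unnest(event_params)
     where key = '{{ param }}') as {{ param }}{% if not loop.last %},{% endif %}
    {% endfor %}
from {{ source('ga4', 'events') }}
where event_name = 'page_view'
```

Adding a fourth parameter is a one-line change to the list, not a new block of SQL.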

Additionally, the output of each model can be referenced by downstream models serving different customer domains. dbt generates that relation - the lineage - automatically, making it easy to trace where the different values come from and how they are generated.
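For example (hypothetical model names again), an audience-domain model can simply build on the staging model above, and dbt wires the dependency into its DAG from the ref() call:

```sql
-- models/marts/audiences/aud_engaged_visitors.sql (hypothetical)
-- Consumer-domain model built on the staging model; dbt derives the
-- lineage automatically from the ref() below.
select
    user_pseudo_id,
    count(*) as page_views,
    min(event_timestamp) as first_seen,
    max(event_timestamp) as last_seen
from {{ ref('stg_ga4__page_views') }}
group by user_pseudo_id
having count(*) >= 3
```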


Friends don’t let friends query raw data

Think about the pace of change gathering around GA4 - the sense of urgency is necessarily more acute as the Universal Analytics sunset beckons.

This is not a call to hasty action, nor advocacy for one great big BigQuery effort to de-silo your data with a bunch of queries powering dashboards. From GA to BQ to Looker, be mindful of the monolithic features that will creep in.

The takeaway from this one-way discussion is to take a phased approach to building your data architecture:


  • Discovery first, and ongoing

Understand the data domains and the requirements of the domain customers. You know domains will increase in number and grow in complexity; expect to repeat this stage regularly.


  • Review and specify

Build dbt models according to specifications that match domain customer requirements - see the test sketch after this list for one way to make those specifications executable.


  • Decouple

Architect services according to specific domain requirements.

Build simple modules that scale.

Simpler modules can execute more efficiently.
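One way to keep those domain specifications honest - a sketch, reusing the hypothetical model and column names from earlier - is to express them as dbt tests. A singular test is just a query that returns the rows that violate the contract:

```sql
-- tests/assert_page_views_have_location.sql (hypothetical singular test)
-- dbt marks the test as failed if this query returns any rows.
select
    user_pseudo_id,
    event_timestamp
from {{ ref('stg_ga4__page_views') }}
where page_location is null
```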

