Data Should Be Immutable in Any Modern Architecture
David Strickland
Software Engineering Leader Specializing in Legacy Codebase Transformation & Team Revitalization
One of the most transformative concepts Data Mesh presented to me was the notion that data is inherently immutable. Like first grasping recursion or the power of code generation, this principle fundamentally alters your perspective as a developer once it is fully understood. At first glance it may appear counterintuitive, since the language of "changing data" is ubiquitous in the field. In practice, however, the underlying data remains unaltered; what changes are the aggregates, the abstractions derived from the original data.
The Concept of Immutability
Consider a seemingly straightforward example, such as an address. On average, individuals relocate approximately every seven years, leading us to perceive this as a change in address. However, the historical fact that a client resided at Address A from Date 1 to Date 2, and subsequently relocated to Address B, remains incontrovertibly true. This historical data persists irrespective of changes. While a business might primarily focus on the current address, this focus represents an abstraction defined by specific business rules, rather than an alteration of the original data itself. The underlying truth—the chronological sequence of addresses—remains unchanged. Even if the historical information is not recorded and only the current address is retained, the fundamental reality does not change. In this respect, data is immutable.
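To make the distinction concrete, here is a minimal Python sketch (the names and dates are hypothetical, chosen only for illustration) contrasting the immutable history of residencies with the "current address" aggregate derived from it:

```python
from dataclasses import dataclass
from datetime import date

# Each residency is an immutable historical fact: the client lived at
# this address over this interval. The record is never updated in place.
@dataclass(frozen=True)
class Residency:
    address: str
    moved_in: date
    moved_out: date | None = None  # None marks the current residence

# The full history is the underlying data.
history = [
    Residency("Address A", date(2010, 3, 1), date(2017, 6, 15)),
    Residency("Address B", date(2017, 6, 15)),
]

# "Current address" is not a mutation of that data; it is an abstraction
# derived from the history by a business rule.
def current_address(history: list[Residency]) -> str:
    return max(history, key=lambda r: r.moved_in).address
```

Deriving the current address from the history, rather than overwriting a single field, is what keeps the underlying facts intact.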
Developer Decisions and Data Preservation
As a developer, I often tried to convey to Product Owners and Managers that I was continually making decisions that significantly impacted their business. Among these decisions was the determination of which data to preserve, which to aggregate, and which to discard. Consider a scenario involving a user interaction, such as clicking a button. Do we retain all contextual information surrounding that action, or do we simply execute the action itself, such as deleting an account and purging all associated records from the database?
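What "retaining the contextual information" might look like is sketched below; the field names are hypothetical, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# One user interaction, captured together with its surrounding context.
# Whether this record is persisted, aggregated, or discarded is a
# developer decision with real consequences for the business.
@dataclass(frozen=True)
class InteractionEvent:
    user_id: str
    action: str        # e.g. "clicked_delete_account"
    occurred_at: datetime
    context: dict      # page, session, preceding actions, and so on

event = InteractionEvent(
    user_id="user-42",
    action="clicked_delete_account",
    occurred_at=datetime.now(timezone.utc),
    context={"page": "/settings", "session_id": "abc123"},
)
```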
User Story Example
User Story: As a user, I want to delete my account so that my data is removed from the system.
When a user story states, "As a user, I want to delete my account," should we track every preceding action that led to the deletion, or simply execute a cascading delete operation keyed on the user ID? Whether or not we opt to record these events, the facts they represent (the button click, the account deletion, the temporal sequence of actions) remain immutable. Choosing not to preserve them does not change what happened; it only means we can no longer recover it. The contrast between the two approaches is sketched below.
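This is a hedged sketch of the two options: `db` stands for a hypothetical DB-API-style connection, and a plain in-memory list stands in for the event log.

```python
# Hard delete: the aggregate is removed, and with it the history.
def hard_delete(db, user_id: str) -> None:
    # Hypothetical connection; cascading deletes wipe the related rows.
    db.execute("DELETE FROM accounts WHERE user_id = ?", (user_id,))

# Event-based delete: append one more immutable fact to the log and let
# downstream projections stop showing the account.
def event_delete(event_log: list[dict], user_id: str,
                 occurred_at: str) -> None:  # occurred_at: ISO timestamp
    event_log.append({
        "type": "AccountDeleted",
        "user_id": user_id,
        "occurred_at": occurred_at,
    })
```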
Event Sourcing and Data Integrity
The concept of Event Sourcing has gained increasing prominence, as it brings us closer to the goal of preserving all data in its original, unaltered form. Event Sourcing involves capturing state changes as a sequence of immutable events, which can then be replayed to reconstruct the current state of a system. Rather than merely updating an address, we now record that Person X declared their new address as Y at Time Z—an event that is retained indefinitely. To determine an individual's current address, one need only identify the most recent "New Address" event.
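A minimal sketch of this idea in Python, with hypothetical names and an in-memory list standing in for a durable event store:

```python
from dataclasses import dataclass
from datetime import datetime

# Person X declared their new address as Y at time Z: an event that is
# appended once and retained indefinitely.
@dataclass(frozen=True)
class AddressDeclared:
    person_id: str
    address: str
    declared_at: datetime

# The event log is append-only; events are never modified or removed.
event_log: list[AddressDeclared] = []

def declare_address(person_id: str, address: str, at: datetime) -> None:
    event_log.append(AddressDeclared(person_id, address, at))

# State is reconstructed from the log: the current address is simply
# the most recent declaration for that person.
def current_address(person_id: str) -> str | None:
    declarations = [e for e in event_log if e.person_id == person_id]
    if not declarations:
        return None
    return max(declarations, key=lambda e: e.declared_at).address
```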
The primary challenge associated with Event Sourcing, however, is the coexistence of legacy and contemporary data paradigms. Event Sourcing is inherently temporal, representing a series of events in chronological order. When systems are initially constructed to store aggregates, rather than the complete sequence of events, transitioning to an event-sourced architecture becomes highly complex. The legacy aggregate data must be treated as the definitive source of truth unless superseded by more recent events, thereby creating a dual state that complicates the adoption process.
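One common way to cope with that dual state is a read path that prefers events but falls back to the legacy aggregate. A minimal sketch, assuming the legacy store is a simple mapping and events are dictionaries:

```python
# During a migration, a read must consult both worlds: events recorded
# after the cutover supersede the legacy aggregate; in their absence,
# the legacy aggregate remains the source of truth.
def current_address(person_id: str,
                    legacy_addresses: dict[str, str],
                    event_log: list[dict]) -> str | None:
    declarations = [e for e in event_log if e["person_id"] == person_id]
    if declarations:
        # The newest event wins over anything in the legacy store.
        return max(declarations, key=lambda e: e["declared_at"])["address"]
    # No events yet for this person: fall back to the legacy aggregate.
    return legacy_addresses.get(person_id)
```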
The Costs and Benefits of Immutability
While continuing to rely on aggregate data may seem more straightforward, the costs of this approach are becoming increasingly prohibitive. With the advent of machine learning and the data mesh paradigm, the intrinsic value of raw, historical data is becoming more apparent. A compelling argument now exists for maintaining legacy aggregates while simultaneously capturing every subsequent event. Event Sourcing does involve managing an enormous volume of data, which requires substantial computational power to query, process, and store. However, we are entering an era in which the costs of computation and storage are diminishing, mitigating these barriers.
Simultaneously, advances in machine learning are providing the means to leverage this comprehensive dataset to develop sophisticated predictive models. As data becomes increasingly recognized as a product, we are better positioned to store and analyze it in its entirety. If current trends continue, the costs of storage and processing may become largely inconsequential within the next five to ten years. Yet, because data is immutable, failing to capture it today results in a permanent loss.
Conclusion
The significance of understanding data as immutable cannot be overstated. It compels developers and organizations to fundamentally reconsider data architecture, Event Sourcing, and the enduring value of information. By adopting this perspective, we position ourselves to fully harness the potential of a data-driven future.