The Great Shift Left: Embracing the Shift Left Data Architecture
Dunith Danushka
Product Marketing at EDB | LinkedIn Top Voice | Writer | Data Educator
“Shift Left” in data architecture refers to moving data quality controls, testing, and validation earlier in the data pipeline, rather than waiting until data reaches its final destination. Think of it as pushing quality gates "leftward" in your data flow.
This article explores the journey of ACME Corp, a fictional company that transformed its data practices by implementing a Shift Left strategy. From establishing data contracts to embracing real-time processing, this case study illustrates how moving data quality checks and processing closer to the source can dramatically improve an organization's data operations and decision-making capabilities.
Whether you're a data professional looking to optimize your company's data architecture or a business leader seeking to understand the latest trends in data management, this article walks you through the potential of the Shift Left approach and its real-world applications.
ACME Corp - A House of Cards
Once upon a time, ACME Corp, a global leader in consumer electronics, was struggling with a common issue that plagues many organizations of its scale: prioritizing speed of delivery while treating data quality as an afterthought.
Their architecture looked like this:
Their traditional data architecture had served them well in the past. However, as the volume and velocity of data surged, cracks began to show, accumulating problems including:
The Breaking Point
It was a typical Monday morning when Sarah Chen, ACME Corp's Chief Data Officer, walked into an emergency executive meeting. The company's flagship AI-powered inventory management system had just made a $12 million mistake. The system had ordered a year's worth of raw materials for a product line that was being discontinued next month.
"How did this happen?" demanded the CEO.
The root cause analysis revealed a familiar pattern: bad data had silently crept through their pipelines, propagated across systems, and finally surfaced when it was too late - and too expensive - to fix.
The Decision to “Shift Left”
The inventory system fiasco was the final straw. Sarah and her team spent weeks researching modern data architecture patterns and discovered the "shift left" approach.
What is Shift Left data architecture? Another fancy concept only seen at data conferences? Well, it’s too early to say that. But we can safely say that it is a proactive approach designed to address potential issues earlier in the data lifecycle and enable real-time, streaming data. By shifting responsibilities like data quality checks, processing, and monitoring closer to the data generation point, organizations like ACME could theoretically empower each department to access fresher insights without the typical wait time associated with batch processing.
Moreover, ACME had seen a few case studies showing that by shifting left, data management responsibilities could move closer to developers and analysts. This would allow for faster troubleshooting, reduce rework, and—most importantly—decrease dependency on a single, overburdened data engineering team.
The Great Shift Left - Implementing the Shift Left Architecture
Transitioning to a shift left architecture was no small feat, and Sarah’s team knew it meant a reimagining of their traditional setup.
Rather than making a big bang transition, they followed a phased approach towards adoption.
Phase 1: Establishing Data Contracts
Data contracts are formal agreements between data producers and consumers that define the structure, format, and quality expectations of shared data. Think of them as mutual commitments that data producers and consumers uphold, ensuring data consistency and reliability throughout the system.
Firstly, ACME decided to establish data contracts between teams. Every data producer and consumer had to agree on:
To implement data contracts effectively, ACME could leverage several open-source tools:
The nice thing about these tools is that most of them are declarative, allowing version-controlled and GitOps-based workflows for maintenance.
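To make the idea concrete, here is a minimal sketch of what a declarative data contract can look like when expressed as plain, version-controllable data. The stream name, field names, and rules below are hypothetical illustrations, not taken from ACME's actual contracts:

```python
# A hypothetical, declarative data contract for an "orders" stream.
# Kept as plain data so it can live in Git and be reviewed like code.
ORDERS_CONTRACT = {
    "name": "orders",
    "version": "1.0.0",
    "fields": {
        "order_id":   {"type": str, "required": True},
        "sku":        {"type": str, "required": True},
        "quantity":   {"type": int, "required": True, "min": 1},
        "unit_price": {"type": float, "required": False, "min": 0.0},
    },
}

def violations(record: dict, contract: dict) -> list:
    """Return a list of contract violations for a single record."""
    problems = []
    for name, rule in contract["fields"].items():
        if name not in record:
            if rule["required"]:
                problems.append(f"missing required field: {name}")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
        elif "min" in rule and value < rule["min"]:
            problems.append(f"{name}: below minimum {rule['min']}")
    return problems
```

Because the contract is just data, a producer team can bump its version in a pull request and let reviewers (and CI) see exactly what changed.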
Phase 2: Quality at source
With the data contracts in place, ACME implemented validation gates at every data entry point.
Rather than waiting until data reached the central warehouse for quality assessments, ACME embedded data validation at the source. Data producers were given responsibility for ensuring quality using lightweight validation scripts integrated into the data streams.
The data contracts established in the previous phase served as the basis for validation: they explicitly stated the expected format, structure, and content of the data, allowing validation scripts to check incoming records against these predefined criteria.
Next, the team committed the validations to a central Git repository and integrated them with their CI/CD pipeline. This approach offered several advantages:
Phase 3: Data Observability at every stage
Comprehensive data observability gives real-time insights into data health, usage patterns, clear custodianship during transformations, and potential anomalies, allowing for proactive problem-solving and continuous improvement of data pipelines. Data lineage provides a clear understanding of how data moves and transforms throughout the system, enabling quick identification of error sources and impact analysis. Defined custodianship ensures accountability at each stage of data processing, reducing the risk of data quality issues slipping through unnoticed.
Next, ACME integrated observability tools to track data lineage, quality, and schema changes. This allowed engineering teams to catch and address issues on the fly, with alerts triggered in real-time.
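One small but useful piece of such observability is schema-drift detection. The toy monitor below (entirely illustrative, not a specific vendor tool) compares each record's observed fields against the last known shape and raises alerts on changes:

```python
# A toy schema-drift monitor: compare each record's observed fields
# against the last known schema and alert on changes. Illustrative only.
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("observability")

class SchemaMonitor:
    def __init__(self):
        self.known = None  # baseline field set, learned from the first record

    def observe(self, record: dict) -> list:
        """Return human-readable drift alerts for one record (empty if none)."""
        fields = set(record)
        if self.known is None:
            self.known = fields  # first record establishes the baseline
            return []
        alerts = [f"new field appeared: {f}" for f in sorted(fields - self.known)]
        alerts += [f"field disappeared: {f}" for f in sorted(self.known - fields)]
        self.known = fields
        for alert in alerts:
            log.warning(alert)
        return alerts
```

In production these alerts would feed a pager or a lineage/observability platform rather than a log line, but the shift-left principle is the same: detect drift where it happens, not in the warehouse.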
Phase 4: Self-service analytics, data access, and federated governance
Taking things further, the team established self-service data marts and implemented a data catalog, enabling analysts and business teams to easily locate and consume data without waiting on engineering. This was a monumental step, shifting data ownership to teams who could now directly control their data streams. Additionally, this promoted a federated data governance model within ACME departments, empowering each department with governance controls, and ensuring compliance and data stewardship without the usual bottlenecks.
Had ACME Corp invested in a self-service analytics and data access system, they might have averted the $12 million blunder we mentioned earlier. Let's explore how:
Leveraging edge processing with streaming ETL pipelines
Sarah’s team used Apache Kafka and Apache Flink in the shift left architecture for real-time data collection and processing. Kafka decoupled the data producers and consumers, ensuring that data was captured and made available in real time without overwhelming downstream systems.
Apache Flink, a stream processing framework, complemented Kafka by providing powerful, low-latency data processing capabilities. Flink's ability to handle both batch and stream processing within a single engine aligned well with the shift left philosophy, enabling ACME to perform complex event processing, data transformations, and analytics directly on data in flight, rather than waiting for it to land in a data lake or warehouse before processing.
The combination of Kafka and Flink enabled ACME to implement data quality checks, apply data contracts, and perform real-time analytics at the edge of their data architecture. This drastically reduced the time to insight, allowed for immediate detection and correction of data issues, and provided a flexible foundation for building and deploying data products.
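The pattern at the heart of this setup is simple: validate each event as it streams past, forward clean events downstream, and divert bad ones to a dead-letter topic for repair. The sketch below simulates that flow in plain Python (no broker required); in a real deployment the two sinks would be Kafka topics and the loop a Flink job. The product lines and validation rule are invented for illustration:

```python
from typing import Callable, Iterable

def quality_gate(events: Iterable, is_valid: Callable) -> tuple:
    """Split a stream into clean events and dead-letter events.

    Stands in for a Flink job reading from Kafka: valid records flow
    downstream, invalid ones go to a dead-letter topic for inspection.
    """
    clean, dead_letter = [], []
    for event in events:
        (clean if is_valid(event) else dead_letter).append(event)
    return clean, dead_letter

# Hypothetical rule: inventory orders must reference an active product line.
active_lines = {"widget-pro", "widget-mini"}
valid = lambda e: e.get("product_line") in active_lines and e.get("qty", 0) > 0

stream = [
    {"product_line": "widget-pro", "qty": 100},
    {"product_line": "discontinued-2024", "qty": 50000},  # the $12M kind of order
]
ok, dlq = quality_gate(stream, valid)
```

Had a gate like this sat at the edge of ACME's pipeline, the order against a discontinued product line would have been quarantined in minutes instead of surfacing months later as a purchase order.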
Shift Left architecture augmented with streaming data looks like this:
A New Era at ACME Corp
Like a phoenix rising from the ashes of outdated data practices, ACME Corp emerged transformed. After navigating through a labyrinth of iterations and overcoming numerous cultural challenges, the company finally embraced the shift left architecture in its entirety. The results of this transformation were nothing short of revolutionary:
Today, ACME Corp operates with data agility that their competitors envy. By transforming data operations, ACME discovered a powerful truth: sometimes, the key to scaling data-driven success is not to wait for insights, but to bring the insights to the teams who need them most. And with that, they’re ready for whatever the future brings.
How to convince your CXO to embrace Shift Left?
By now, I hope the idea and value proposition behind Shift Left are clear from ACME's inspiring story. While it's just fiction, real organizations will have to fight a little harder to implement Shift Left architectures. Here's my reasoning:
As Sarah Chen often says:
"In the world of data, quality is not a destination - it's a journey. And that journey begins as far left as you can possibly go."