In their 2020 paper, “From Ad-Hoc Data Analytics to DataOps,” Aiswarya Raj Munappy, David Issa Mattos, Jan Bosch, Helena Holmström Olsson, and Anas Dakkak define DataOps, explore its core elements, and introduce a five-phase maturity model. The paper is the result of a collaboration between researchers from Chalmers University of Technology, Malmö University, and Ericsson, who combined insights from academic literature with expert interviews to ground their conceptual work in real-world experience.
One outcome is a clear understanding of why organizations invest in DataOps. They aim to “achieve more insights/value cheaper and faster while still keeping the quality.” Researchers and practitioners tend to approach the topic of DataOps from one or more of the following four perspectives:
- The activities of DataOps, i.e., what engineers actually do
- The goals organizations aim for with DataOps
- The technologies used for implementation
- The organizational structures and working methods in this domain
Understanding these facets provides valuable context for navigating the many projects, programs, and sales pitches prevalent in today’s organizations.
The paper also compares DevOps and DataOps, noting that both emphasize agility and collaboration, though there are distinct differences. DevOps integrates development and operations, whereas DataOps combines value pipelines—such as data warehouses or AI systems that quickly analyze camera data from an assembly line to detect irregularities—with innovation pipelines that deliver new analytics ideas.
Based on interviews with Ericsson specialists, the authors outline five DataOps phases, which effectively function as maturity levels:
- Ad-Hoc Data Analytics: At this initial level, engineers perform on-demand queries to answer specific business questions quickly. Customers may select data sources and fields themselves. Reuse of reports and queries is rare.
- Semi-Automated Data Analysis: Here, data pipelines streamline data collection, ingestion, preparation, and visualization (a minimal pipeline sketch follows this list).
- Agile Data Science: The focus shifts to delivering continuous business value through frequent updates. Sprints and a central code repository are the core concepts of this level.
- Continuous Testing and Monitoring: This phase emphasizes ongoing testing and monitoring to keep data pipelines robust. Automated unit and high-level tests, together with monitoring and automatic alerting, maintain the reliability and stability of the pipelines (see the second sketch after this list).
- Full DataOps: The final level reads like the collected Christmas wishes of an entire village. The abstract goal of managing data and code together to shorten delivery times translates into two quite inspiring actions, one organizational and one technological. Organizationally, the authors suggest uniting all data-related specialists into groups aligned with the company’s value stream. Technologically, the focus shifts from data pipelines to data products, which requires orchestrating the delivery of insights and technical changes and benefits from close collaboration among teams. Instead of a large set of separate reports, the result is an ecosystem of interconnected data products whose dependencies form a directed acyclic graph (the final sketch below illustrates this): a powerful way to align DataOps with the concept of data products!
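To make the pipeline idea of the Semi-Automated phase concrete, here is a minimal Python sketch; the function names and sample data are illustrative assumptions, not details from the paper. It chains collection, ingestion, preparation, and visualization into one reusable flow:

```python
# Minimal sketch of the Semi-Automated phase: one pipeline that chains
# collection, ingestion, preparation, and visualization.
# All names and data below are illustrative, not taken from the paper.
from statistics import mean

def collect():
    # Stand-in for pulling raw records from a source system.
    return [{"station": "A", "defects": 3},
            {"station": "B", "defects": None},
            {"station": "A", "defects": 1}]

def ingest(raw):
    # Drop records that cannot be used downstream.
    return [r for r in raw if r["defects"] is not None]

def prepare(rows):
    # Aggregate the average defect count per station.
    stations = {r["station"] for r in rows}
    return {s: mean(r["defects"] for r in rows if r["station"] == s)
            for s in stations}

def visualize(summary):
    # Text "chart" standing in for a dashboard.
    for station, avg in sorted(summary.items()):
        print(f"{station}: {'#' * round(avg)} ({avg:.1f})")

visualize(prepare(ingest(collect())))
```

Compared with ad-hoc queries, every step is now a named, reusable stage, so a new business question mostly means reconfiguring rather than rebuilding.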
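The Continuous Testing and Monitoring phase can likewise be sketched as small checks wrapped around such a pipeline; the helper names and the one-hour staleness threshold below are assumptions for illustration only:

```python
# Illustrative checks for the Continuous Testing and Monitoring phase.
# prepare() mirrors the aggregation step from the pipeline sketch above;
# the alert threshold is an assumption, not taken from the paper.
import time

def prepare(rows):
    # Average defects per station, as in the earlier sketch.
    grouped = {}
    for r in rows:
        grouped.setdefault(r["station"], []).append(r["defects"])
    return {s: sum(v) / len(v) for s, v in grouped.items()}

def test_prepare_averages_per_station():
    # Unit-style test guarding the preparation logic.
    assert prepare([{"station": "A", "defects": 2},
                    {"station": "A", "defects": 4}]) == {"A": 3.0}

def alert_if_stale(last_run_ts, max_age_s=3600):
    # Monitoring hook: flag the pipeline if it has not run for an hour.
    age = time.time() - last_run_ts
    if age > max_age_s:
        print(f"ALERT: last pipeline run was {age:.0f} seconds ago")

test_prepare_averages_per_station()
alert_if_stale(time.time())  # fresh run, so no alert is printed
```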
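Finally, the ecosystem of interconnected data products can be pictured as a directed acyclic graph of dependencies. The sketch below uses invented product names and Python’s standard-library topological sort to derive a refresh order:

```python
# Hypothetical dependency graph of data products: each product lists the
# products it consumes. A topological sort yields a valid refresh order
# and fails loudly if someone accidentally introduces a cycle.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

dependencies = {
    "raw_camera_events": set(),
    "defect_rates": {"raw_camera_events"},
    "line_dashboard": {"defect_rates"},
    "quality_report": {"defect_rates", "raw_camera_events"},
}

refresh_order = list(TopologicalSorter(dependencies).static_order())
print(refresh_order)
# e.g. ['raw_camera_events', 'defect_rates', 'line_dashboard', 'quality_report']
```

Because each product depends only on products upstream of it, the graph stays acyclic and each insight can be refreshed without rebuilding unrelated reports.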