SQLMesh: The future of DataOps
Hi, I’m Toby, cofounder and CTO of Tobiko.
We are excited to share SQLMesh, an open-source DataOps framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
Our team’s experiences at Airbnb, Apple, Google, and Netflix led us to build SQLMesh, a framework that goes beyond simply transforming data.
Why do we care?
Once companies realized that “data is the new oil,” they raced to collect and manage tons of it. The rush gave birth to the “Big Data” movement, which led to the creation of fast engines and optimized warehouses that promised to magically make data valuable.
Despite all the advanced tools, more often than not you’ll find a big oil spill in the middle of it all: duplicated tables, mysteriously interlinked pipelines, and business logic obfuscated in Jinjafied SQL, all held together by duct-taped Airflow DAGs.
While data teams keep everything running, they are slowed down by fear of change because any tweak could cause the house of cards to come crashing down.
Struggling with change
Here are some challenges that we’ve faced, which I’m sure many data teams will find familiar. These pain points become more acute when data sizes increase or the number of data users expands:
Data pipelines are fragmented and fragile: At Netflix, I remember receiving a Slack message from an exec asking why their dashboard looked off. I was confused because I hadn’t made any recent changes to my jobs. After hours of digging, I finally found an upstream table that had been modified by someone else. They weren’t aware that my pipeline depended on the table, so they didn’t tell me about the change.
Data quality checks are not sufficient: Over the weekend, I studied the upstream code and eventually came up with a fix. However, I didn’t own the pipeline and wasn’t familiar with their deployment process or the possible unintended consequences my “fix” might have on other pipelines. Because the pipeline only had data quality checks, the only way to validate my changes was by backfilling them, which would cost both time and money.
It’s too hard and too costly to build staging environments for data: Additionally, I didn’t have an easy way to deploy a staging environment so I could manually validate and verify the changes. Testing my change against the entire graph of dependencies would have taken massive coordination with many teams, which wasn’t feasible.
Silos transform data lakes into data swamps: Instead of risking the other team’s pipeline, I gave up trying to fix it and made another version of the table with just the changes that I needed. This was much easier for me to get into production because I could do it myself without involving other teams.
Sound familiar? Software developers had similar struggles before the advent of DevOps. With the DevOps revolution, the entire paradigm of how to build and ship software was reformed.
As DevOps gained traction, unit tests, continuous integration, and continuous deployment became a standard part of software development.
It’s time for data teams to have the same confidence as their peers. It’s time for DataOps.
Introducing SQLMesh
We’ve built SQLMesh guided by three core principles:
Correctness is non-negotiable: Bad data is worse than no data. SQLMesh guarantees that your data will be consistent even in heavily collaborative environments.
Change with confidence: SQLMesh summarizes the impact of changes and provides automated guardrails, empowering everyone to contribute safely and quickly.
Efficiency without complexity: SQLMesh automatically optimizes your workloads by reusing tables and minimizing computation, saving you time and money.
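To make "summarizes the impact of changes" a bit more concrete, here is a minimal sketch of the underlying idea: given a lineage graph, walk downstream from a changed table to list everything that could be affected. This is plain Python and not SQLMesh's API or implementation. The table names, the `DOWNSTREAM` mapping, and the `impacted_by` helper are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables that read from it.
DOWNSTREAM = {
    "raw.events": ["staging.sessions"],
    "staging.sessions": ["marts.engagement", "marts.retention"],
    "marts.engagement": ["dashboards.exec_summary"],
    "marts.retention": [],
    "dashboards.exec_summary": [],
}


def impacted_by(changed, downstream):
    """Return every table that directly or indirectly reads from `changed`.

    Uses a breadth-first walk, so closer dependents are listed first.
    """
    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        table = queue.popleft()
        for child in downstream.get(table, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # Changing staging.sessions touches both marts and the exec dashboard.
    print(impacted_by("staging.sessions", DOWNSTREAM))
```

Had a summary like this been available in the Netflix story above, the person modifying the upstream table would have seen the exec dashboard in the list before shipping the change.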
Here are some key features:
We’re just getting started on our journey to change the way data pipelines are built and deployed. We’re open sourcing as early as possible so that we can grow together with the community. Try out SQLMesh with a simple pip install by following the quick start guide. We’d love to chat and hear about your experiences and ideas in our Slack community.