We need to talk about dbt…
Since its emergence out of RJMetrics, dbt has taken the data world by storm. Suddenly, any analyst armed with dbt could build production-grade pipelines using software development best practices. Analytics engineering, a whole new job family, sprang up. It transformed (pun intended) the data industry. However, with every breakthrough innovation come hidden dangers.
dbt Changed Everything
Dbt’s impact cannot be overstated. Dbt mobilized the power of self-service SQL analytics on the Cloud Data Warehouse (CDW). Crucially, dbt took the typically junior yet aspiring SQL analyst and turned her into an impressively titled analytics engineer. Finally, an alternative career progression to data science or product management!
Today, it is reported that dbt is responsible for about 20% of all CDW compute in the industry, and you only need to spend a day at dbt’s own conference, Coalesce, to see the energy and enthusiasm exuding from dbt practitioners. Dbt turned data warehouses into overpowered transformation engines.
With Great Power Comes Great Responsibility
Thanks to dbt, the barrier to entry for powerful production-grade transformations has plummeted. However, this is a double-edged sword: the more dbt models are produced, the higher the data warehouse bill. Data warehouse spending continues to climb, and while that might have been just dandy in 2020, in the post-ZIRP era belts are tightening and every budget item is under the microscope.
The costs of transformation workloads add up quickly. Data teams consistently tell us that dbt-orchestrated workloads drive the vast majority of their data warehouse spend. Anecdotally, transformations account for 50 to 80% of a typical data warehouse bill. This is echoed by George Fraser of Fivetran, whose data suggests that ingest and transformations are the primary data warehouse workloads, not ad-hoc analytics, BI, or serving.
The truth is, dbt drives over-consumption. The typical analytics engineer is incentivized to produce models, not to maximize the ROI of analytics. Even the most motivated engineers lack the data and the tooling to identify, investigate, and improve inefficiencies.
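One cheap place to start building that visibility is dbt’s own run artifacts: every dbt invocation writes a run_results.json file containing per-model execution times. Below is a minimal sketch, not a definitive implementation, that ranks models by runtime to surface likely cost hotspots; the file path and the top_n cutoff are assumptions you would adjust for your own project.

```python
import json

def slowest_models(run_results: dict, top_n: int = 5):
    """Rank successfully-run dbt models by execution time, slowest first.

    Expects the structure of dbt's run_results.json artifact: a top-level
    "results" list whose entries carry "unique_id", "execution_time",
    and "status".
    """
    timings = [
        (r["unique_id"], r["execution_time"])
        for r in run_results.get("results", [])
        if r.get("status") == "success"
    ]
    return sorted(timings, key=lambda t: t[1], reverse=True)[:top_n]

if __name__ == "__main__":
    # Assumed path: dbt writes artifacts to target/ by default.
    with open("target/run_results.json") as f:
        for model, seconds in slowest_models(json.load(f)):
            print(f"{seconds:8.1f}s  {model}")
```

Wall-clock runtime is only a proxy for spend (a quick model on an oversized warehouse can cost more than a slow one on a small warehouse), but it is a useful first filter before digging into the warehouse’s own query history.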
Artemis to the Rescue
To ensure dbt is used effectively and provides optimal value, analytics engineers need clear visibility into their warehouse and dbt environments.
Artemis monitors your data stack, investigates the root causes of issues, and resolves them automatically for your review. In other words, we tell you what’s wrong and fix it.
The platform can refactor models, optimize queries, fix broken pipelines and more!
How does it work in real life?
Artemis' customers typically spend between $50k and $1 million a year on their data warehouses. On a weekly basis, an average data team resolves 114 insights and merges 60+ PRs into production in less than four hours using Artemis.
We've helped data teams slash warehouse costs by 20% within the first month of deployment and save over 50 hours of engineering time each month, time they would otherwise have spent investigating and fixing issues themselves.
That's a mountain of work automated for you!
If dbt is a driver of over-consumption for your data warehouse and you want to fix that, contact Josh Gray and mention the word ‘Tino’ for $1k in cash upon signing up!
Comments

Freelance Data Platform Architect | Azure, Fabric & modern data stack | Microsoft Data Platform MVP | dbt Community Award winner | meetup organizer & public speaker
4 months ago: dBT capitalization in this post is on point

Founder at Orchestra
5 months ago: so cost monitoring tools are the answer to governance and process? Is that the conclusion?

Data-oriented business solutions. Translates documentation into code and reverse docs-to-code.
5 months ago: What’s the public source for “dbt represents ~20% of CDW workloads”?

Technology Leader | Occasional Philosopher
5 months ago: pft now talk about the triple cost of paying per row to ingest the data, paying your cdw on ingestion too, then paying dbt per model run, and again paying your cdw for the run time of each model. It’s the ZIRP data stack