Data Build Tool: A Modern Tool for Analytics Engineering (by Bing Chat)
Rogier Werschkull
Head of Data @ Coolgradient | Data-analytics trainer | Rockstar & AI artist @ night ;) | Love calling bullshit
Now that #chatgpt4 integrated into #Bing is here, I thought I'd give this version (https://www.dhirubhai.net/posts/rogierwerschkull_chatgpt-dbt-dbt-activity-7008030139523416064-xuV8) of an earlier post of mine a new spin by using the following prompt:
'Write a professional blog post of standard length, explaining data build tools, a 'modern data stack' tool used for analytics engineering. Use the following paragraphs:
-What is data build tools?
-What data build tools is NOT?
-Why data build tools is not a data modeling tool and the name 'model' for a .sql file is ill-chosen?
-Intended audience
-5 most important pros and web links to more details
-5 most important cons and web links to more details
-How to get started using data build tools
-A list of the most important people on substack that write about data build tools
-A list of the most important web sources to learn more about using data build tools.'
Below is the improved result, with only minor changes. One of them: Bing Chat had added some links related to a COMPLETELY different kind of dbt, namely: https://en.wikipedia.org/wiki/Dialectical_behavior_therapy
This clearly shows that this kind of advanced 'auto-complete' is NOT really 'intelligent' as a human would NEVER do this...
Also, it seems to have 'missed' my question on discerning between folks on substack that mention dbt and the other web resources...
Data Build Tool: A Modern Tool for Analytics Engineering
If you are a data analyst or a data engineer, you might have heard of dbt, or data build tool. But what is it exactly, and how can it help you transform your data more effectively? In this blog post, I will explain what dbt is, what it is not, why it is not a data modeling tool, who should use it, and what its pros and cons are. I will also show you how to get started with dbt and where to find more resources and community support.
What is dbt?
dbt stands for data build tool. It is an open-source command-line tool that helps data teams work like software engineers and ship trusted data faster. dbt enables you to write simple SQL queries that reflect your core business logic and transform your raw data into clean and consistent datasets for reporting, ML modeling, and operational workflows. dbt also provides you with features such as version control, testing, documentation, deployment, and monitoring that are typically found in software engineering workflows.
dbt works directly within your data warehouse or lakehouse, so you don’t need to move your data around or use external tools. You can use dbt with any SQL-based data platform for which an adapter exists, such as BigQuery, Snowflake, Redshift, Databricks, etc. On some platforms, dbt also supports Python-based transformations, for example using Spark or pandas.
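To make the "testing" feature mentioned above a bit more concrete: a dbt data test is, in essence, a query that selects the rows violating an expectation, and the test passes when that query returns no rows. The sketch below imitates that idea with Python's built-in sqlite3 module. It is only an illustration, not dbt's implementation: the `customers` table, the two test queries, and the `failing_rows` helper are all made-up names for this example.

```python
import sqlite3

def failing_rows(conn, test_sql):
    """Run a test query; every row it returns counts as a failure.
    (Hypothetical helper for this sketch, not a dbt API.)"""
    return conn.execute(test_sql).fetchall()

# A tiny in-memory "warehouse" with deliberately dirty data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (2, "b@example.com")],
)

# Each test is a SELECT that returns the offending rows.
tests = {
    "not_null_email": "SELECT * FROM customers WHERE email IS NULL",
    "unique_customer_id": (
        "SELECT customer_id FROM customers "
        "GROUP BY customer_id HAVING COUNT(*) > 1"
    ),
}

results = {name: failing_rows(conn, sql) for name, sql in tests.items()}

for name, rows in results.items():
    status = "PASS" if not rows else f"FAIL ({len(rows)} failing row(s))"
    print(f"{name}: {status}")
```

Both tests fail here on purpose, because the sample data contains a missing email and a duplicated customer_id. In a real dbt project you would declare such checks alongside your models and let dbt generate and run the queries for you.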
What dbt is NOT?
dbt is not an ETL (Extract-Transform-Load) tool. It does not extract or load data from/to different sources or destinations. You still need to use other tools or services to ingest your raw data into your data warehouse or lakehouse.
dbt is also not a data modeling tool. It does not help you design schemas, keys, indexes, or other aspects of a physical data model. It runs the SQL queries you write against data that already exists in your warehouse or lakehouse and materializes the results as tables or views.
Why dbt is not a data modeling tool?
One of the confusing aspects of dbt is that it uses the term “model” to refer to a .sql file that contains a SQL query that transforms your data. However, this does not mean that dbt is creating a database model or schema. Rather, it means that dbt is creating a logical model of your business domain using SQL as the language.
A logical model is an abstract representation of your business concepts and rules that can be implemented in different ways depending on the physical structure of your database. For example, you can have a logical model of customers that defines their attributes and relationships with other entities such as orders and products. You can then implement this logical model in different ways using different database objects such as tables, views, indexes, etc.
dbt does not care about how you implement your logical model in your database. It only cares about how you transform your raw data into clean and consistent datasets that match your logical model using SQL queries. This allows you to focus on writing business logic rather than writing boilerplate code for creating database objects.
However, this also means that dbt does not help you design or optimize your physical database schema. You still need to use other tools or best practices to design and manage your database objects such as tables and views. You also need to consider factors such as performance, scalability, security, and cost when choosing how to materialize your datasets using dbt.
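To illustrate why a dbt "model" is really just a query: the .sql file holds a SELECT statement, and the chosen materialization decides whether its result ends up as a view or a table. The sketch below mimics that with Python's built-in sqlite3 module. It is a simplified picture under stated assumptions, not how dbt is implemented: the `raw_orders` table, the `customer_orders` model, and the `materialize` helper are hypothetical names invented for this example.

```python
import sqlite3

# Stand-in for the contents of a model's .sql file: only a SELECT.
CUSTOMER_ORDERS_SQL = """
    SELECT customer_id,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount
    FROM raw_orders
    GROUP BY customer_id
"""

def materialize(conn, name, select_sql, materialized="view"):
    """Wrap a SELECT in CREATE VIEW/TABLE ... AS.
    (Hypothetical helper for this sketch, not a dbt API.)"""
    if materialized not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {materialized}")
    # Drop whatever object currently has this name, view or table.
    existing = conn.execute(
        "SELECT type FROM sqlite_master WHERE name = ?", (name,)
    ).fetchone()
    if existing is not None:
        conn.execute(f"DROP {existing[0].upper()} {name}")
    conn.execute(f"CREATE {materialized.upper()} {name} AS {select_sql}")

# Raw data is assumed to be loaded already by some other tool.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 10, 20.0), (2, 10, 5.0), (3, 11, 7.5)],
)

query = "SELECT * FROM customer_orders ORDER BY customer_id"

# Same business logic, two different physical implementations.
materialize(conn, "customer_orders", CUSTOMER_ORDERS_SQL, materialized="view")
rows_view = conn.execute(query).fetchall()

materialize(conn, "customer_orders", CUSTOMER_ORDERS_SQL, materialized="table")
rows_table = conn.execute(query).fetchall()

kind = conn.execute(
    "SELECT type FROM sqlite_master WHERE name = 'customer_orders'"
).fetchone()[0]
print(rows_view, rows_table, kind)
```

The query itself never changes; only the wrapper around it does. That is also why the trade-offs named above (performance, cost, and so on) come down to the materialization choice rather than to the SELECT itself.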
Who should use dbt?
dbt is designed for analytics engineers: people who combine analytical skills with engineering skills to build reliable and scalable data pipelines for analytics purposes. Analytics engineers are typically data analysts who want to work more like software engineers or software engineers who want to work more with data.
If you are an analytics engineer who wants to:
Then dbt might be the right tool for you.
What are the pros of dbt?
Here are some of the benefits of using dbt:
What are the cons of dbt?
Here are some of the drawbacks of using dbt:
How to get started with dbt?
If you want to try dbt, you can create a free account on dbt Cloud, which is a web-based UI that provides a hosted environment for developing, testing, scheduling, and investigating data models. You can also use dbt Core, which is the open-source command-line tool that you can install and run locally on your machine.
To learn how to use dbt, you can follow the tutorials on the dbt documentation site, which will guide you through the basics of setting up a project, writing SQL queries, testing and documenting your models, deploying and running your models, and more. You can also check out the examples on the dbt GitHub repository, which will show you how to use dbt with different data platforms and scenarios.
Where to find more resources and community support?
If you want to learn more about dbt or get help from other users, you can join the dbt Community, which is a 50,000-strong and growing network of analytics engineers who share their knowledge and experience on various topics related to dbt. You can also attend one of the dbt Meetups, which are local events where you can meet analytics engineering practitioners in your area.
Another way to stay updated with the latest news and trends on dbt is to subscribe to some of the newsletters or blogs that cover dbt. Here are some of the most popular ones:
Conclusion
dbt is a powerful tool for analytics engineering that helps data teams work like software engineers—to ship trusted data, faster. It enables you to write simple SQL queries that reflect your core business logic and transform your raw data into clean and consistent datasets for reporting, ML modeling, and operational workflows. It also provides you with features such as version control, testing, documentation, deployment, and monitoring that are typically found in software engineering workflows.
However, dbt is not an ETL tool or a data modeling tool. It does not handle data extraction or loading or help you design or optimize your physical database schema. You still need to use other tools or services for those tasks. You also need to consider factors such as performance, scalability, security, and cost when choosing how to materialize your datasets using dbt.
Data Leader | Enterprise Cloud Data / Platform Architecture
(1 year ago) For an introductory / 101 blog, it's actually not bad at all.
Data Products Engineer @ Allergan Aesthetics Tech | M.S. Software Architecture
(1 year ago) > -Why data build tools is not a data modeling tool and the name 'model' for a .sql file is ill-chosen?