Setting Up Your Local Machine for dbt Core: A Comprehensive Guide
Alex Migit
dbt is an awesome SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation.
With dbt, anyone on a data team can safely contribute to production-grade data pipelines seamlessly.
Getting Started: Setting Up for Success
This article is written primarily for Windows operating systems, but it should also prove a useful reference for users of other operating systems.
I've also added helpful links throughout this article; hold down the CTRL key when clicking so that each link opens in a new tab. At the time of this writing, it does not look like authors are able to add the HTML "_blank" target attribute to links.
Both Python and Git for Windows will need to be installed before you get started.
If you do not have Python installed on your local workstation, you can download and install it from the python.org downloads page.
Check that Python is installed and/or the release version with the following command in your command prompt or terminal:
$ python --version
If you don't already have Git SCM installed, now would be a good time to download and set it up. Check out the online Git Reference Manual for the official and comprehensive man pages that are included in the Git package itself.
While you're at it, a good source-code editor like VS Code is worth installing as well.
Streamlining Your Workflow: Mastering Package Installation for Efficiency
First, you'll need to install dbt Core on your Windows system. You can do this using pip, the Python package manager.
pip is a package-management system written in Python, and it is the preferred installer program. Starting with Python 3.4, pip is included by default with the Python binary installers.
Install dbt Core by running the following in your command prompt or terminal:
$ python -m pip install \
dbt-core \
dbt-postgres \
dbt-redshift \
dbt-snowflake \
dbt-bigquery \
dbt-trino
Install only the adapter package(s) you need (each named dbt-<adapter>), or install them all.
If prompted to upgrade pip, execute the following command:
$ python.exe -m pip install --upgrade pip
I recommend verifying and keeping track of package versions, so that you are aware of what you're working with. Keeping track of software versions can prove to be extremely helpful when troubleshooting and working with various packages.
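As a quick sketch, you can also list your installed dbt packages programmatically with Python's standard-library importlib.metadata (the "dbt" prefix filter below is simply a convention based on the dbt-<adapter> naming scheme):

```python
# List installed packages whose names start with "dbt", along with their
# versions, using only the Python standard library.
from importlib.metadata import distributions

def installed_versions(prefix="dbt"):
    """Return a {package_name: version} dict for installed packages matching prefix."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in distributions()
        if (dist.metadata["Name"] or "").startswith(prefix)
    }

if __name__ == "__main__":
    for name, version in sorted(installed_versions().items()):
        print(f"{name}=={version}")
```

Alternatively, pip show dbt-core prints the same details for a single package.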
You can upgrade dbt-core to the latest version from the terminal using pip.
Check the current version of dbt Core by running:
$ dbt --version
Core:
- installed: 1.4.8
- latest: 1.7.6 - Update available!
Your version of dbt-core is out of date!
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Plugins:
- sqlserver: 1.4.3 - Up to date!
All adapters build on top of dbt-core. Some also depend on other adapters: for example, dbt-redshift builds on top of dbt-postgres. In that case, you would see those adapters included by your specific installation, too.
To upgrade dbt Core to the latest version, use pip:
$ pip install --upgrade dbt-core
After the installation completes, you can verify that dbt-core has been upgraded to the latest version by running the dbt --version command.
Note that you may also have to upgrade the dbt-<adapter> packages that you have installed, as you might run into dependency conflicts like in the example below with the SQL Server package.
$ pip install --upgrade dbt-core
...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dbt-sqlserver 1.4.3 requires dbt-core~=1.4.5, but you have dbt-core 1.7.6 which is incompatible.
...
Navigate to your user home directory (e.g. C:\Users\YOUR-USERNAME\ on Windows, or ~/ in a Unix-style shell), and create a ~/.dbt/ folder. This is where you will create and store your profiles.yml file.
$ mkdir .dbt
Connection profiles
When developing locally, dbt connects to your Data Warehouse (DW) using a profile, which is a YAML file with all of the connection details to your DW.
To use dbt Core, you'll need a profiles.yml file that contains the connection details for your data platform.
In your profiles.yml file, you can store as many profiles as you need. Typically, you would have one profile for each warehouse you use. Most organizations only have one profile.
Note that dbt uses YAML in a few different places. If you're new to YAML, it would be worth learning how arrays, dictionaries, and strings are represented.
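If you're brand new to YAML, the three constructs look like this (a tiny illustrative fragment, not part of any dbt file):

```yaml
a_string: hello           # strings (quotes are usually optional)
an_array:                 # arrays (lists), one item per dash
  - first
  - second
a_dictionary:             # dictionaries (nested key/value maps)
  key: value
  threads: 4
```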
This file generally lives outside of your dbt project to avoid sensitive credentials being checked in to version control, but profiles.yml can be safely checked in when using environment variables to load sensitive credentials.
The YAML file below shows the configuration options in the profiles.yml file:
config:
  send_anonymous_usage_stats: <true | false>
  use_colors: <true | false>
  partial_parse: <true | false>
  printer_width: <integer>
  write_json: <true | false>
  warn_error: <true | false>
  warn_error_options: <include: all | include: [<error-name>] | include: all, exclude: [<error-name>]>
  log_format: <text | json | default>
  debug: <true | false>
  version_check: <true | false>
  fail_fast: <true | false>
  use_experimental_parser: <true | false>
  static_parser: <true | false>
  cache_selected_only: <true | false>

<profile-name>:
  target: <target-name> # this is the default target
  outputs:
    <target-name>:
      type: <bigquery | postgres | redshift | snowflake | other>
      schema: <schema_identifier>
      threads: <natural_number>

      ### database-specific connection details
      ...

    <target-name>: # additional targets
      ...

<profile-name>: # additional profiles
  ...
You may want to have your profiles.yml file stored in a different directory than ~/.dbt/ – for example, if you are using environment variables to load your credentials, you might choose to include this file in the root directory of your dbt project.
Note that the file always needs to be called profiles.yml, regardless of which directory it is in.
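As a concrete sketch, here is what a minimal profiles.yml might look like for a Postgres warehouse. The profile name jaffle_shop, host, user, and database names below are placeholders — substitute your own — and the password is loaded from an environment variable rather than stored in the file:

```yaml
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: dbt_user
      password: "{{ env_var('DBT_DB_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: dbt_dev
      threads: 4
```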
Unlocking Project Configurations: Optimizing Your Workflow with dbt
Every dbt project needs a dbt_project.yml file — this is how dbt knows a directory is a dbt project. It also contains important information that tells dbt how to operate your project.
By default, dbt will look for dbt_project.yml in your current working directory and its parents, but you can set a different directory using the --project-dir flag or the DBT_PROJECT_DIR environment variable.
The following example is a list of all available configurations in the dbt_project.yml file:
# Name your project. Project names should contain only lowercase
# characters and underscores. A good package name should reflect
# your organization's name or the intended use of these models
name: string

version: '1.3.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: profilename

# These configurations specify where dbt should look for
# different types of files.
# The `model-paths` config, for example, states that models in
# this project can be found in the "models/" directory.
# You probably won't need to change these.
model-paths: [directorypath]
seed-paths: [directorypath]
test-paths: [directorypath]
analysis-paths: [directorypath]
macro-paths: [directorypath]
snapshot-paths: [directorypath]
docs-paths: [directorypath]
asset-paths: [directorypath]

target-path: directorypath
log-path: directorypath
packages-install-path: directorypath

clean-targets: [directorypath]

query-comment: string

require-dbt-version: version-range | [version-range]

dbt-cloud:
  project-id: project_id # Required
  defer-env-id: environment_id # Optional

quoting:
  database: true | false
  schema: true | false
  identifier: true | false

models:
  <model-configs>

seeds:
  <seed-configs>

snapshots:
  <snapshot-configs>

sources:
  <source-configs>

tests:
  <test-configs>

vars:
  <variables>

on-run-start: sql-statement | [sql-statement]
on-run-end: sql-statement | [sql-statement]

dispatch:
  - macro_namespace: packagename
    search_order: [packagename]

restrict-access: true | false
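For orientation, a minimal concrete dbt_project.yml might look like the following. The project and profile names here are placeholders, and the profile value must match a profile name defined in your profiles.yml:

```yaml
name: jaffle_shop
version: '1.0.0'
config-version: 2

# Must match a profile name defined in profiles.yml
profile: jaffle_shop

model-paths: ["models"]
seed-paths: ["seeds"]
macro-paths: ["macros"]

models:
  jaffle_shop:
    +materialized: view
```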
When you run dbt Core from the command line, it reads your dbt_project.yml file to find the profile name, and then looks for a profile with the same name in your profiles.yml file. This profile contains all the information dbt needs to connect to your data platform.
Later, you will see how you can utilize the dbt init command to easily generate both profiles.yml and dbt_project.yml files.
Creating The Repository
Create a repository in your github.com account. Alternatively, you can use the GitHub Desktop App if you prefer a GUI, want to simplify your development workflow, and/or find yourself fighting with Git on a regular basis.
Navigate to your Documents folder [or your preferred working directory for cloning repositories], and create a GitHub folder.
You can also create your working directory with the following command:
$ mkdir GitHub
Clone the repository you created earlier to your GitHub working directory.
To clone your repository locally over HTTPS using the command line, copy the URL displayed in GitHub Web with the HTTPS button selected.
In your CLI, type git clone, and then paste the URL you copied earlier. Your command will look similar to the following example:
$ git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY
Now that you have cloned your GitHub repository to your local machine, you are ready to start structuring your project.
Crafting Successful dbt Projects: Best Practices and Strategies
Analytics engineering, at its core, is about helping groups of human beings collaborate on better decisions at scale.
Humans generally have limited bandwidth for making decisions.
Humans also, as a cooperative social species, rely on systems and patterns to optimize collaboration with others.
This combination of traits means that for collaborative projects it's crucial to establish consistent and comprehensible norms such that a team’s limited bandwidth for decision making can be spent on unique and difficult problems, not deciding where folders should go or how to name files.
Building a great dbt project is an inherently collaborative endeavor, bringing together domain knowledge from every department to map the goals and narratives of the entire company.
As such, it is especially important to establish a deep and broad set of patterns to ensure as many people as possible are empowered to leverage their particular expertise in a positive way, and to ensure that the project remains approachable and maintainable as your organization scales.
Steve Jobs and the Analogous Brilliance of dbt
Famously, Steve Jobs wore the same outfit every day to reduce decision fatigue. You can think of this overview similarly: as a black turtleneck and New Balance sneakers for your company's dbt project.
A dbt project's power outfit, or more accurately its structure, is composed not of material fabric but of files, folders, naming conventions, and programming patterns.
How you label things, group them, split them up, or bring them together — the system you use to organize the data transformations encoded in your dbt project — this is your project’s structure.
Structuring dbt projects
The files, folders, and models are organized around three primary layers in the models directory (staging, intermediate, and marts), which build on each other.
Below is a complete file tree of the jaffle_shop sample project from the dbt Docs, and a nice example of the default monolithic dbt structure:
jaffle_shop
├── README.md
├── analyses
├── seeds
│   └── employees.csv
├── dbt_project.yml
├── macros
│   └── cents_to_dollars.sql
├── models
│   ├── intermediate
│   │   └── finance
│   │       ├── _int_finance__models.yml
│   │       └── int_payments_pivoted_to_orders.sql
│   ├── marts
│   │   ├── finance
│   │   │   ├── _finance__models.yml
│   │   │   ├── orders.sql
│   │   │   └── payments.sql
│   │   └── marketing
│   │       ├── _marketing__models.yml
│   │       └── customers.sql
│   ├── staging
│   │   ├── jaffle_shop
│   │   │   ├── _jaffle_shop__docs.md
│   │   │   ├── _jaffle_shop__models.yml
│   │   │   ├── _jaffle_shop__sources.yml
│   │   │   ├── base
│   │   │   │   ├── base_jaffle_shop__customers.sql
│   │   │   │   └── base_jaffle_shop__deleted_customers.sql
│   │   │   ├── stg_jaffle_shop__customers.sql
│   │   │   └── stg_jaffle_shop__orders.sql
│   │   └── stripe
│   │       ├── _stripe__models.yml
│   │       ├── _stripe__sources.yml
│   │       └── stg_stripe__payments.sql
│   └── utilities
│       └── all_dates.sql
├── packages.yml
├── snapshots
└── tests
    └── assert_positive_value_for_total_amount.sql
Getting Started: Creating Your New dbt Project
The quickest and easiest way to get started with a new dbt project is with the init command. The dbt init command will begin setting up your profile and prompt for inputs.
Then, it will create a new folder with your project name and sample files, and create a connection profile on your local machine.
To create a new dbt project, execute the dbt init command in your terminal:
$ dbt init
03:03:53 Running with dbt=1.4.9
Enter a name for your project (letters, digits, underscore): linkedin_dbt_core
Which database would you like to use?
[1] sqlserver
(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)
Enter a number: 1
03:06:30 No sample profile found for sqlserver.
03:06:30
Your new dbt project "linkedin_dbt_core" was created!
For more information on how to configure the profiles.yml file,
please consult the dbt documentation here:
https://docs.getdbt.com/docs/configure-your-profile
One more thing:
Need help? Don't hesitate to reach out to us via GitHub issues or on Slack:
https://community.getdbt.com/
Happy modeling!
If you've already cloned or downloaded an existing dbt project, dbt init can still help you set up your connection profile so that you can start working quickly.
The command will prompt you for connection information, as above, and add a profile (using the profile name from the project) to your local profiles.yml, or create the file if it doesn't already exist.
Harnessing Git: Best Practices for Committing Changes
When finished, it's important to commit your changes, so that the repository contains the latest code.
To commit your changes, link your local project to the GitHub repository you created for your dbt project by running the following commands in your terminal (if you created your dbt project inside the repository folder you cloned earlier, the init, branch, and remote steps are already done and can be skipped):
git init
git branch -M main
git add .
git commit -m "Create a dbt project"
git remote add origin https://github.com/USERNAME/dbt-project.git
git push -u origin main
Return to your GitHub repository to verify your new files have been added.
Now that you set up your dbt project, you can get to the fun part — building models!
Thanks for reading. Your feedback and suggestions are always welcome. Stay tuned for more engaging content ahead!