Setting Up Your Local Machine for dbt Core: A Comprehensive Guide
Alex Migit
dbt is an awesome SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD, and documentation.
With dbt, anyone on a data team can safely contribute to production-grade data pipelines seamlessly.
Getting Started: Setting Up for Success
This article is written primarily for Windows operating systems, but it should also prove a useful reference for users of other operating systems.
I've also added helpful links throughout this article; hold down the CTRL key when clicking so that each link opens in a new tab. At the time of this writing, it does not look like authors are able to add the HTML "_blank" target attribute to links.
Both Python and Git for Windows will need to be installed before you get started.
If you do not have Python installed on your local workstation, you can download and install it from the python.org downloads page.
Check that Python is installed and/or the release version with the following command in your command prompt or terminal:
$ python --version
If you don't already have Git SCM installed, now would be a good time to download and set it up. Check out the online Git Reference Manual for the official and comprehensive man pages that are included in the Git package itself.
While you're at it, a good source-code editor like VS Code is worth installing as well.
Streamlining Your Workflow: Mastering Package Installation for Efficiency
First, you'll need to install dbt Core on your Windows system. You can do this using pip, the Python package manager.
pip is a package-management system written in Python, and it is the preferred installer program. Starting with Python 3.4, pip is included by default with the Python binary installers.
Install dbt Core by running the following in your command prompt or terminal:
$ python -m pip install \
dbt-core \
dbt-postgres \
dbt-redshift \
dbt-snowflake \
dbt-bigquery \
dbt-trino
Install only the adapter package(s) you need (each named dbt-<adapter>), or install them all.
If prompted to upgrade pip, execute the following command:
$ python.exe -m pip install --upgrade pip
I recommend verifying and keeping track of package versions, so that you are aware of what you're working with. Keeping track of software versions can prove to be extremely helpful when troubleshooting and working with various packages.
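As a quick sketch, you can also list your installed dbt packages programmatically with Python's standard-library importlib.metadata (the "dbt" prefix filter below is simply a convention based on the dbt-<adapter> naming scheme):

```python
# List installed packages whose names start with "dbt", along with their
# versions, using only the Python standard library.
from importlib.metadata import distributions

def installed_versions(prefix="dbt"):
    """Return a {package_name: version} dict for installed packages matching prefix."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in distributions()
        if (dist.metadata["Name"] or "").startswith(prefix)
    }

if __name__ == "__main__":
    for name, version in sorted(installed_versions().items()):
        print(f"{name}=={version}")
```

Alternatively, pip show dbt-core prints the same details for a single package.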
You can upgrade dbt-core to the latest version from the terminal using pip.
Check the current version of dbt Core by running:
$ dbt --version
Core:
- installed: 1.4.8
- latest: 1.7.6 - Update available!
Your version of dbt-core is out of date!
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Plugins:
- sqlserver: 1.4.3 - Up to date!
All adapters build on top of dbt-core. Some also depend on other adapters: for example, dbt-redshift builds on top of dbt-postgres. In that case, you would see those adapters included by your specific installation, too.
To upgrade dbt Core to the latest version, use pip:
$ pip install --upgrade dbt-core
After the installation completes, you can verify that dbt-core has been upgraded to the latest version by running the dbt --version command.
Note that you may also have to upgrade the dbt-<adapter> packages that you have installed, as you might run into dependency conflicts like in the example below with the SQL Server package.
$ pip install --upgrade dbt-core
...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dbt-sqlserver 1.4.3 requires dbt-core~=1.4.5, but you have dbt-core 1.7.6 which is incompatible.
...
Navigate to your user home directory (e.g. C:\Users\YOUR-USERNAME\ on Windows, or ~/ in a Unix-style shell), and create a ~/.dbt/ folder. This is where you will create and store your profiles.yml file.
$ mkdir .dbt
Connection profiles
When developing locally, dbt connects to your Data Warehouse (DW) using a profile, which is a YAML file with all of the connection details to your DW.
To use dbt Core, you'll need a profiles.yml file that contains the connection details for your data platform.
In your profiles.yml file, you can store as many profiles as you need. Typically, you would have one profile for each warehouse you use. Most organizations only have one profile.
Note that dbt uses YAML in a few different places. If you're new to YAML, it would be worth learning how arrays, dictionaries, and strings are represented.
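If you're brand new to YAML, the three constructs look like this (a tiny illustrative fragment, not part of any dbt file):

```yaml
a_string: hello           # strings (quotes are usually optional)
an_array:                 # arrays (lists), one item per dash
  - first
  - second
a_dictionary:             # dictionaries (nested key/value maps)
  key: value
  threads: 4
```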
This file generally lives outside of your dbt project to avoid sensitive credentials being checked in to version control, but profiles.yml can be safely checked in when using environment variables to load sensitive credentials.
The YAML file below shows the configuration options in the profiles.yml file:
config:
  send_anonymous_usage_stats: <true | false>
  use_colors: <true | false>
  partial_parse: <true | false>
  printer_width: <integer>
  write_json: <true | false>
  warn_error: <true | false>
  warn_error_options: <include: all | include: [<error-name>] | include: all, exclude: [<error-name>]>
  log_format: <text | json | default>
  debug: <true | false>
  version_check: <true | false>
  fail_fast: <true | false>
  use_experimental_parser: <true | false>
  static_parser: <true | false>
  cache_selected_only: <true | false>

<profile-name>:
  target: <target-name> # this is the default target
  outputs:
    <target-name>:
      type: <bigquery | postgres | redshift | snowflake | other>
      schema: <schema_identifier>
      threads: <natural_number>

      ### database-specific connection details
      ...

    <target-name>: # additional targets
      ...

<profile-name>: # additional profiles
  ...
You may want to have your profiles.yml file stored in a different directory than ~/.dbt/ – for example, if you are using environment variables to load your credentials, you might choose to include this file in the root directory of your dbt project.
Note that the file always needs to be called profiles.yml, regardless of which directory it is in.
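As a concrete sketch, here is what a minimal profiles.yml might look like for a Postgres warehouse. The profile name jaffle_shop, host, user, and database names below are placeholders — substitute your own — and the password is loaded from an environment variable rather than stored in the file:

```yaml
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: dbt_user
      password: "{{ env_var('DBT_DB_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: dbt_dev
      threads: 4
```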
Unlocking Project Configurations: Optimizing Your Workflow with dbt
Every dbt project needs a dbt_project.yml file — this is how dbt knows a directory is a dbt project. It also contains important information that tells dbt how to operate your project.
By default, dbt will look for dbt_project.yml in your current working directory and its parents, but you can set a different directory using the --project-dir flag or the DBT_PROJECT_DIR environment variable.
The following example is a list of all available configurations in the dbt_project.yml file:
# Name your project. Project names should contain only lowercase
# characters and underscores. A good package name should reflect
# your organization's name or the intended use of these models
name: string

version: '1.3.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: profilename

# These configurations specify where dbt should look for
# different types of files.
# The `model-paths` config, for example, states that models in
# this project can be found in the "models/" directory.
# You probably won't need to change these.
model-paths: [directorypath]
seed-paths: [directorypath]
test-paths: [directorypath]
analysis-paths: [directorypath]
macro-paths: [directorypath]
snapshot-paths: [directorypath]
docs-paths: [directorypath]
asset-paths: [directorypath]

target-path: directorypath
log-path: directorypath
packages-install-path: directorypath

clean-targets: [directorypath]

query-comment: string

require-dbt-version: version-range | [version-range]

dbt-cloud:
  project-id: project_id # Required
  defer-env-id: environment_id # Optional

quoting:
  database: true | false
  schema: true | false
  identifier: true | false

models:
  <model-configs>

seeds:
  <seed-configs>

snapshots:
  <snapshot-configs>

sources:
  <source-configs>

tests:
  <test-configs>

vars:
  <variables>

on-run-start: sql-statement | [sql-statement]
on-run-end: sql-statement | [sql-statement]

dispatch:
  - macro_namespace: packagename
    search_order: [packagename]

restrict-access: true | false
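For orientation, a minimal concrete dbt_project.yml might look like the following. The project and profile names here are placeholders, and the profile value must match a profile name defined in your profiles.yml:

```yaml
name: jaffle_shop
version: '1.0.0'
config-version: 2

# Must match a profile name defined in profiles.yml
profile: jaffle_shop

model-paths: ["models"]
seed-paths: ["seeds"]
macro-paths: ["macros"]

models:
  jaffle_shop:
    +materialized: view
```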
When you run dbt Core from the command line, it reads your dbt_project.yml file to find the profile name, and then looks for a profile with the same name in your profiles.yml file. This profile contains all the information dbt needs to connect to your data platform.
Later, you will see how you can utilize the dbt init command to easily generate both profiles.yml and dbt_project.yml files.
Creating The Repository
Create a repository in your github.com account. Alternatively, you can use the GitHub Desktop App if you prefer a GUI, want to simplify your development workflow, and/or find yourself fighting with Git on a regular basis.
Navigate to your Documents folder [or your preferred working directory for cloning repositories], and create a GitHub folder.
You can also create your working directory with the following command:
$ mkdir GitHub
Clone the repository you created earlier to your GitHub working directory.
To clone your repository locally over HTTPS using the command line, copy the URL displayed in GitHub Web with the HTTPS button selected.
In your CLI, type git clone, and then paste the URL you copied earlier. Your command will look similar to the following example:
$ git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY
Now that you have cloned your GitHub repository to your local machine, you are ready to start structuring your project.
Crafting Successful dbt Projects: Best Practices and Strategies
Analytics engineering, at its core, is about helping groups of human beings collaborate on better decisions at scale.
Humans generally have limited bandwidth for making decisions.
Humans also, as a cooperative social species, rely on systems and patterns to optimize collaboration with others.
This combination of traits means that for collaborative projects it's crucial to establish consistent and comprehensible norms such that a team’s limited bandwidth for decision making can be spent on unique and difficult problems, not deciding where folders should go or how to name files.
Building a great dbt project is an inherently collaborative endeavor, bringing together domain knowledge from every department to map the goals and narratives of the entire company.
As such, it is especially important to establish a deep and broad set of patterns to ensure as many people as possible are empowered to leverage their particular expertise in a positive way, and to ensure that the project remains approachable and maintainable as your organization scales.
Steve Jobs and the Analogous Brilliance of dbt
Famously, Steve Jobs wore the same outfit every day to reduce decision fatigue. You can think of this overview similarly: as a black turtleneck and New Balance sneakers for your company's dbt project.
A dbt project's power outfit, or more accurately its structure, is composed not of material fabric but of files, folders, naming conventions, and programming patterns.
How you label things, group them, split them up, or bring them together — the system you use to organize the data transformations encoded in your dbt project — this is your project’s structure.
Structuring dbt projects
The files, folders, and models are organized around three primary layers in the models directory (staging, intermediate, and marts), which build on each other.
Below is a complete file tree of the jaffle_shop sample project from the dbt Docs, and a nice example of the default monolithic dbt structure:
jaffle_shop
├── README.md
├── analyses
├── seeds
│   └── employees.csv
├── dbt_project.yml
├── macros
│   └── cents_to_dollars.sql
├── models
│   ├── intermediate
│   │   └── finance
│   │       ├── _int_finance__models.yml
│   │       └── int_payments_pivoted_to_orders.sql
│   ├── marts
│   │   ├── finance
│   │   │   ├── _finance__models.yml
│   │   │   ├── orders.sql
│   │   │   └── payments.sql
│   │   └── marketing
│   │       ├── _marketing__models.yml
│   │       └── customers.sql
│   ├── staging
│   │   ├── jaffle_shop
│   │   │   ├── _jaffle_shop__docs.md
│   │   │   ├── _jaffle_shop__models.yml
│   │   │   ├── _jaffle_shop__sources.yml
│   │   │   ├── base
│   │   │   │   ├── base_jaffle_shop__customers.sql
│   │   │   │   └── base_jaffle_shop__deleted_customers.sql
│   │   │   ├── stg_jaffle_shop__customers.sql
│   │   │   └── stg_jaffle_shop__orders.sql
│   │   └── stripe
│   │       ├── _stripe__models.yml
│   │       ├── _stripe__sources.yml
│   │       └── stg_stripe__payments.sql
│   └── utilities
│       └── all_dates.sql
├── packages.yml
├── snapshots
└── tests
    └── assert_positive_value_for_total_amount.sql
Getting Started: Creating Your New dbt Project
The quickest and easiest way to get started with a new dbt project is with the init command. The dbt init command will begin setting up your profile and prompt for inputs.
Then, it will create a new folder with your project name and sample files, and create a connection profile on your local machine.
To create a new dbt project, execute the dbt init command in your terminal:
$ dbt init
03:03:53 Running with dbt=1.4.9
Enter a name for your project (letters, digits, underscore): linkedin_dbt_core
Which database would you like to use?
[1] sqlserver
(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)
Enter a number: 1
03:06:30 No sample profile found for sqlserver.
03:06:30
Your new dbt project "linkedin_dbt_core" was created!
For more information on how to configure the profiles.yml file,
please consult the dbt documentation here:
https://docs.getdbt.com/docs/configure-your-profile
One more thing:
Need help? Don't hesitate to reach out to us via GitHub issues or on Slack:
https://community.getdbt.com/
Happy modeling!
If you've already cloned or downloaded an existing dbt project, dbt init can still help you set up your connection profile so that you can start working quickly.
The command will prompt you for connection information, as above, and add a profile (using the profile name from the project) to your local profiles.yml, or create the file if it doesn't already exist.
Harnessing Git: Best Practices for Committing Changes
When finished, it's important to commit your changes, so that the repository contains the latest code.
To commit your changes, link your local project to the GitHub repository you created for your dbt project by running the following commands in your terminal (if you created your dbt project inside the repository folder you cloned earlier, the init, branch, and remote steps are already done and can be skipped):
git init
git branch -M main
git add .
git commit -m "Create a dbt project"
git remote add origin https://github.com/USERNAME/dbt-project.git
git push -u origin main
Return to your GitHub repository to verify your new files have been added.
Now that you set up your dbt project, you can get to the fun part — building models!
Thanks for reading. Your feedback and suggestions are always welcome. Stay tuned for more engaging content ahead!