How to set up dbt Development#

The goal of this document is to explain the steps needed to set up a personal dbt development environment for use in an EDU project. The intended audience is a developer who is familiar with dbt and already has an account on an existing EDU data warehouse with a role that allows for write permissions & dbt development in the database.

This document assumes the reader has a local development environment with Python installed. The exact version of Python is not critical, but the minimum compatible version is 3.8.

Installing dbt#

To install dbt, see the official documentation here. You'll need dbt-core and dbt-snowflake installed, but no other adapters are required.

dbt can be installed in any operating system, though the exact steps may be slightly different. As a prerequisite, you'll need a terminal client (such as Windows Terminal) and Python 3 installed.

EA developers use WSL to run a local Ubuntu virtual machine for development, which more closely emulates the deployed environment, but Windows can be used as well. Some of the documentation below assumes a Linux environment, so commands may need to be adapted for Windows.
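As a sketch, the installation can be done inside a Python virtual environment (the venv path below is an example; the package names come from the official dbt docs):

```shell
# create and activate a virtual environment (path is an example)
python3 -m venv ~/.venvs/dbt
source ~/.venvs/dbt/bin/activate

# install dbt core and the Snowflake adapter
python -m pip install dbt-core dbt-snowflake

# confirm the installation
dbt --version
```

Using a virtual environment keeps dbt and its dependencies isolated from your system Python, which makes upgrades easier.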

dbt profile Setup#

dbt requires a profile in your home directory to configure all connections you might use. Project-level settings will choose the right connection, so you'll only have to set this up once per project.

If you don’t already have one, create a dbt profile in your home directory.

mkdir -p ~/.dbt
touch ~/.dbt/profiles.yml

Add a profile for the data warehouse to this file with a text editor (e.g. nano ~/.dbt/profiles.yml). The format looks like this:

Hint

If your password has special characters, enclose it in single quotes in your profile.

# snowflake format
profile_name:
  target: dev
  outputs:
    dev:
      type: snowflake
      account:
      user:
      password:
      role: transformer_dev
      database: dev_analytics
      warehouse: transforming_dev
      schema: dev_{yourinitials}
      threads: 8
      client_session_keep_alive: False

See the official docs for more details, and ask your administrators if the roles are named differently in your implementation.

Notes:

  • profile_name must match the profile parameter of the dbt_project.yml of the project you're working on
  • account is the unique Snowflake account name, not URL, so whatever comes in the subdomain before .snowflakecomputing.com
  • The schema parameter should be of the form dev_yourinitials (using your own initials, like dev_ej). This is a recommended best practice for name-spacing development environments
  • You can have multiple targets per project and select different targets in the CLI if this is necessary, but don't clutter the database with multiple schemas; clean up after yourself
  • You will never use your local environment to write to reserved, operational schema names (such as prod); these schemas will always be written to by deployed production servers only
  • Make sure you are using a role that has permissions to write to the database, rather than just read. dbt needs a role that can create tables. You can see the roles available to you in the Snowflake UI. The standard role is called transformer_dev
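If you do define multiple targets in your profile, the target can be selected per invocation on the command line; a quick sketch (the target name is an example and must match an entry under outputs in your profiles.yml):

```shell
# run against the default target declared in profiles.yml
dbt run

# override the target for a single invocation
dbt run --target dev
```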

Install package dependencies#

The core dbt code is set up as separate packages imported into a template "implementation" repository. This allows us to keep the centralized dbt models used by all EDU projects separate, while creating a dedicated space for implementation-specific dbt models layered on top of or alongside the core models. To develop, you'll need to clone the implementation repository and install the imported packages.

Configuring Git#

If your EDU project is in a private repository, you may need to configure your git credentials. If you already have git configured, you can skip this step.

You'll first need to create a Personal Access Token.

Then create a .git-credentials file in your terminal's home directory, and put the following inside it (replacing the capitalized parts):

https://YOUR_GITHUB_USERNAME:YOUR_NEW_TOKEN@github.com

Then configure git to use these credentials:

git config --global credential.helper 'store'

Installing the project#

Clone the project repo, and then install packages using the following commands:

# make a code folder
cd ~
mkdir code
cd code
# clone the repository
git clone https://github.com/edanalytics/stadium_{project_name}.git

# go to the dbt folder in the repo
cd stadium_{project_name}/dbt
# run dbt deps to install packages
dbt deps

dbt deps installs all packages specified in packages.yml. It needs to be run when you first install the project, and any time you want to update to a new package version, but not before every run.
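For illustration, a packages.yml might look like the following (the packages and versions shown are examples, not the actual contents of your implementation repository):

```yaml
packages:
  # a package hosted on the dbt package hub
  - package: dbt-labs/dbt_utils
    version: 1.1.1

  # a package imported directly from a git repository, pinned to a tag
  - git: "https://github.com/edanalytics/edu_edfi_source.git"
    revision: v0.1.0
```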

Add shell auto-completion for dbt#

By following these steps, you'll get tab-completion of model names.

# download the script to your home directory
# note: the script is stored in ~/.config; create the folder if it doesn't exist
mkdir -p ~/.config
curl https://raw.githubusercontent.com/fishtown-analytics/dbt-completion.bash/master/dbt-completion.bash > ~/.config/.dbt-completion.bash

# install it to your profile
echo 'source ~/.config/.dbt-completion.bash' >> ~/.bash_profile

# reload your profile to activate completion in the current shell
source ~/.bash_profile

dbt Development#

In order to run dbt, you will need to do the following:

  • Make sure that the implementation repository for your project is cloned locally
  • Activate the python environment where dbt is installed
  • Navigate to the dbt folder in the repository

If your profile is set up properly, you will then be able to run dbt. To confirm that it is, you can run dbt debug, which diagnoses connection and configuration issues.

From there you can build models in the /models folder and use ref() to refer to any models already in the warehouse by name.
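For example, a model using ref() might look like this (the file, model, and column names are illustrative, not models that exist in the project):

```sql
-- models/example_attendance_rollup.sql
select
    student_id,
    count(*) as absence_count
from {{ ref('stg_attendance_events') }}
group by 1
```

At compile time, dbt resolves the ref() call to the fully qualified name of that model's table or view in your development schema, and records the dependency in the DAG.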

Development workflow#

As you write new code, use the dbt run command to build the project in your development environment. You may need to run dbt seed first to create any seed tables before models that depend on them will run successfully. You can also run sub-sections of the DAG using node selection.
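Node selection can be used like this (the model and folder names are illustrative; see the dbt docs for the full selection syntax):

```shell
# build a single model
dbt run --select my_model

# build a model and everything downstream of it
dbt run --select my_model+

# build every model in a folder
dbt run --select staging
```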

A typical workflow would be to write some new code, run it with dbt run to make sure it compiles and runs successfully, look at the results in your dev schema to make sure it matches expectations, and then push working code to GitHub.

Test and commit any changes on a new branch. We use the feature/ prefix to indicate a new feature proposal, and bugfix/ to indicate a more urgent change that fixes a problem. Branches can be pushed back to GitHub, and then the developer can create a pull request for review to merge the code into the master branch.
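The branching steps above can be sketched as follows (the branch name and paths are examples):

```shell
# create a feature branch (name is an example)
git checkout -b feature/add-attendance-model

# stage and commit your tested changes
git add models/
git commit -m "Add attendance rollup model"

# push the branch, then open a pull request on GitHub
git push -u origin feature/add-attendance-model
```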

Groups of features can then be merged to a Release Candidate branch, where they can be previewed and tested against downstream tools, such as BI and dashboards. Once everything is working, they can be merged to production and deployed.