How to set up dbt Development#
The goal of this document is to explain the steps needed to set up a personal dbt development environment for use in an EDU project. The intended audience is a developer who is familiar with dbt and already has an account on an existing EDU data warehouse with a role that allows for write permissions & dbt development in the database.
This doc begins with the assumption that the reader has a local development environment with Python and dbt (version 1.0 or later) already installed.
To install dbt, see the official documentation here.
dbt can be installed in any operating system, though the exact steps may be slightly different. As a prerequisite, you'll need a terminal client (such as Windows Terminal) and Python 3 installed.
EA developers use WSL to have a local Ubuntu virtual machine for development, which more closely emulates the deployed environment, but Windows can be used as well. Some of the documentation below assumes a Linux environment, so commands may need to be adapted for Windows.
dbt profile Setup#
dbt requires a profile in your home directory to configure all connections you might use. Project-level settings will choose the right connection, so you'll only have to set this up once per project.
If you don’t already have one, create a dbt profile in your home directory.
Add a profile to this file for the data warehouse with a text editor (e.g. `nano ~/.dbt/profiles.yml`). The format looks like this:
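A minimal Snowflake profile might look like the following sketch. Every value shown is a placeholder, not a real credential; substitute your project's profile name, account, role, database, and warehouse:

```yaml
# ~/.dbt/profiles.yml
# NOTE: all values below are placeholders -- use your project's actual settings
profile_name:                      # must match `profile` in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: snowflake
      account: your_account        # Snowflake account name (subdomain), not the full URL
      user: your_username
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"  # avoid hard-coding secrets
      role: your_write_role        # a role with write/create permissions
      database: your_database
      warehouse: your_warehouse
      schema: dev_yourinitials     # name-spaced dev schema, e.g. dev_ej
      threads: 4
```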
See the official docs for more details.
- `profile_name` must match the `profile` parameter of the `dbt_project.yml` of the project you're working on
- `account` is the unique Snowflake account name, not the URL, so whatever comes in the subdomain before `.snowflakecomputing.com`
- The `schema` parameter should be of the form `dev_yourinitials` (using your own initials, like `dev_ej`). This is a recommended best practice for name-spacing development environments
- You can have multiple targets per project and select different targets in the CLI if this is necessary, but don't clutter the database with multiple schemas; clean up after yourself
- You will never use your local environment to write to reserved, operational schema names (such as `prod`); these schemas will always be written to by deployed production servers only
- Make sure you are using a role that has permissions to write to the database, rather than just read. dbt needs a role that can create tables. You can see the roles available to you in the Snowflake UI. The standard role is called
Install package dependencies#
The core dbt code is set up as separate packages imported into a template "implementation" repository. This lets us keep the centralized dbt models used by all EDU projects separate, while providing a dedicated space for implementation-specific dbt models layered on top of or alongside the core models. To develop, you'll need to clone the implementation repository and install the imported packages.
The packages are available on GitHub. Clone the project repo, and then install the packages using the following commands:
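The exact repository URL is project-specific; as a sketch (with a placeholder URL and path), the clone-and-install steps look like:

```bash
# clone the implementation repository (placeholder URL -- use your project's repo)
git clone https://github.com/your-org/your-edu-implementation.git
cd your-edu-implementation/dbt   # the dbt project folder path may differ

# install the dbt packages listed in packages.yml
dbt deps
```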
`dbt deps` installs all packages specified in `packages.yml`. It needs to be run when you first install the project, and any time you want to update to a new package version, but not before every run.
Add shell auto-completion for dbt#
By following these steps, you'll get tab-completion of model names.
```bash
# download the script to your home directory
# note: this step assumes you have a folder in your home directory called .config
curl https://raw.githubusercontent.com/fishtown-analytics/dbt-completion.bash/master/dbt-completion.bash > ~/.config/.dbt-completion.bash

# install it to your profile
echo 'source ~/.config/.dbt-completion.bash' >> ~/.bash_profile
```
In order to run dbt, you will need to do the following:
- Make sure that the implementation repository for your project is cloned locally
- Activate the python environment where dbt is installed
- Navigate to the dbt folder in the repository
If your profile is set up properly, you will then be able to run dbt. To confirm, you can run `dbt debug` to assess connectivity issues.
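The steps above can be sketched as shell commands; the virtual environment path and repository name here are assumptions, not fixed conventions:

```bash
# activate the python environment where dbt is installed (example path)
source ~/venvs/dbt/bin/activate

# navigate to the dbt folder of the cloned implementation repository (example path)
cd ~/your-edu-implementation/dbt

# verify the profile, credentials, and warehouse connection
dbt debug
```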
From there you can build models in the `/models` folder and use `ref()` to refer to any models already in the warehouse by name.
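For example, a new model can select from an existing warehouse model via `ref()`; the file and model names below are hypothetical:

```sql
-- models/marts/fct_attendance_summary.sql (hypothetical model)
select
    student_id,
    count(*) as days_attended
-- ref() resolves to the schema-qualified name of the upstream model
from {{ ref('stg_attendance_events') }}
group by student_id
```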
As you write new code, use the `dbt run` command to build the project to your development environment. You may need to run `dbt seed` to create any seed tables present before the model will compile successfully. You can also run sub-sections of the DAG using node selection.
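Node selection uses dbt's `--select` flag with graph operators; the model and folder names below are placeholders:

```bash
# build a single model
dbt run --select my_model

# build a model plus everything downstream of it
dbt run --select my_model+

# build all models in a given folder
dbt run --select staging
```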
A typical workflow would be to write some new code, run it with `dbt run` to make sure it compiles and runs successfully, look at the results in your dev schema to make sure they match expectations, and then push the working code to GitHub.
Test and commit any changes on a new branch. We use `feature/` as a prefix to indicate a new feature proposal, and `bugfix/` to indicate a more urgent change to fix a problem. Branches can be pushed back to GitHub, and then the developer can create a pull request for review to merge the code into the master branch.
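The branching workflow can be sketched with standard git commands; the branch name, file paths, and commit message here are examples only:

```bash
# create a feature branch for the new work
git checkout -b feature/add-attendance-model

# commit the changes and push the branch to GitHub
git add models/
git commit -m "Add attendance summary model"
git push -u origin feature/add-attendance-model
# then open a pull request for review on GitHub
```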
Groups of features can then be merged to a Release Candidate branch, where they can be previewed and tested against downstream tools, such as BI and dashboards. Once everything is working, they can be merged to production and deployed.