# How to set up dbt Development
The goal of this document is to explain the steps needed to set up a personal dbt development environment for use in an EDU project. The intended audience is a developer who is familiar with dbt and already has an account on an existing EDU data warehouse with a role that allows for write permissions & dbt development in the database.
This doc begins with the assumption that the reader has a local development environment set up with Python installed. The exact version of Python is not critical, but the minimum compatible version is 3.8.
## Installing dbt
To install dbt, see the official documentation. You'll need `dbt-core` and `dbt-snowflake` installed, but no other adapters are required.
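Both packages are published on PyPI, so a minimal install sketch looks like the following (the environment name `dbt-env` is just an example; adjust to your own setup):

```bash
# create and activate a virtual environment (name is illustrative)
python3 -m venv dbt-env
source dbt-env/bin/activate

# install dbt-core and the Snowflake adapter from PyPI
pip install dbt-core dbt-snowflake

# confirm the installation
dbt --version
```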
dbt can be installed on any operating system, though the exact steps may differ slightly. As a prerequisite, you'll need a terminal client (such as Windows Terminal) and Python 3 installed.

EA developers use WSL to have a local Ubuntu virtual machine for development, which more closely emulates the deployed environment, but Windows can be used as well. Some of the documentation below assumes a Linux environment, so commands may need to be adapted for Windows.
## dbt profile Setup
dbt requires a profile in your home directory to configure all the connections you might use. Project-level settings will choose the right connection, so you'll only have to set this up once per project.
If you don't already have one, create a dbt profile in your home directory. Add a profile to this file for the data warehouse with a text editor (e.g. `nano ~/.dbt/profiles.yml`). The format looks like this:
> **Hint**: If your password has special characters, enclose it in single quotes in your profile.
```yaml
# snowflake format
profile_name:
  target: dev
  outputs:
    dev:
      type: snowflake
      account:
      user:
      password:
      role: transformer_dev
      database: dev_analytics
      warehouse: transforming_dev
      schema: dev_{yourinitials}
      threads: 8
      client_session_keep_alive: False
```
See the official docs for more details, and ask your administrators if the roles are named differently in your implementation.
> **Hint**: If you are using MFA, be sure to add `authenticator: username_password_mfa` to your profile to avoid multiple MFA requests on each run.
Notes:

- `profile_name` must match the `profile` parameter of the `dbt_project.yml` of the project you're working on
- `account` is the unique Snowflake account name, not the URL, so whatever comes in the subdomain before `.snowflakecomputing.com`
- The `schema` parameter should be of the form `dev_yourinitials` (using your own initials, like `dev_ej`). This is a recommended best practice for name-spacing development environments
- You can have multiple targets per project and select different targets in the CLI if necessary (see the example after this list), but don't clutter the database with multiple schemas; clean up after yourself
- You will never use your local environment to write to reserved, operational schema names (such as `prod`); these schemas will always be written to by deployed production servers only
- Make sure you are using a role that has permissions to write to the database, rather than just read. dbt needs a role that can create tables. You can see the roles available to you in the Snowflake UI. The standard role is called `transformer_dev`
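As a sketch of target selection: if your profile defined a second target (the `qa` target below is hypothetical), you could pick it at run time with dbt's `--target` flag:

```bash
dbt run               # uses the default target (dev)
dbt run --target qa   # uses the hypothetical qa target instead
```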
## Install package dependencies
The core dbt code is set up as separate packages imported into a template "implementation" repository. This lets us keep the centralized dbt models used by all EDU projects separate, while providing a dedicated space for implementation-specific dbt models layered on top of or alongside the core models. To develop, you'll need to clone the implementation repository and install the imported packages.
### Configuring Git
If your EDU project is in a private repository, you may need to configure your git credentials. If you already have git configured, you can skip this step.
You'll first need to create a Personal Access Token.
Then create a `.git-credentials` file in your terminal's home directory, and put the following inside it (replacing the capitalized parts):
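A sketch of the standard git credential-store format, with `USERNAME` and `TOKEN` as the capitalized placeholders to replace:

```
https://USERNAME:TOKEN@github.com
```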
Then configure git to use these credentials:
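The usual way to do this is git's built-in `store` credential helper, which reads from the `.git-credentials` file:

```bash
git config --global credential.helper store
```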
### Installing the project
Clone the project repo, and then install packages using the following commands:
```bash
# make a code folder
cd ~
mkdir code
cd code

# clone the repository
git clone https://github.com/edanalytics/stadium_{project_name}.git

# go to the dbt folder in the repo
cd stadium_{project_name}/dbt

# run dbt deps to install packages
dbt deps
```
`dbt deps` installs all packages specified in `packages.yml`. It needs to be run when you first install the project, and any time you want to update to a new package version, but not before every run.
## Add shell auto-completion for dbt
By following these steps, you'll get tab-completion of model names.
```bash
# download the script to your home directory
# note: this step assumes you have a folder in your home directory called .config
curl https://raw.githubusercontent.com/fishtown-analytics/dbt-completion.bash/master/dbt-completion.bash > ~/.config/.dbt-completion.bash

# install it to your profile
echo 'source ~/.config/.dbt-completion.bash' >> ~/.bash_profile
```
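The profile is only read by new login shells, so to pick up the change in your current session you can source it yourself:

```bash
# reload your profile so completion works in the current shell
source ~/.bash_profile
```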
## dbt Development
To run dbt, you will need to do the following:

- Make sure that the implementation repository for your project is cloned locally
- Activate the Python environment where dbt is installed
- Navigate to the dbt folder in the repository
If your profile is set up properly, you will then be able to run dbt. To confirm, run `dbt debug`, which surfaces any connectivity issues.
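Putting those steps together, a typical session might start like this (the environment name and paths are illustrative; adjust to your own setup):

```bash
source ~/dbt-env/bin/activate          # activate the Python environment where dbt is installed
cd ~/code/stadium_{project_name}/dbt   # navigate to the dbt folder in the repo
dbt debug                              # checks profiles.yml, dbt_project.yml, and the warehouse connection
```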
From there you can build models in the `/models` folder and use `ref()` to refer to any models already in the warehouse by name.
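For instance, a minimal model referencing an upstream model might look like this (the file path and the `stg_students` model name are hypothetical):

```sql
-- models/marts/fct_enrollments.sql (hypothetical path and names)
select
    student_id,
    enrollment_date
from {{ ref('stg_students') }}  -- ref() resolves to the upstream model's relation in the warehouse
```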
### Development workflow
As you write new code, use the `dbt run` command to build the project into your development environment. You may need to run `dbt seed` first to create any seed tables before the models will compile successfully. You can also run sub-sections of the DAG using node selection, as shown below.
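For example, using dbt's `--select` flag and graph operators (the `fct_enrollments` model name is hypothetical):

```bash
dbt seed                           # load seed CSVs, if the project defines any
dbt run                            # build the whole project
dbt run --select fct_enrollments   # build a single model
dbt run --select +fct_enrollments  # build the model plus all of its upstream dependencies
dbt run --select fct_enrollments+  # build the model plus everything downstream of it
```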
A typical workflow is to write some new code, run it with `dbt run` to make sure it compiles and runs successfully, check the results in your dev schema against your expectations, and then push the working code to GitHub.
Test and commit any changes on a new branch. We use a `feature/` prefix to indicate a new feature proposal, and `bugfix/` to indicate a more urgent change to fix a problem. Branches can be pushed back to GitHub, and then the developer can create a pull request for review to merge the code into the master branch.
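A sketch of that branch workflow (the branch name and commit message are illustrative):

```bash
git checkout -b feature/add-enrollment-model   # create a feature branch
# ...write models, dbt run, verify results in your dev schema...
git add models/
git commit -m "Add enrollment model"
git push -u origin feature/add-enrollment-model
# then open a pull request on GitHub for review
```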
Groups of features can then be merged to a `Release Candidate` branch, where they can be previewed and tested against downstream tools, such as BI tools and dashboards. Once everything is working, they can be merged to production and deployed.