How to set up dbt Development
The goal of this document is to explain the steps needed to set up a personal dbt development environment for use in an EDU project. The intended audience is a developer who is familiar with dbt and already has an account on an existing EDU data warehouse with a role that allows for write permissions & dbt development in the database.
This document assumes that the reader has a local Linux development environment set up with Python and dbt (version > 1.0) already installed.
dbt profile Setup
dbt requires a profile in your home directory to configure all connections you might use. Project-level settings will choose the right connection, so you'll only have to set this up once per project.
If you don’t already have one, create a dbt profile in your home directory.
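As a minimal sketch of this step, the commands below create the profiles directory and an empty profiles file if they are missing. `DBT_PROFILES_DIR` is dbt's environment variable for overriding the profiles location; the `PROFILES_DIR` shell variable is only a local name used here, and it falls back to `~/.dbt`.

```shell
# Use dbt's DBT_PROFILES_DIR override if set, otherwise the default ~/.dbt
PROFILES_DIR="${DBT_PROFILES_DIR:-$HOME/.dbt}"

# Create the directory and an empty profiles.yml only if they do not exist yet
mkdir -p "$PROFILES_DIR"
[ -f "$PROFILES_DIR/profiles.yml" ] || touch "$PROFILES_DIR/profiles.yml"

# The profile holds a password, so restrict the file to your own user
chmod 600 "$PROFILES_DIR/profiles.yml"
```

An existing profiles.yml is left untouched apart from the permission change.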
Add a profile for the data warehouse to this file with a text editor (e.g. nano ~/.dbt/profiles.yml). The format looks like this:
    # snowflake format
    profile_name:
      target: dev
      outputs:
        dev:
          type: snowflake
          account:
          user:
          password:
          role:
          database:
          warehouse:
          schema: dev_yourinitials
          threads: 2
          client_session_keep_alive: False
See the official docs for more details.
- profile_name must match the profile parameter of the dbt_project.yml of the project you're working on
- account is the unique Snowflake account name: only the subdomain portion of the account address, not the full URL
- The schema parameter should be of the form dev_yourinitials (except using your own initials). This is a recommended best practice for name-spacing in development environments
- You can have multiple targets per project and select different targets in the CLI if this is necessary, but don't clutter the database with multiple schemas; clean up after yourself
- You will never use your local environment to write to reserved, operational schema names (such as prod); these schemas will always be written to by shared Airflow machines
- Make sure you are using a role that has permissions to _write_ to the database, rather than just read. dbt needs a role that can create tables. You can see the roles available to you in the Snowflake UI.
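The first note above (the profile name must match the project's profile parameter) can be checked mechanically. The sketch below is an illustration only and is not part of dbt: `project_profile`, `profile_defined`, and `check_profile_match` are hypothetical helper names, and the simple text matching assumes that `profile:` sits on a single top-level line of dbt_project.yml.

```shell
# Hypothetical helper: print the profile name that a dbt_project.yml asks for.
# Assumes a single top-level line such as:  profile: 'my_profile'
project_profile() {
  sed -n "s/^profile:[[:space:]]*['\"]\{0,1\}\([^'\"#[:space:]]*\).*/\1/p" "$1" | head -n 1
}

# Hypothetical helper: succeed if profiles.yml has a top-level entry with that name.
profile_defined() {
  grep -q "^$1:" "$2"
}

# Report whether the project file and the profiles file agree.
check_profile_match() {
  name="$(project_profile "$1")"
  if [ -n "$name" ] && profile_defined "$name" "$2"; then
    echo "ok: profile '$name' found"
  else
    echo "missing: profile '$name' not found in $2" >&2
    return 1
  fi
}
```

From the project's dbt folder, this could be called as `check_profile_match dbt_project.yml ~/.dbt/profiles.yml`.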
Install package dependencies
The core dbt code is set up as separate packages imported into a template "implementation" repository. This lets us separately maintain and version the centralized dbt models used by all EDU projects, and gives each project a dedicated space for implementation-specific dbt models layered on top of or alongside the core dbt models. To develop, you'll need to clone the implementation repository and install the imported packages.
The packages are available on GitHub. Clone and then install them using the following commands (TODO: add installation instructions once these packages are opened up).
Add shell auto-completion for dbt
By following these steps, you'll get tab-completion of model names.
    # download the script to your home directory
    # note: this step assumes you have a folder in your home directory called .config
    curl https://raw.githubusercontent.com/fishtown-analytics/dbt-completion.bash/master/dbt-completion.bash > ~/.config/.dbt-completion.bash
    # install it to your profile
    echo 'source ~/.config/.dbt-completion.bash' >> ~/.bash_profile
In order to run dbt, you will need to do the following:
- Make sure that the implementation repository for your project is cloned locally
- Activate the Python environment where dbt is installed
- Navigate to the dbt folder in the repository
If your profile is set up properly, you will then be able to run dbt. To determine if that is the case, you can run:
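dbt's built-in `dbt debug` command performs this kind of check: it validates the profile and project files and tests the warehouse connection. The sketch below wraps it in a hypothetical helper, `check_dbt_connection`, which first confirms the preconditions listed above (dbt on the PATH, current directory is the dbt folder). The wrapper is an illustration, not part of dbt.

```shell
# Hypothetical wrapper around `dbt debug` that checks the preconditions first.
check_dbt_connection() {
  # dbt is only on PATH once the python environment is activated
  if ! command -v dbt >/dev/null 2>&1; then
    echo "dbt not found on PATH; activate the python environment first" >&2
    return 1
  fi
  # dbt commands must be run from the folder containing dbt_project.yml
  if [ ! -f dbt_project.yml ]; then
    echo "no dbt_project.yml here; cd into the project's dbt folder first" >&2
    return 1
  fi
  # Validate the profile and test the warehouse connection
  dbt debug
}
```

Running plain `dbt debug` from the dbt folder is equivalent once both preconditions hold.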
From there you can build models in the /models folder and use ref() to refer to any models already in the warehouse by name. You can find those models by looking at the code in the other dbt packages on GitHub.
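As an illustration of building on an existing model, the sketch below writes a new model file that selects from another model via ref(). Both names are hypothetical: `my_new_model` is a placeholder for your own model, and `some_existing_model` stands in for a model defined in one of the imported packages.

```shell
# Run from the project's dbt folder; models live under models/
mkdir -p models

# Write a hypothetical model that selects from an existing model by name.
# The quoted 'SQL' delimiter stops the shell from expanding anything inside.
cat > models/my_new_model.sql <<'SQL'
-- some_existing_model is a placeholder; replace it with a real model name
select *
from {{ ref('some_existing_model') }}
SQL
```

The model could then be built on its own with `dbt run --select my_new_model`, which writes to the development schema configured in your profile.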
Test and store any changes on a new branch. We use feature/ as a prefix to indicate a new feature proposal, and bugfix/ to indicate a more urgent change to fix a problem. Branches can be pushed back to GitHub, and then the developer can create a pull request for review to merge the code into the master branch.
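The naming convention above can be sketched as a small shell helper. `branch_name` is a hypothetical function, not an existing tool, and the topic `add-attendance-model` used below is only an example.

```shell
# Hypothetical helper: build a branch name using the feature/ or bugfix/ prefix.
# usage: branch_name <feature|bugfix> <topic>
branch_name() {
  kind="$1"
  topic="$2"
  case "$kind" in
    feature|bugfix) ;;  # the two prefixes described above
    *) echo "kind must be 'feature' or 'bugfix'" >&2; return 1 ;;
  esac
  [ -n "$topic" ] || { echo "topic is required" >&2; return 1; }
  printf '%s/%s\n' "$kind" "$topic"
}
```

Inside a clone of the implementation repository, `git checkout -b "$(branch_name feature add-attendance-model)"` would create the branch, and `git push -u origin feature/add-attendance-model` would publish it so that a pull request can be opened.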
We recommend that implementations introduce a step between individual development testing and production, where multiple branches are consolidated and run against production data but not in production. Sometimes we call this a release candidate. This can be helpful for coordinating breaking changes with downstream dependencies like dashboards.