
How to bring external data into the data warehouse

Bringing in external data involves writing Python / Airflow code to create another DAG that extracts your custom data source and loads it into the raw schema in Snowflake, then writing custom dbt code to transform that data into something that can be added to the data warehouse.

  1. Add a custom DAG in your implementation repo, using airflow/dags/edfi_resources_to_snowflake.py as a starting point. Modify it as needed to extract data from your custom source and load it into Snowflake (see the DAG sketch after this list).
  2. Write custom dbt models (at dbt/models/ in your project repo) to transform your custom data from the raw Snowflake schema into dimensions and/or facts in the analytics schema. Use the models in the int_dbt_edfi_wh repo as a starting point, and follow the best practice of splitting your transformations into base, stage, (optionally) conform, and warehouse stages (a model sketch also follows this list).
  3. Consider writing dbt tests to validate your external data before building it into the warehouse (a test sketch follows this list as well).
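
For step 1, here is a minimal sketch of what such a DAG might look like, assuming a recent Airflow 2.x release with the TaskFlow API. The source endpoint, table name, and loading logic are placeholders, not part of the reference DAG; adapt the Snowflake connection and loading pattern from edfi_resources_to_snowflake.py in your implementation repo.

```python
# Sketch of a custom extract-and-load DAG. The endpoint, table name, and
# schedule below are hypothetical placeholders.
from datetime import datetime

import requests
from airflow.decorators import dag, task


@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["external-data"],
)
def custom_source_to_snowflake():
    @task
    def extract() -> list[dict]:
        # Pull records from the external source (placeholder endpoint).
        response = requests.get("https://example.com/api/attendance")
        response.raise_for_status()
        return response.json()

    @task
    def load(records: list[dict]) -> None:
        # Load the extracted records into the raw schema in Snowflake.
        # Reuse the Snowflake connection / loading pattern from
        # edfi_resources_to_snowflake.py here (omitted in this sketch).
        print(f"Would load {len(records)} records into raw.custom_attendance")

    load(extract())


custom_source_to_snowflake()
```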
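
For step 2, here is a minimal sketch of a staging model for a hypothetical custom source. The source, table, and column names are invented for illustration, and the raw table would also need to be declared in a dbt sources .yml file; follow the layering conventions in int_dbt_edfi_wh for the downstream conform and warehouse models.

```sql
-- Sketch of a staging model, e.g.
-- dbt/models/staging/stg_custom_attendance.sql (hypothetical path).

with source as (

    select * from {{ source('raw', 'custom_attendance') }}

),

renamed as (

    select
        id      as attendance_id,
        student_id,
        date    as attendance_date,
        status  as attendance_status
    from source

)

select * from renamed
```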
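
For step 3, one lightweight option is a singular dbt test: a SQL file in your project's tests/ directory that selects rows violating an expectation, and which dbt fails if any rows are returned. The model and column names below match the hypothetical staging model above; generic tests such as not_null and unique can also be declared in the model's .yml file.

```sql
-- Sketch of a singular test, e.g. dbt/tests/assert_valid_attendance_status.sql
-- (hypothetical). The test fails if this query returns any rows.

select *
from {{ ref('stg_custom_attendance') }}
where attendance_status not in ('present', 'absent', 'tardy')
```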

We highly recommend testing the above changes in a new branch of your implementation repo and deploying them to a dev copy of your infrastructure before merging into main.