## How to bring external data into the data warehouse
Bringing in external data involves writing Python / Airflow code to create another DAG that extracts your custom data source and loads it into the `raw` schema in Snowflake, then writing custom dbt code to transform that data into something that can be added to the data warehouse.
- Add a custom DAG in your implementation repo, using `airflow/dags/edfi_resources_to_snowflake.py` as a starting point. Modify it as needed to extract data from your custom source and load it into Snowflake.
- Write custom dbt models (at `dbt/models/` in your project repo) to transform your custom data from the `raw` Snowflake schema into dimensions and/or facts in the `analytics` schema. Use the `int_dbt_edfi_wh` repo as a starting point, and follow the best practice of splitting your transformations into staging and mart layers.
- Consider writing dbt tests to validate your external data before building it into the warehouse.
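As a sketch of the extract-and-load step described above, the example below stages records from a custom source as a JSON-lines file, a format Snowflake's `COPY INTO` can load into a `VARIANT` column in the `raw` schema. The source function, field names, and file path are illustrative assumptions, not taken from the repo:

```python
import json


def extract_records():
    """Hypothetical extractor; replace with calls to your custom source."""
    return [
        {"student_id": "42", "score": 87},
        {"student_id": "43", "score": 91},
    ]


def stage_as_jsonl(records, path):
    """Write records as JSON lines, ready for a Snowflake COPY INTO
    from an internal stage into a raw-schema table."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path


staged_path = stage_as_jsonl(extract_records(), "custom_source.jsonl")
```

In an Airflow DAG, the extract and load steps would typically become separate tasks wired together (extract, then load), mirroring the structure of `edfi_resources_to_snowflake.py`.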
We highly recommend testing the above changes in a new branch of your implementation repo and deploying them to a dev copy of your infrastructure before merging into main.
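Following the dbt-test suggestion above, a minimal schema file using dbt's built-in `not_null` and `unique` generic tests might look like the following (the model and column names are illustrative):

```yaml
version: 2

models:
  - name: stg_custom_source
    columns:
      - name: student_id
        tests:
          - not_null
          - unique
```

Running `dbt test` will then fail the build if the staged external data contains missing or duplicate keys, catching problems before the data reaches the `analytics` schema.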