## How to bring external data into the data warehouse
Bringing in external data involves writing Python / Airflow code to create another DAG that extracts your custom data source and loads it into the `raw` schema in Snowflake, then writing custom dbt code to transform that data into something that can be added to the data warehouse.
- Add a custom DAG in your implementation repo, using `airflow/dags/edfi_resources_to_snowflake.py` as a starting point. Modify it as needed to extract data from your custom source and load it into Snowflake.
- Write custom dbt models (at `dbt/models/` in your project repo) to transform your custom data from the `raw` Snowflake schema into dimensions and/or facts in the `analytics` schema. Use the `int_dbt_edfi_wh` repo as a starting point, and follow the best practice of splitting your transformations into staging and mart layers.
- Consider writing dbt tests to validate your external data before building it into the warehouse.
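As a sketch of the extract-and-load step described above, the example below stages records from a custom source as a JSON-lines file, a format Snowflake's `COPY INTO` can load into a `VARIANT` column in the `raw` schema. The source function, field names, and file path are illustrative assumptions, not taken from the repo:

```python
import json


def extract_records():
    """Hypothetical extractor; replace with calls to your custom source."""
    return [
        {"student_id": "42", "score": 87},
        {"student_id": "43", "score": 91},
    ]


def stage_as_jsonl(records, path):
    """Write records as JSON lines, ready for a Snowflake COPY INTO
    from an internal stage into a raw-schema table."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path


staged_path = stage_as_jsonl(extract_records(), "custom_source.jsonl")
```

In an Airflow DAG, the extract and load steps would typically become separate tasks wired together (extract, then load), mirroring the structure of `edfi_resources_to_snowflake.py`.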
We highly recommend testing the above changes in a new branch of your implementation repo and deploying them to a dev copy of your infrastructure before merging into main.
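Following the dbt-test suggestion above, a minimal schema file using dbt's built-in `not_null` and `unique` generic tests might look like the following (the model and column names are illustrative):

```yaml
version: 2

models:
  - name: stg_custom_source
    columns:
      - name: student_id
        tests:
          - not_null
          - unique
```

Running `dbt test` will then fail the build if the staged external data contains missing or duplicate keys, catching problems before the data reaches the `analytics` schema.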