Schemas in the data warehouse#

The default EDU data warehouse has multiple databases and schemas. The framework can be extended and customized, so this article describes only the default setup.

Analytics database#

The analytics database contains all of the tables and views built by dbt. Only dbt can create objects in this database, which keeps its permissions and lineage under control.

prod_wh#

The production data warehouse is the schema that stores the analytic data model. The prefix prod_ distinguishes it from the testing environments and marks it as holding the most up-to-date data. By default, this schema contains tables prefixed with dim_ for dimension tables, fct_ for fact tables, and msr_ for measure tables. More information on the dimensional model is available in this explanation article.
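As a rough illustration of the naming convention above, the following Python sketch sorts table names by prefix. Only the three prefixes come from this article. The function name and the sample table names are hypothetical, and your warehouse may contain tables that follow none of these patterns.

```python
# Prefixes named in this article, mapped to the kind of table they mark.
TABLE_PREFIXES = {
    "dim_": "dimension",
    "fct_": "fact",
    "msr_": "measure",
}


def table_kind(table_name: str) -> str:
    """Return the kind of table implied by its prefix, or "other"."""
    lowered = table_name.lower()
    for prefix, kind in TABLE_PREFIXES.items():
        if lowered.startswith(prefix):
            return kind
    return "other"


# Hypothetical table names, used only to exercise the function.
for name in ["dim_student", "fct_student_attendance", "msr_example", "some_view"]:
    print(name, "->", table_kind(name))
```

A helper like this can be handy when scanning a long table listing, for example to pull out only the dimension tables.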

prod_seed#

These tables contain configurations to analytically interpret Ed-Fi data or align identifying information with common categories. These tables are maintained by the team managing dbt. More information about these tables is available in the manage section of the docs.

prod_dbt_test__audit#

dbt can test whether data meets defined conditions and store the records that do not pass those tests for further inspection. Tables in this schema hold the rows that those tests flagged.

prod_qc#

This schema contains other dbt models that test for or flag data quality issues that dbt tests do not cover directly.

prod_stage#

This schema contains unnested data coming from Ed-Fi, still generally expressed in Ed-Fi terms. It is a good place to look when diagnosing data quality issues between Ed-Fi and the data warehouse. We recommend against building analytics or metrics on these tables without a good reason to do so.

Schemas prefixed with rc_ or dev_#

These schemas are testing environments for new features. They reuse the base names of the other schemas (stage, seed, wh, etc.), with the rc_ or dev_ prefix in place of prod_.
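The prefix convention can be sketched in Python as a function that splits a schema name into its environment and its base name. This is a minimal sketch that assumes every schema follows a simple prefix_base pattern. The function name is hypothetical, and a customized setup may name its schemas differently.

```python
# Environment prefixes mentioned in this article.
ENVIRONMENT_PREFIXES = ("prod", "rc", "dev")


def split_schema(schema_name: str):
    """Split a schema name into (environment, base name).

    Returns (None, name) when the name carries no known environment prefix.
    """
    name = schema_name.lower()
    env, sep, base = name.partition("_")
    if sep and base and env in ENVIRONMENT_PREFIXES:
        return env, base
    return None, name


# Schema names built from the prefixes and base names in this article.
for schema in ["prod_wh", "rc_stage", "dev_seed", "prod_dbt_test__audit", "edfi3"]:
    print(schema, "->", split_schema(schema))
```

Two schemas that share a base name, such as a prod_ schema and its dev_ counterpart, hold the same kind of content in different environments.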

Raw database#

edfi3 schema#

This schema contains all of the raw data loaded by the EDU code. By default, this includes Ed-Fi and extensions to Ed-Fi. The raw data is grouped by Ed-Fi resource, and each record is stored as raw JSON in a schemaless field alongside some metadata about its loading. Because this data is difficult to work with, we recommend looking at it only when diagnosing a raw data issue between Ed-Fi and the data warehouse tables.
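To give a sense of what working with this raw data involves, here is a minimal Python sketch that parses one raw record and reads a nested value from it. Everything about the row is an assumption made for illustration: the column names (resource, loaded_at, payload), the values, and the JSON contents are made up and will not match the actual raw tables exactly.

```python
import json

# Hypothetical raw row. The column names and values are made up for
# illustration and are not the actual layout of the raw tables.
raw_row = {
    "resource": "studentSchoolAssociations",
    "loaded_at": "2024-01-01T00:00:00Z",
    "payload": '{"studentReference": {"studentUniqueId": "12345"}, '
               '"schoolReference": {"schoolId": 101}}',
}


def get_nested(document, path, default=None):
    """Walk a list of keys into nested dictionaries, returning default if any key is missing."""
    current = document
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Parse the JSON text, then read values by their key path.
record = json.loads(raw_row["payload"])
student_id = get_nested(record, ["studentReference", "studentUniqueId"])
school_id = get_nested(record, ["schoolReference", "schoolId"])
missing = get_nested(record, ["entryDate"], default="not present")
print(student_id, school_id, missing)
```

Every field has to be dug out by hand like this, which is why the unnested tables in the stage schema are usually a better starting point.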