Skip to content

Dimensional Data Model#

A dimensional data model or star schema is a system of organization for how to structure an analytics database. The EDU data model aligns with this philosophy. The goal is to make it easy to understand how to use a data warehouse and find the data you are looking for without having to ask, if you know some of the key ideas.

In short the philosophy means we have a few types of tables in the warehouse:

  • Dimensions are "nouns" or "entities". In dimension tables, we are trying to describe everything we know about an entity in the columns of the table. For example, a school is an entity that we can describe with information like its name, physical address, and the district it is a part of. We exclude any information from this table that could require duplication of the entity to describe well so that we can keep the table to 1 row per entity described.

  • Facts are generally "observations" or "events" that we observe at a point in time. They usually bring together multiple dimensions; for example, a student attendance record could be thought of an observed event of the attendance of a student at a school on a day. A fact table about student attendance would contain combinations of codes assigned to students, schools, and days. To find out any contextual information about the event-- like the student's gender, the school's name, or what month the day was in-- we would join to the corresponding dimension table.

The combination of these entities creates the star in the name "star schema"-- a fact table can be thought of as the center of the star of several dimensions. Dimensional data models and star schemas are different ways of referring to the same idea.

star_schema

In the EDU data model, Dimension tables are usually have fewer rows and lots of columns, and Fact tables have fewer columns and many rows. The idea is that we are not repeating the same information about a dimension in every table referencing that dimension, and we are creating a consistent source of truth for information about that dimension.