Introduction#

The student dimension contains all the basic information we know about a student; things like their name, birthdate, and demographic information.

It's also the most highly configurable part of the edu framework, so while its basic use and structure looks the same across implementations, exactly which demographics are defined and how they're defined can vary.

How is dim_student structured?#

Annualization#

The first thing to understand about dim_student is that it is annualized. This means that a student will have a row for each year they were enrolled.

This helps us capture how students change over the years-- whether it's their name, grade level, or program memberships-- allowing us to record historical metrics as the students were at the time, rather than as they are now.

Having one row per year is a bit simpler than the slowly changing dimension approach taken in other data warehouses, where a new row is added every time any characteristic changes. This reduction in detail makes the model simpler to work with, and filters out some of the noise or data corrections that happen throughout a school year. For example: if a student's name changes on March 1st, that's more likely to be a correction of the record than a meaningful change.

The student dimension has two keys at the far left of the table: k_student and k_student_xyear. A single student will have a new k_student every year, but the same k_student_xyear across all years. That lets us do some interesting things.

For example: if we join fct_student_grades using k_student, we will see their grades across all years related to their demographics from the same year they received those grades.

On the other hand, if we join using k_student_xyear, and filter to a particular school year in dim_student, like this:

select *
from fct_student_grades
join dim_student 
    on fct_student_grades.k_student_xyear = dim_student.k_student_xyear
where dim_student.school_year = 2023

We will see their grades from all years related to their demographics from 2023. This gives us the flexibility to reference demographics either way: as they were at the time, or as they are now.

Simplification#

Representing demographics in the student dimension often involves some simplification.

For example: a student doesn't have a grade level all by themselves. Rather, grade levels are a property of a student's enrollment in a particular school. It is possible (if rare) for a student to be enrolled in more than one grade at a time. For most analytical purposes though, it's reasonable to assume a student has a single grade level, and to want to use this in reporting or aggregations. Therefore the student dimension chooses a single grade level for each student to allow for easy analytics.

In all such cases the warehouse preserves this lost detail elsewhere in the data model: if you look in fct_student_school_association, you can find the grade level a student had in each of their enrollments throughout the year.

A more complex example is program enrollments: students may enter and exit programs like Special Education, English Language Learner, or Homelessness many times throughout the year. When this level of detail is required, the data is available in the various program-related fact tables. But for most analytical purposes we also want a simplified version of this data: was the student ever in the program this year? Are they in the program right now? Have they ever been in the program in any year? dim_student can capture any or all of these variations.

Where does the data come from?#

Most of the data in dim_student comes from Ed-Fi, and most of the data in Ed-Fi comes from the SIS. But Ed-Fi is very flexible in terms of how demographic data can be specified, so a full accounting depends on the specifics of which SIS you're using, how Ed-Fi is configured, and what extensions or other data sources have been added.

But within Ed-Fi there are a few common places:

Built-ins#

Many of the simpler demographics like gender, race/ethnicity, or birth date are built in to Ed-Fi's student or studentEducationOrganizationAssociation models, and edu uses these directly.

Characteristics and Indicators#

Ed-Fi allows for an extensible set of characteristics and indicators to be used. These can be customized, and many come standard with SIS integrations.

Characteristics are just text codes: a student either has the code or they don't. Standard codes include things like Economic Disadvantaged, Homeless, and Foster Care. edu unpacks these values into boolean indicators, like is_econ_disadvantaged, and these can be configured in edu's set-up.

Indicators are key-value pairs: there's an indicator like ELL Status which can take a range of different values. These are newer, so aren't in as wide of use yet, but can also be configured in edu's set-up, and can become boolean, numeric, or text status indicators.

Programs#

Ed-Fi can capture many different kinds of program enrollments, and these are summarized into a configurable set of _active, _annual, and/or _ever indicators, as described in the section above.

Some demographic labels could be specified as either a status or a program enrollment, which means they could come from a Characteristic/Indicator or a Program table (or both). Because Programs are like enrollments, with start and end dates, they capture a bit more detail and allow for more flexibility in recording how the status changed over time. But whether a program or indicator is more appropriate to use in dim_student varies by how your district thinks about these data and how the data is submitted to Ed-Fi. Comparing the data as it comes through Ed-Fi to existing analytics or other sources of truth can be instructive in making this determination.

Grade levels#

Grade levels come from student-school enrollments. If there are multiple options, the one with the longest duration is used.

Custom data sources#

Data from outside of Ed-Fi can be added as well, provided it can be reliably linked to a student

Custom interactions#

Custom combinations or interactions of other demographics can be created in edu, such as hispanic_and_ell or black_or_latinx.

Student Subgroup Fact#

It is often desirable to create high-level aggregates across many different demographic subgroups at once. To facilitate this, all student demographics are also stored "long" in fct_student_subgroup. This allows for things like creating an annual chronic absenteeism rate by school and all demographic subgroups in one relatively straightforward query:

select 
    msr.k_school,
    msr.school_year,
    ds.subgroup_category,
    ds.subgroup_value,
    sum(msr.is_chronic_absentee) as n_chronic_absent,
    count(*) n_total_students,
    100 * n_chronic_absent / n_total_students as pct_chronic_absentee
from msr_student_cumulative_attendance as msr 
join fct_student_subgroup fss
    on msr.k_student = fss.k_student 
join dim_subgroup ds 
    on fss.k_subgroup = ds.k_subgroup
group by 1,2,3,4;