Demographics Query Guide#
Introduction#
This is a query guide for using the EDU data warehouse to assemble datasets with demographics data for answering analytics questions.
This guide aims to:
- Provide key information on the relevant tables and variables available in EDU
- Outline key pre-processing steps implemented on the relevant tables
- Demonstrate common use cases that should help you get started on:
- Exploring data
- Assembling datasets containing variables that are sourced from different tables
- When to use long vs. wide demographics
- Composing more complex queries
Important notes
🔧 Implementation variations: This documentation is based on a standard implementation of EDU. Site-specific implementations may have additional data processing steps and features built out that this document does not cover. Please consult the dbt Docs of your specific implementation for more details.
📚 Prerequisites: This document assumes knowledge of SQL and common database concepts such as dimensional modeling.
💡 Learning approach: Examining the common use cases below should help you think critically about the factors to consider when combining demographics data with other types of student data available in EDU.
Which tables contain demographics data?#
EDU contains demographics data for students, staff, and other kinds of people (contacts, candidates, etc.). Demographic information often appears across multiple tables, so choosing the right one depends on both the specific data you need and which table structure will be most efficient for your analysis.
This section will walk you through all EDU tables containing demographics data. It will also highlight some notable variables in each of these tables.
Note that this section includes links to a general EDU dbt Docs site. It is likely most helpful to consult the dbt Docs generated for your specific implementation. It may be the case that your implementation has additional demographics tables.
Student Demographics Tables#
-
- This is arguably the most used table in the warehouse. This table identifies students and their demographics/characteristics, in a wide annual format.
-
Annualized: One row per student per school year
-
Annualizing dim_student was an intentional choice. It enables longitudinal analysis with point-in-time annual demographics. For example, you can easily find 'What was student A's Multilingual status as of 2024-2025?' (see point-in-time demographics) and 'How has school A's attendance rate among multilingual learners changed over time? (see time-series analysis).
-
That said, some demographics are better treated as 'immutable' (constant across years). By default, EDU treats name, birth info, gender, race, and ethnicity as immutable, meaning the most recent values for these fields are applied retroactively across all years in
dim_student.
-
Descriptions of some useful columns in dim_student
student_unique_id: The student's unique identifier, typically assigned by the state or district. This is the natural key for identifying a student across years.k_student: The unique key of this table. Surrogate key forstudent_unique_id+school_year.k_student_xyear: "student cross-year". Surrogate key forstudent_unique_id, to be used to identify students across years (see longitudinal analysis).display_name: The student's name, in a display-friendly format.safe_display_name: The student's display name, with their unique identifier added. This is useful for BI tools to display student names in a way that's ensured to be unique (and enables tools to safely link information across tables).grade_level: The student's grade level. By default, EDU determines this by taking the latest grade level from the student-year's enrollment data infct_student_school_association.race_ethnicity: A single value defining the student's race/ethnicity, which includes logic for multi-racial students and Hispanic or Latino students.is_special_education_active: True/False indicator for whether the student is actively enrolled in a special education program. Based onstudent_special_education_program_associations.is_special_education_annual: True/False indicator for whether the student was enrolled in a special education program at any point during the given school year. Based onstudent_special_education_program_associations.- ... ^ these indicators are repeated for other types of programs, as well
-
- This table provides the same information as
dim_student, but in a long format, which can be useful for building visualizations in a BI tool (see wide vs. long format). - One row per student per subgroup per year.
k_subgroupis a foreign key todim_subgroup.
- This table provides the same information as
-
- This dimension table stores the index of demographic 'subgroups' that are represented in
fct_student_subgroup. - Unique key is
k_subgroup, a surrogate key for [subgroup_category+subgroup_value]. - This table is automatically generated based on the existing demographic columns + values found in
dim_student xwalk_subgroup_category_display_namesandxwalk_subgroup_value_display_namescan be used to override the text ofsubgroup_category_display_nameandsubgroup_value_display_name. This allows implementations to create BI-friendly display names for each subgroup.
- This dimension table stores the index of demographic 'subgroups' that are represented in
-
Program Tables (e.g., fct_student_special_education_program_association)
- Detailed program participation (with start/end dates).
- Availability and structure depend on your EDU implementation.
- Use for program-specific analysis (e.g., special education, ELL). See program association details for an example.
Staff Demographic Tables#
- dim_staff
- Similar to
dim_student, this table identifies staffs and their demographics/characteristics. - Unlike
dim_student, this table is not annualized. There is one record per staff member across all years.- This design choice reflects how staff data are typically used in education data contexts. While student data are often analyzed historically at specific points in time, staff data are primarily used for current-year purposes like application security/access control and headcount reporting. Staff demographics also change less frequently year-over-year, except for cumulative metrics (such as total years of teaching experience) where tracking across time is actually more valuable.
- Similar to
Parent/Contact Demographic Tables#
- dim_parent
- Similar to
dim_student, this table identifies parents (or guardians, other contacts associated with students) and their demographics/characteristics/contact info. - Unlike
dim_student, this table is not annualized. There is one record per staff member across all years.- This design choice reflects how parent/contact data are typically used in education data contexts. While student data are often analyzed historically at specific points in time, parent/contact data are primarily used for current-year purposes, namely keeping an up-to-date record of contact information.
- Similar to
What processing has already been done?#
This section walks through the pre-processing done on the demographics data before it's made available to the tables mentioned earlier.
What validation and/or drops have been done?#
For a student to be included in dim_student, they must have an Student record in the Ed-Fi ODS for the given year, as well as a StudentEducationOrganizationAssociation record in the same ODS-year.
Every student found in dim_student will also exist in fct_student_subgroup.
For a staff to be included in dim_staff, they must have an Staff record in the Ed-Fi ODS for the given year.
For a parent/contact to be included in dim_parent, they must have an Contact record in the Ed-Fi ODS for the given year.
Otherwise, EDU does not drop any records from these demographic tables.
dim_student Configuration#
dim_student is a highly configurable table. Implementation owners may configure new demographic columns through the addition of program indicators, student indicators, student characteristics, or custom data sources. If you are looking for a demographic column and can't find it in dim_student, speak to the team responsible for maintaining your implementation of EDU. It's very likely there is a way for them to add it! See Configuring dbt Project Variables for more details on how to configure dim_student.
dim_subgroup Configuration#
dim_subgroup's subgroup_category_display_name and subgroup_value_display_name can be configured using seed tables xwalk_subgroup_category_display_names and xwalk_subgroup_value_display_names. If the display names you see here are incorrect or unfriendly to users, ask the team responsible for maintaining your implementation of EDU to update these xwalks for you.
Common use cases#
The queries listed in this section should help you get started with exploring demographics data available in EDU and assembling datasets for common analytic use cases. These examples focus on student demographic data, though similar patterns can be applied to staff and parent/contact demographics. The first few use cases will walk you through how to query demographics data from different tables. The later use cases will address when to use wide vs. long format demographics and how to combine demographics with other types of student data.
Placeholder values
Note that in the queries provided in this section, values used for filtering using the WHERE clause may be placeholder values. Make sure to replace these with the actual values of the columns you wish to filter using.
Quick reference#
| Use Case | Description | Complexity |
|---|---|---|
| Point-in-time demographics | Find student demographics for a specific school year | Beginner |
| Longitudinal analysis | Track students across years using k_student_xyear | Beginner |
| Program association details | Explore detailed program enrollment information | Beginner |
| Wide vs. long format | Choose between dim_student and fct_student_subgroup | Intermediate |
| Join demographics to other data | Combine demographics with attendance, grades, etc. | Intermediate |
| Time-series analysis | Track how demographic groups change over time | Advanced |
How to find student demographics as of a specific school year?#
What you'll learn: How to query point-in-time annual demographics from dim_student.
Since dim_student is annualized (one row per student per school year), you can easily find a student's demographics as they were recorded for a specific school year. This is useful for questions like "What was student A's Multilingual status as of 2024-2025?"
-- Find student demographics for a specific school year
SELECT
dim_student.school_year,
dim_student.k_student_xyear,
dim_student.safe_display_name,
dim_student.grade_level,
dim_student.race_ethnicity,
dim_student.gender,
dim_student.is_special_education_annual,
dim_student.is_english_language_learner_annual
FROM analytics.prod_wh.dim_student
WHERE dim_student.school_year = 2025
AND dim_student.k_student_xyear = 'YOUR_STUDENT_KEY'; -- replace with actual student key
-- You can also filter on student name or student_unique_id, if you prefer.
-- Note: If you have multi-tenant access, filtering by student_unique_id or name
-- may return multiple records across tenants. Consider adding a tenant filter
-- or using surrogate keys for unambiguous identification.
Annual demographics
The annualized structure of dim_student allows you to see how a student's demographics may have changed from year to year. For example, a student might be in special education one year but not the next, and this will be reflected in separate rows for each school year.
How to track student demographics across multiple years?#
What you'll learn: How to use k_student_xyear to track the same student across multiple school years.
The k_student_xyear key remains constant for a student across all years, making it ideal for longitudinal analysis. This allows you to track how a student's demographics or characteristics change over time.
-- Track a student's demographics across multiple years
SELECT
dim_student.school_year,
dim_student.k_student_xyear,
dim_student.safe_display_name,
dim_student.grade_level,
dim_student.is_special_education_annual,
dim_student.is_english_language_learner_annual
FROM analytics.prod_wh.dim_student
WHERE dim_student.k_student_xyear = 'YOUR_STUDENT_KEY' -- replace with actual student key
ORDER BY dim_student.school_year;
Using k_student_xyear
Use k_student_xyear when you need to join demographics to other tables that may not have k_student available for all years, or when you want to track students across years regardless of whether they have a dim_student record for every year.
De-duplication considerations: When joining using k_student_xyear, be aware that you may need to de-duplicate results if a student has multiple records per year (e.g., attended multiple schools). See How to join demographics data to other student data below and the Assessment Query Guide for examples of handling these scenarios.
How to explore a student's program association details?#
What you'll learn: How to query detailed program enrollment information from program association tables, using special education as an example.
While dim_student provides summary indicators like is_special_education_annual, the program association tables contain richer detail about program enrollments—including start/end dates, disability types, service hours, and IEP information. This is useful for questions like "What are the details of student A's special education program enrollment?"
-- Explore a student's special education program details
SELECT
dim_student.safe_display_name,
fct_prog.school_year,
fct_prog.program_enroll_begin_date,
fct_prog.program_enroll_end_date,
fct_prog.primary_disability_type,
fct_prog.special_education_setting,
fct_prog.is_idea_eligible,
fct_prog.iep_begin_date,
fct_prog.iep_end_date,
fct_prog.school_hours_per_week,
fct_prog.spec_ed_hours_per_week
FROM analytics.prod_wh.fct_student_special_education_program_association fct_prog
JOIN analytics.prod_wh.dim_student
ON fct_prog.k_student = dim_student.k_student
WHERE dim_student.k_student_xyear = 'YOUR_STUDENT_KEY' -- replace with actual student key
ORDER BY fct_prog.school_year, fct_prog.program_enroll_begin_date;
Program association tables
EDU includes several program association tables beyond special education (e.g., for ELL programs, Title I, etc.). The availability and structure of these tables depend on your EDU implementation. Check your implementation's dbt Docs for the full list of available program tables.
Multiple enrollments: A student can have multiple program enrollments per year (e.g., if they exited and re-enrolled in a program). Each enrollment will appear as a separate row with its own begin/end dates.
When to use wide vs. long format demographics?#
What you'll learn: How to choose between dim_student (wide) and fct_student_subgroup (long) based on your analysis needs.
Use dim_student (wide format) when:
- You need multiple demographic columns in a single row
- You're doing aggregations or calculations that benefit from having all demographics in one row
- You're joining to other fact tables and want to keep the grain simple
- You need point-in-time annual demographics
Use fct_student_subgroup (long format) when:
- You're building BI visualizations that need flexible demographic filtering
- You want to pivot or filter by different demographic categories dynamically
- You're doing subgroup-based aggregations (e.g., "count of students by each demographic category")
- You need a standardized structure for demographic analysis
- You're working with multiple demographic dimensions that would create many columns in wide format
Both dim_student and fct_student_subgroup are annualized (one row per student per year), so either can be used for point-in-time annual demographics.
-- Example: Wide format (dim_student) - all demographics in one row
SELECT
dim_student.school_year,
dim_student.safe_display_name,
dim_student.race_ethnicity,
dim_student.gender,
dim_student.is_special_education_annual,
dim_student.is_english_language_learner_annual
FROM analytics.prod_wh.dim_student
WHERE dim_student.school_year = 2025
LIMIT 1000;
-- Example: Long format (fct_student_subgroup) - one row per demographic
SELECT
fct_student_subgroup.school_year,
dim_student.safe_display_name,
dim_subgroup.subgroup_category_display_name,
dim_subgroup.subgroup_value_display_name
FROM analytics.prod_wh.fct_student_subgroup
JOIN analytics.prod_wh.dim_student
ON fct_student_subgroup.k_student = dim_student.k_student
JOIN analytics.prod_wh.dim_subgroup
ON fct_student_subgroup.k_subgroup = dim_subgroup.k_subgroup
WHERE fct_student_subgroup.school_year = 2025
LIMIT 1000;
BI-friendly structure
This long format structure allows BI tools to easily create filters, slicers, and visualizations where users can select demographic categories (e.g., "Race/Ethnicity", "Special Education Status") and values (e.g., "Hispanic or Latino", "Active") without needing to know all the column names in advance.
How to join demographics data to other student data?#
What you'll learn: How to combine demographics with other student data like attendance, grades, or assessments.
When joining demographics to other student data, consider which key to use (k_student vs. k_student_xyear) and whether you need annual or cross-year demographics.
-- Join demographics to attendance data for a specific school year
SELECT
msr_student_cumulative_attendance.school_year,
dim_student.safe_display_name,
dim_school.school_name,
dim_student.grade_level,
dim_student.race_ethnicity,
dim_student.is_english_language_learner_annual,
msr_student_cumulative_attendance.attendance_rate,
msr_student_cumulative_attendance.days_enrolled,
msr_student_cumulative_attendance.is_chronic_absentee
FROM analytics.prod_wh.msr_student_cumulative_attendance
JOIN analytics.prod_wh.dim_student
ON msr_student_cumulative_attendance.k_student = dim_student.k_student
JOIN analytics.prod_wh.dim_school
ON msr_student_cumulative_attendance.k_school = dim_school.k_school
WHERE msr_student_cumulative_attendance.school_year = 2025
AND dim_student.is_english_language_learner_annual = true;
Unique key of msr_student_cumulative_attendance
The msr_student_cumulative_attendance table has a unique key of student, school, and year. This means a student can have multiple rows per year if they attended multiple schools during that year. When joining to this table, be aware that your result set may contain multiple rows per student per year if they changed schools.
Using k_student_xyear for cross-year analysis
You may want to analyze historic attendance rates but filter to current-year demographics. For example, to look at historic attendance rates for students who are 8th graders in 2025:
-- Historic attendance rates for current-year 8th graders
SELECT
msr_student_cumulative_attendance.school_year,
dim_student_current.safe_display_name,
dim_student_current.grade_level as current_grade_level,
dim_student_historic.grade_level as historic_grade_level,
dim_school.school_name,
msr_student_cumulative_attendance.attendance_rate,
msr_student_cumulative_attendance.days_enrolled,
msr_student_cumulative_attendance.is_chronic_absentee
FROM analytics.prod_wh.msr_student_cumulative_attendance
JOIN analytics.prod_wh.dim_student as dim_student_historic
ON msr_student_cumulative_attendance.k_student = dim_student_historic.k_student
JOIN analytics.prod_wh.dim_student as dim_student_current
ON msr_student_cumulative_attendance.k_student_xyear = dim_student_current.k_student_xyear
AND dim_student_current.school_year = 2025
JOIN analytics.prod_wh.dim_school
ON msr_student_cumulative_attendance.k_school = dim_school.k_school
WHERE dim_student_current.grade_level = '8'
AND msr_student_cumulative_attendance.school_year <= 2025
ORDER by safe_display_name, school_year;
This query uses k_student_xyear to link historic attendance records to current-year demographics, allowing you to filter by current grade level while analyzing historical data.
Assumptions in this example
This example assumes 2025 is the current school year. This approach will only include historical data for students who are present in the 2025 school year. For an alternative approach that uses the most recent year a student was enrolled (rather than a fixed year), see the Assessment Query Guide's approach to joining demographics from the most recent school year.
How to analyze demographic changes over time?#
What you'll learn: How to analyze trends in outcomes (e.g., attendance, program participation) for specific demographic groups over time, such as "How has school A's attendance rate among multilingual learners changed over time?"
This use case combines annual demographics with time-series analysis to understand trends for demographic groups. Note that this example analyzes trends by school—each school's attendance rate for multilingual learners is tracked separately.
-- Analyze attendance rates for multilingual learners over time, by school
SELECT
ANY_VALUE(dim_school.school_name) as school_name,
dim_student.school_year,
dim_student.is_english_language_learner_annual,
COUNT(DISTINCT dim_student.k_student) as n_students,
ROUND(100*SUM(msr_student_cumulative_attendance.days_attended)/SUM(msr_student_cumulative_attendance.days_enrolled), 2) as avg_attendance_rate
FROM analytics.prod_wh.msr_student_cumulative_attendance
JOIN analytics.prod_wh.dim_student
ON msr_student_cumulative_attendance.k_student = dim_student.k_student
JOIN analytics.prod_wh.dim_school
ON msr_student_cumulative_attendance.k_school = dim_school.k_school
WHERE dim_student.school_year IN (2023,2024,2025)
AND dim_student.is_english_language_learner_annual = true
GROUP BY
dim_student.school_year,
dim_student.is_english_language_learner_annual,
dim_school.k_school
ORDER BY school_name, dim_student.school_year;
Longitudinal analysis
The annualized structure of dim_student makes it easy to track how demographic indicators change over time. You can analyze trends in program participation, demographic composition, and other characteristics across multiple school years.
Attendance rate calculation and student-school attribution
The rules for calculating average attendance rates and attributing students to schools should be reviewed carefully depending on your use case.
Attendance rate calculation: The query above calculates attendance rate by summing days attended and days enrolled across all students, then dividing. This gives equal weight to all days enrolled. Alternative approaches might include averaging individual student attendance rates (which gives equal weight to each student) or using weighted averages based on enrollment duration.
Student-school attribution: Since msr_student_cumulative_attendance has a unique key of student, school, and year, students who attended multiple schools in a year will have multiple rows. The query groups by school, which means students who transferred schools will be counted in multiple schools' aggregates. Depending on your analysis needs, you may want to:
- Use only the school where the student was enrolled longest
- Use only the student's primary school for the year
- Include all schools but adjust the calculation to avoid double-counting students
- Aggregate at a higher level (e.g., district) to avoid attribution issues
Note on ANY_VALUE: The query uses ANY_VALUE(dim_school.school_name) because school_name is not in the GROUP BY clause. Since we're grouping by k_school, all rows in each group will have the same school name, so ANY_VALUE safely returns that value. This is a common pattern when you need to include a descriptive field that's functionally dependent on the grouping key.
Why not include school_name in the GROUP BY? While including both k_school and school_name in the GROUP BY would be functionally correct (and safe), including just school_name without k_school would be problematic—two different schools could have the same name (e.g., two 'Lincoln Elementary' schools in different districts). Additionally, grouping by the surrogate key alone can be computationally more efficient and makes the code's intent clearer to readers: we're grouping by school identity, not by school name.
Haven't found what you're looking for?
Looking for additional use cases? Want similar guides for other domains? Contact us with your thoughts!