2. Table Structure
Module 2: Understanding the EDU Table Structure#
Now that you've seen how data is organized into databases and schemas, this section focuses on what you'll find inside the tables themselves. EDU follows consistent conventions for table structure and keys so that once you understand the patterns, you can apply them across the entire warehouse.
The goal of these conventions is predictability: when you open a new table, you should be able to quickly understand what each column represents and how it can be joined to other tables.
Column organization and naming#
EDU tables follow consistent conventions for how columns are organized and named. Understanding these patterns will help you find what you need quickly and write more reliable queries.
Column ordering#
Tables are organized with columns in a logical order, typically following these groupings, from left -> right:
- Primary data: The key information describing the table. This includes primary keys (in descending hierarchical order), followed by other relevant unique attributes, like names
- Foreign keys: Keys (natural or surrogate) meant to facilitate linking to other tables
- Logical data: Any remaining descriptive or factual elements, grouped in a sensible way (e.g., more salient elements first, groups of related elements together)
- Metadata: Data about the loading process, like timestamps
Here's an example showing how this organization works in practice. Notice how fct_student_daily_attendance follows this pattern: primary key first, then foreign keys, then the logical data (attendance information), and finally metadata:
See an example: fct_student_daily_attendance
| K_STUDENT | K_STUDENT_XYEAR | K_SCHOOL | K_CALENDAR_DATE | CALENDAR_DATE | K_SESSION | TENANT_CODE | ATTENDANCE_EVENT_CATEGORY | ATTENDANCE_EVENT_REASON | IS_ABSENT | IS_PRESENT | IS_ENROLLED | TOTAL_DAYS_ENROLLED | CUMULATIVE_DAYS_ABSENT | CUMULATIVE_DAYS_ATTENDED | CUMULATIVE_DAYS_ENROLLED | CUMULATIVE_ATTENDANCE_RATE | MEETS_ENROLLMENT_THRESHOLD | IS_CHRONIC_ABSENTEE | EVENT_DURATION | SCHOOL_ATTENDANCE_DURATION | ABSENTEE_CATEGORY_RANK | ABSENTEE_CATEGORY_LABEL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0075f55bc3c4d2f5247763f1f8a81ca6 | 757d4ad7c97d511210559685a3fafcba | 4e9a120e6a61add76faf63ecf1505de4 | b27c4062b62daa2bd65546a0bb1612c3 | 2024-10-17 | a72a70007f0ccf21048f62deca055145 | SYNTHETIC | In Attendance | nan | 0 | 1 | 1 | 145 | 0 | 1 | 1 | 100 | False | False | nan | nan | nan | nan |
| 0075f55bc3c4d2f5247763f1f8a81ca6 | 757d4ad7c97d511210559685a3fafcba | 4e9a120e6a61add76faf63ecf1505de4 | f8a88e5540d70113d6060a4ee3539003 | 2024-10-18 | a72a70007f0ccf21048f62deca055145 | SYNTHETIC | In Attendance | nan | 0 | 1 | 1 | 145 | 0 | 2 | 2 | 100 | False | False | nan | nan | nan | nan |
| 0075f55bc3c4d2f5247763f1f8a81ca6 | 757d4ad7c97d511210559685a3fafcba | 4e9a120e6a61add76faf63ecf1505de4 | e9652b11648ed816d10f7540bd4fa5ed | 2024-10-21 | a72a70007f0ccf21048f62deca055145 | SYNTHETIC | In Attendance | nan | 0 | 1 | 1 | 145 | 0 | 3 | 3 | 100 | False | False | nan | nan | nan | nan |
Naming conventions#
EDU follows several naming conventions to make columns predictable and self-documenting:
- Boolean columns use the
is_prefix (e.g.,is_school_day,is_active,is_multilingual) - Descriptive names over short names: EDU prioritizes clarity over brevity. Column names are intentionally descriptive so you can understand what they represent without needing to look up documentation. For example,
student_unique_idis preferred overstu_id. - Snake case: All column names use snake_case (lowercase with underscores)
Surrogate keys (k_*)#
All EDU warehouse tables use surrogate keys, which are system-generated identifiers designed for reliable joins.
In practice, you'll recognize them because they start with k_, they're unique within a table, and they're what you should use for joins instead of "natural" identifiers. Each table has a primary surrogate key that uniquely identifies rows in that table. For example, k_student is the primary key in dim_student, and k_student_assessment is the primary key in fct_student_assessment.
Many tables also include foreign keys that reference the primary key of another table, typically of a dimension. These foreign keys use the same k_* naming convention, making joins predictable. For example, k_student in a fact table is a foreign key that joins to k_student (the primary key) in dim_student. Similarly, k_assessment in a fact table joins to dim_assessment, and k_school joins to dim_school.
Rule of thumb
If you see a column named k_something, there is usually a corresponding dim_something table (or a closely related dimension).
Cross-year keys (k_*_xyear)#
Some entities change over time in meaningful ways. To support both point-in-time and longitudinal analysis, EDU includes cross-year keys.
A common example is the difference between k_student and k_student_xyear. The first identifies a student within a specific school year, while the second identifies the same student across multiple years. As a rule of thumb, use k_student for within-year analysis and year-bound joins, and use k_student_xyear when you're tracking students over time and want a stable identifier across years.
You may be wondering why both exist, or why dim_student is unique on k_student (annualized). Annualizing dim_student was an intentional choice. It enables longitudinal analysis with point-in-time annual demographics. For example, you can easily find "What was student A's Multilingual status as of 2024-2025?" (to dive deeper, see this point-in-time query example from the Demographics Query Guide).
Tenant codes (tenant_code)#
Many EDU tables include tenant_code to identify the agency, district, or organization the data belongs to.
Tenant codes support multi-tenant deployments and enable consistent filtering and row-level security. Even in single-tenant deployments, the column is retained for consistency across environments.
One more concept: "grain"#
A table has a grain, or unique key, which describes what combination of columns make a row in the table unique.
For example, a table might have one row per student per day, one row per student per assessment, or one row per student per course per term.
The best place to find the grain of a table is its dbt documentation page. You'll learn more about navigating the dbt documentation site in Module 6.
Avoid accidental duplication
If you join tables at different grains (for example, joining a student-per-day fact to a student-per-assessment fact), you can accidentally multiply rows and inflate counts. Always be aware of the grain of the tables you're joining, and be aware of the resulting grain of your output.
Wrapping up#
You now know the core patterns that make EDU tables consistent. Surrogate keys (k_*) are the backbone of joining, cross-year keys (k_*_xyear) support longitudinal analysis, tenant codes help scope data in shared environments, and facts and dimensions follow predictable naming conventions.
Next, you'll learn how to connect to the data warehouse so you can start working with these tables.
« Data Warehouse Overview