2. Table Structure

Module 2: Understanding the EDU Table Structure#

Now that you've seen how data is organized into databases and schemas, this section focuses on what you'll find inside the tables themselves. EDU follows consistent conventions for table structure and keys so that once you understand the patterns, you can apply them across the entire warehouse.

The goal of these conventions is predictability: when you open a new table, you should be able to quickly understand what each column represents and how it can be joined to other tables.

Column organization and naming#

EDU tables follow consistent conventions for how columns are organized and named. Understanding these patterns will help you find what you need quickly and write more reliable queries.

Column ordering#

Tables are organized with columns in a logical order, typically following these groupings, from left -> right:

Primary data: The key information describing the table. This includes primary keys (in descending hierarchical order), followed by other relevant unique attributes, like names
Foreign keys: Keys (natural or surrogate) meant to facilitate linking to other tables
Logical data: Any remaining descriptive or factual elements, grouped in a sensible way (e.g., more salient elements first, groups of related elements together)
Metadata: Data about the loading process, like timestamps

Here's an example showing how this organization works in practice. Notice how fct_student_daily_attendance follows this pattern: primary key first, then foreign keys, then the logical data (attendance information), and finally metadata:

See an example: fct_student_daily_attendance

K_STUDENT	K_STUDENT_XYEAR	K_SCHOOL	K_CALENDAR_DATE	CALENDAR_DATE	K_SESSION	TENANT_CODE	ATTENDANCE_EVENT_CATEGORY	ATTENDANCE_EVENT_REASON	IS_PRESENT	IS_ENROLLED	TOTAL_DAYS_ENROLLED	CUMULATIVE_DAYS_ATTENDED	CUMULATIVE_DAYS_ENROLLED	CUMULATIVE_ATTENDANCE_RATE	MEETS_ENROLLMENT_THRESHOLD	IS_CHRONIC_ABSENTEE	EVENT_DURATION	SCHOOL_ATTENDANCE_DURATION	ABSENTEE_CATEGORY_RANK	ABSENTEE_CATEGORY_LABEL
0075f55bc3c4d2f5247763f1f8a81ca6	757d4ad7c97d511210559685a3fafcba	4e9a120e6a61add76faf63ecf1505de4	b27c4062b62daa2bd65546a0bb1612c3	2024-10-17	a72a70007f0ccf21048f62deca055145	SYNTHETIC	In Attendance	nan	1	1	145	1	1	100	False	False	nan	nan	nan	nan
0075f55bc3c4d2f5247763f1f8a81ca6	757d4ad7c97d511210559685a3fafcba	4e9a120e6a61add76faf63ecf1505de4	f8a88e5540d70113d6060a4ee3539003	2024-10-18	a72a70007f0ccf21048f62deca055145	SYNTHETIC	In Attendance	nan	1	1	145	2	2	100	False	False	nan	nan	nan	nan
0075f55bc3c4d2f5247763f1f8a81ca6	757d4ad7c97d511210559685a3fafcba	4e9a120e6a61add76faf63ecf1505de4	e9652b11648ed816d10f7540bd4fa5ed	2024-10-21	a72a70007f0ccf21048f62deca055145	SYNTHETIC	In Attendance	nan	1	1	145	3	3	100	False	False	nan	nan	nan	nan

Naming conventions#

EDU follows several naming conventions to make columns predictable and self-documenting:

Boolean columns use the is_ prefix (e.g., is_school_day, is_active, is_multilingual)
Descriptive names over short names: EDU prioritizes clarity over brevity. Column names are intentionally descriptive so you can understand what they represent without needing to look up documentation. For example, student_unique_id is preferred over stu_id.
Snake case: All column names use snake_case (lowercase with underscores)

Surrogate keys (`k_*`)#

All EDU warehouse tables use surrogate keys, which are system-generated identifiers designed for reliable joins.

In practice, you'll recognize them because they start with k_, they're unique within a table, and they're what you should use for joins instead of "natural" identifiers. Each table has a primary surrogate key that uniquely identifies rows in that table. For example, k_student is the primary key in dim_student, and k_student_assessment is the primary key in fct_student_assessment.

Many tables also include foreign keys that reference the primary key of another table, typically of a dimension. These foreign keys use the same k_* naming convention, making joins predictable. For example, k_student in a fact table is a foreign key that joins to k_student (the primary key) in dim_student. Similarly, k_assessment in a fact table joins to dim_assessment, and k_school joins to dim_school.

Rule of thumb

If you see a column named k_something, there is usually a corresponding dim_something table (or a closely related dimension).

Cross-year keys (`k_*_xyear`)#

Some entities change over time in meaningful ways. To support both point-in-time and longitudinal analysis, EDU includes cross-year keys.

A common example is the difference between k_student and k_student_xyear. The first identifies a student within a specific school year, while the second identifies the same student across multiple years. As a rule of thumb, use k_student for within-year analysis and year-bound joins, and use k_student_xyear when you're tracking students over time and want a stable identifier across years.

You may be wondering why both exist, or why dim_student is unique on k_student (annualized). Annualizing dim_student was an intentional choice. It enables longitudinal analysis with point-in-time annual demographics. For example, you can easily find "What was student A's Multilingual status as of 2024-2025?" (to dive deeper, see this point-in-time query example from the Demographics Query Guide).

Tenant codes (`tenant_code`)#

Many EDU tables include tenant_code to identify the agency, district, or organization the data belongs to.

Tenant codes support multi-tenant deployments and enable consistent filtering and row-level security. Even in single-tenant deployments, the column is retained for consistency across environments.

One more concept: "grain"#

A table has a grain, or unique key, which describes what combination of columns make a row in the table unique.

For example, a table might have one row per student per day, one row per student per assessment, or one row per student per course per term.

The best place to find the grain of a table is its dbt documentation page. You'll learn more about navigating the dbt documentation site in Module 6.

Avoid accidental duplication

If you join tables at different grains (for example, joining a student-per-day fact to a student-per-assessment fact), you can accidentally multiply rows and inflate counts. Always be aware of the grain of the tables you're joining, and be aware of the resulting grain of your output.

Wrapping up#

You now know the core patterns that make EDU tables consistent. Surrogate keys (k_*) are the backbone of joining, cross-year keys (k_*_xyear) support longitudinal analysis, tenant codes help scope data in shared environments, and facts and dimensions follow predictable naming conventions.

Next, you'll learn how to connect to the data warehouse so you can start working with these tables.

Previous
« Data Warehouse Overview

Next
Authenticating & Connecting »