This page will help new users understand what Enable Data Union (edu) is, why they might want to use it, and what steps and skills are required to do so.
What is Enable Data Union?
Enable Data Union is an open framework that extracts data from Ed-Fi, loads it into a data warehouse, and transforms it into a data model designed for analytics. The code was developed by Education Analytics (EA), a nonprofit based in Madison, WI. Intended users of edu include analysts, BI developers, education researchers, and other technical staff at state and local education agencies, who can use the data warehouse to build reports and analyze education data.
The intent of the code is to be:
- Fully functional right out of the box so that it can be used by any education agency that has implemented Ed-Fi, with defaults that make sense for many cases
- Configurable so that common code can be leveraged across many agencies while accounting for local differences, and those configurations can be stored in sensible, contained locations to avoid needing to fork the code
- Extensible so that it can serve as a framework for the broader data warehousing needs of education agencies, such as integrating additional data or adding more metrics
- Transparent so you can see how data is being transformed at every step and everything is controlled by inspectable code
- Secure so that while the logic and code are transparent, the data itself is secure
It is implemented as a free, open codebase that education agencies can use, with a modular approach to configuring, customizing, and extending the base functionality. The code is available under the PolyForm Noncommercial License.
The concepts and abstractions in the codebase allow it to be implemented in many different technology stacks, but for now, implementing the published code requires an AWS environment and a Snowflake database. We intend to add support for more cloud providers and databases over time.
Why would you use Enable Data Union?
- You have data in an Ed-Fi ODS and want to use it for analytics
- You want to modernize your agency's analytics infrastructure, move away from custom, on-premises data warehouses, and take advantage of technological innovations in data engineering
- You want to build a data warehouse based on Ed-Fi but would rather share development and maintenance costs with other agencies across the country than start from scratch
- You do not want to be tied to a vendor's proprietary analytics product
What is Enable Data Union, really?
The Enable Data Union code is a collection of GitHub repositories that, when deployed, produce a data pipeline and a Snowflake data warehouse based on Ed-Fi: a database populated with your Ed-Fi data and organized for analytic queries. To deploy it, you will need an AWS account, a Snowflake account, and credentials to read data from an Ed-Fi API. Once it has been successfully set up, you will have:
- An AWS environment running Apache Airflow with pre-populated DAGs that:
  - Pull data from Ed-Fi API(s) on a configurable schedule
  - Trigger dbt (data build tool) runs in the Snowflake database on a configurable schedule
- An S3 bucket to stage the raw data to load into Snowflake
- A Snowflake database with a queryable dimensional data model
- An implementation GitHub repository that contains all of the configuration and customization
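The extract step described above can be pictured as paging through an Ed-Fi API resource and writing the records out as newline-delimited JSON, a format Snowflake can load from a stage such as the S3 bucket. The sketch below assumes the API pages with `offset` and `limit` query parameters; `extract_resource`, `to_ndjson`, and the injected `fetch_page` callable are hypothetical names, not part of the edu codebase, and the page size of 500 is an assumed default. A real extractor would also handle OAuth authentication, retries, and the HTTP calls themselves.

```python
import json
from typing import Callable, Iterator

# Hypothetical: fetch_page(offset, limit) returns one page of records (a list
# of dicts) from a single Ed-Fi resource endpoint, e.g. students.
FetchPage = Callable[[int, int], list]


def extract_resource(fetch_page: FetchPage, limit: int = 500) -> Iterator[dict]:
    """Yield every record of one resource using offset/limit paging."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        # A short (or empty) page means there are no more records to fetch.
        if len(page) < limit:
            break
        offset += limit


def to_ndjson(records) -> str:
    """Serialize records as newline-delimited JSON, one record per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


if __name__ == "__main__":
    # Stand-in for a real HTTP call so the sketch runs without an Ed-Fi API.
    fake_students = [{"studentUniqueId": str(i)} for i in range(7)]

    def fake_fetch(offset: int, limit: int) -> list:
        return fake_students[offset:offset + limit]

    payload = to_ndjson(extract_resource(fake_fetch, limit=3))
    print(payload.count("\n"))  # one line per extracted record
```

Injecting `fetch_page` keeps the paging logic separate from networking, so the same loop works whether the pages come from a live API, a test fixture, or an Airflow task.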
Most users of edu will not need to concern themselves with the behind-the-scenes details of the technology; from a user's perspective, edu is a database that they can interact with by writing SQL queries.
We describe edu as a "framework." For precision, we try to refer to the components of this framework consistently by the names shown in the following diagram.
How do you use it?
This section describes at a high level what you need to do to set up and use edu yourself, along with the technical skills required. If you would like more hands-on support implementing this code, another option is Stadium, EA's hosted edu offering, which provides infrastructure and security management, support, documentation, and development.
We have organized the documentation mostly around the following three categories to help users find what they need.
Setting up the infrastructure
IT staff, DevOps professionals, or Cloud Engineers
Setup currently requires familiarity with AWS and Snowflake cloud services, as well as an account with each. Setup involves using CloudFormation to provision the AWS infrastructure and running SQL statements to set up the Snowflake infrastructure. This section also covers general security of the cloud environments.
Configuring, customizing, managing the system, and extending the functionality
Analytics Engineers and Data Engineers
edu is designed to require minimal configuration during setup while giving implementers the tools to extend it if desired. Extending data coverage with new integrations requires development in Python and Airflow. Extending the data model with transformations of new data requires dbt.
Using the data warehouse once it is up and running
Analysts, Business Intelligence Developers, Researchers, Data Scientists
Interacting with the data in the data warehouse directly requires writing SQL queries. You can also connect business intelligence tools to the data warehouse.
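Because the warehouse exposes a dimensional model, much of this analysis comes down to joining fact tables to dimension tables and aggregating. The sketch below illustrates that query shape using Python's built-in `sqlite3` module as a stand-in for Snowflake so that it runs anywhere; the tables `dim_school` and `fct_student_daily_attendance` and their columns are hypothetical examples, not taken from the edu data model. Against the real warehouse, you would run similar SQL in Snowflake or through a connected BI tool.

```python
import sqlite3

# In-memory SQLite database standing in for the Snowflake warehouse.
# Table and column names are hypothetical, chosen only to show a star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_school (k_school INTEGER PRIMARY KEY, school_name TEXT);
    CREATE TABLE fct_student_daily_attendance (
        k_student INTEGER,
        k_school INTEGER REFERENCES dim_school (k_school),
        is_absent INTEGER
    );
    INSERT INTO dim_school VALUES (1, 'North High'), (2, 'South High');
    INSERT INTO fct_student_daily_attendance VALUES
        (10, 1, 0), (10, 1, 1), (11, 1, 0), (12, 2, 1), (12, 2, 1);
""")

# A typical analytic query: join a fact table to a dimension and aggregate.
rows = conn.execute("""
    SELECT s.school_name, SUM(f.is_absent) AS absences
    FROM fct_student_daily_attendance AS f
    JOIN dim_school AS s ON s.k_school = f.k_school
    GROUP BY s.school_name
    ORDER BY s.school_name
""").fetchall()

for school_name, absences in rows:
    print(school_name, absences)
```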