
How to deploy edu

Deploying edu involves provisioning cloud infrastructure and forking and customizing code repositories. There are four major steps, each documented in more detail below.

  1. Fork and customize code repositories
  2. Prepare your AWS account
  3. Run CloudFormation templates and SQL scripts to deploy infrastructure and code
  4. Final customizations and first run

Fork and customize code repositories

  1. Fork the edu_project_template repo
  2. Customize airflow/configs/ and dbt/dbt_project.yml as needed
  3. Run the CloudFormation templates and SQL scripts to set up the AWS and Snowflake infrastructure (detailed in the sections below)

Prepare your AWS account

Before we can run the CloudFormation templates that will provision the required infrastructure, we must set up a few things manually in AWS. Log into the AWS console and:

  1. Create a new VPC with two private and two public subnets (each in a different availability zone). Note the VPC ID and the four subnet IDs.
  2. Delegate control of a subdomain (such as analytics.yourschooldistrict.org) to AWS. Various resources will be available under this subdomain (such as airflow.analytics.yourschooldistrict.org). In AWS Route53, create a Hosted Zone for the delegated subdomain, and note its ID.
  3. In AWS Certificate Manager, provision a wildcard certificate for your delegated subdomain. (This will be used to ensure traffic in and out of the system is encrypted.) Once provisioned, note the SSL certificate's ARN.
  4. In AWS EC2, create a key-pair and note its name. (This will be needed to SSH into the Airflow instance(s).)
  5. In AWS S3, create two buckets:
    • a bucket to be used as the data lake, where raw JSON data extracted from your Ed-Fi API(s) will be stored, such as yourschooldistrict-edu-analytics-datalake
    • a bucket where the CloudFormation Templates used to set up AWS infrastructure will be stored, such as yourschooldistrict-edu-analytics-cloudformation
  6. Finally, upload the contents of the edfi_airflow_cloudformation repository to your CloudFormation template S3 bucket (see the sketch after this list)
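
For reference, steps 5 and 6 can also be scripted. Below is a minimal sketch using boto3, assuming your AWS credentials are already configured; the bucket names, region, and the local path to your edfi_airflow_cloudformation checkout are placeholders to replace with your own values.

```python
# Sketch: create the data lake and CloudFormation template buckets, then upload
# the CloudFormation templates. Bucket names, region, and the local repo path
# are placeholders -- substitute your own values.
import os
import boto3

REGION = "us-east-1"  # assumption: replace with your region
DATALAKE_BUCKET = "yourschooldistrict-edu-analytics-datalake"
CFN_BUCKET = "yourschooldistrict-edu-analytics-cloudformation"
CFN_REPO_PATH = "./edfi_airflow_cloudformation"  # local clone of the repo

s3 = boto3.client("s3", region_name=REGION)

for bucket in (DATALAKE_BUCKET, CFN_BUCKET):
    # us-east-1 does not accept a LocationConstraint; other regions require it
    if REGION == "us-east-1":
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": REGION},
        )

# Upload every file in the edfi_airflow_cloudformation checkout, preserving paths
for root, _, files in os.walk(CFN_REPO_PATH):
    for name in files:
        local_path = os.path.join(root, name)
        key = os.path.relpath(local_path, CFN_REPO_PATH).replace(os.sep, "/")
        s3.upload_file(local_path, CFN_BUCKET, key)
```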

With these preparation steps complete, we can now run CloudFormation and SQL scripts to deploy the required infrastructure.

Run CloudFormation templates and SQL scripts to deploy infrastructure and code

  1. Deploy the CloudFormation templates, specifying the required parameters, such as the VPC ID, subnet IDs, Hosted Zone ID, domain name, ACM SSL certificate ARN, EC2 key-pair name, and S3 bucket names that were created under Prepare your AWS account (a scripted sketch follows this list). You'll also have to
    • give your system an environment name
    • choose an EC2 instance size for Airflow (large is recommended)
    • specify the name of the repository you forked from edu_project_template (in Fork and customize code repositories)
    • specify a password to be used for the RDS Postgres instance (which is Airflow's storage backend)
  2. Run the SQL scripts to set up your Snowflake account with the warehouses, databases, roles, and other objects required for EDU code to function.
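
If you prefer to script step 1, a minimal boto3 sketch is below. The stack name, template file name, and parameter keys shown here are assumptions for illustration; use the parameter names actually declared in the templates in your CloudFormation bucket.

```python
# Sketch: launch a CloudFormation stack from a template stored in the bucket
# created earlier. Parameter keys are illustrative -- match them to the
# parameters declared in the actual template.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

cfn.create_stack(
    StackName="edu-analytics-prod",  # your environment name
    TemplateURL=(
        "https://yourschooldistrict-edu-analytics-cloudformation"
        ".s3.amazonaws.com/airflow.yml"  # assumed template file name
    ),
    Parameters=[
        {"ParameterKey": "VpcId", "ParameterValue": "vpc-0123456789abcdef0"},
        {"ParameterKey": "HostedZoneId", "ParameterValue": "Z0123456789ABCDEFGHIJ"},
        {"ParameterKey": "CertificateArn", "ParameterValue": "arn:aws:acm:..."},
        {"ParameterKey": "KeyPairName", "ParameterValue": "edu-airflow"},
        # ...remaining parameters: subnet IDs, domain name, bucket names,
        # environment name, instance size, forked repo name, RDS password
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # needed if the templates create IAM roles
)

# Wait for the stack to finish creating before moving on
cfn.get_waiter("stack_create_complete").wait(StackName="edu-analytics-prod")
```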

Final customizations and first run

  1. Log into Airflow and add connection(s) for your Ed-Fi API(s) and (optionally) Slack reporting
  2. Run the Ed-Fi DAGs in Airflow, then run the dbt DAG in Airflow
  3. Log into Snowflake and verify there's a prod_wh schema with fact and dimension tables containing data (see the sketch below for one way to check)
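
As one way to do that final check, the sketch below uses the snowflake-connector-python package. The account, user, role, warehouse, and database names are placeholders and should be replaced with the values your SQL setup scripts created.

```python
# Sketch: list the tables dbt built, to confirm the fact and dimension tables
# exist after the dbt DAG has run. Connection values and object names are
# placeholders -- substitute the ones from your Snowflake setup.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account_identifier",
    user="your_user",
    password="your_password",
    role="your_role",            # a role your SQL setup scripts granted access
    warehouse="your_warehouse",  # a warehouse created by the setup scripts
    database="your_database",    # the database the dbt DAG writes to
)

try:
    cur = conn.cursor()
    # Adjust the database name to match your setup
    cur.execute("SHOW TABLES IN SCHEMA your_database.prod_wh")
    for row in cur.fetchall():
        print(row[1])  # "name" is the second column returned by SHOW TABLES
finally:
    conn.close()
```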