Skip to content

Connect to EDU Data Warehouse using R#

The goal of this document is to give a step by step guide to connecting to an EDU data warehouse using R. This document assumes that the user has an account with read permissions on the data warehouse, and has R installed and set up.

Install packages#

This demonstration will use a few R packages to help connect to the database, so install these R packages in your R console, or verify that they are installed on the version of R you are running.

install.packages("DBI");
install.packages("obdc");

Connect to the Snowflake database#

Create a connection object in R using dbConnect & odbc:

  connection_obj <- odbc::dbConnect(odbc::odbc(),
    # keep this string for the Snowflake driver
    Driver = 'SnowflakeDSIIDriver',
    # add Snowflake host URL (don't add https://)
    Server ='YOUR_ACCOUNT.snowflakecomputing.com',
    # plug in your username and password
    uid='your_username',
    pwd='your_password',
    # default port 
    port='443',
    #  EDU data by default will be in a database called "analytics"
    database='analytics')

If you check this code into version control, or share it, make sure to remove your password. We recommend encrypyting the password and passing it in without ever including it in the code.

We can also supply additional arguments to two Snowflake parameters. Snowflake will use the defaults if none are supplied, and most users will only have one. If you have multiple roles, we strongly recommend using a read-only role for connection to the database. The arguments that can be supplied are:

warehouse

role

Pull in data#

If you are using RStudio as an IDE, once the database is connected to the R Session there will be a pane in the IDE to explore the schema, tables, and columns available. You can query the database and bring data into R as a data.frame using _dbGetQuery _and passing in the connection object you've created in the previous step. You can also request access to the documentation site for the data warehouse to better understand the tables and models that are available.

  # query the database to get the aggregated number of schools
  output_data_query <- DBI::dbGetQuery(connection_obj,
    "
    select count(*)
    from prod_wh.dim_school
    "))