The UDP accepts context data via Google Cloud Storage buckets that exclusively serve as data ingress points for a UDP instance. It is expected that institutions and vendors who support a UDP data integration will upload data to the appropriate buckets every day. Broadly speaking, the following rules exist for pushing context data into the UDP:

  1. Context datasets must conform to a loading schema
  2. A distinct bucket must be used for each system that produces context data
  3. Within a bucket, datasets are organized by date-specific folders
  4. A manifest file must accompany each context dataset

For more information on the data integration mechanics, please visit our Context data integration documentation.

The ingress point for all context data is a Cloud Storage bucket. Each system that produces context data will be assigned a unique bucket in Cloud Storage to which it pushes data every night (or similar periodic interval). Within any particular bucket, date-specific folders are used to isolate distinct context datasets.

Context data ingress buckets

The UDP uses a variety of Cloud Storage buckets for different purposes. Only a subset of them is used as ingress points for context datasets.

The screenshot below shows seven Cloud Storage buckets that the UDP generated during the installation process. In this example, the shortcodes "edu" and "prod" indicate that this UDP instance belongs to an academic institution with the shortcode "edu" and runs in the production ("prod") environment.

This UDP instance is receiving context data from four data sources:

  1. The institution's SIS (via the "sis-unizin-udp-data-edu-prod-daily" bucket)
  2. The Canvas LMS (via the "canvas-unizin-udp-data-edu-prod-daily" bucket)
  3. TopHat (via the "udp-edu-prod-tophat" bucket)
  4. TurnItIn (via the "tii-udp-edu-prod-daily" bucket)

Note that there are two naming conventions for these buckets. A legacy naming convention is used for the SIS, LMS, and TurnItIn data. By contrast, a modern, simpler naming convention is used for the TopHat context data. As of this writing, the modern naming convention will be used for all vendors that support data integrations with the UDP.

All other buckets in the screenshot above are unrelated to context data ingress into a UDP instance.

Folder structure

Within a bucket used to ingress context data, a specific folder structure must be followed. Context data is expected to be pushed to the UDP every night into date-specific folders. Each folder's date is the day on which the dataset was generated, not a date reflecting the freshness of the data it contains.

In the screenshot below, we see folders that each represent a daily SIS context dataset pushed into a UDP instance. These folders are located in the bucket that is exclusively reserved as the ingress point for SIS context data.

The UDP does not automatically create date-specific folders in the context data ingress buckets. The entity pushing data into the UDP is responsible for creating the date folders as well. For example, an academic institution that pushes its SIS data to the UDP every night must also create the date folders in which its SIS data is placed.
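As a minimal sketch of what "creating the date folder" can amount to: in a Cloud Storage bucket with a flat namespace, a folder is simply a shared prefix in the object name, so uploading an object under a dated prefix makes the folder appear. The helper below only builds that object name. The function name `object_path`, the `YYYY-MM-DD` folder format, and the sample filename are assumptions made for illustration; confirm the expected folder format against your UDP instance.

```python
from datetime import date


def object_path(generated_on: date, filename: str) -> str:
    """Build the object name for a file placed in its date-specific folder.

    Assumption: folders are named with the ISO date (YYYY-MM-DD) of the day
    the dataset was generated.
    """
    # A "folder" here is just the prefix that precedes the final "/".
    return f"{generated_on.isoformat()}/{filename}"


# Hypothetical filename, used only to show the resulting layout.
print(object_path(date(2020, 12, 28), "person_2020-12-28.csv"))
```

The resulting string would be passed as the destination object name to whatever upload tool the data producer uses.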

Context dataset

The CSV files and manifest file corresponding to a single dataset are all stored inside a single folder of a context data ingress bucket.

As described in our context data integration documentation, the UDP requires a distinct CSV file for each entity of the UDP loading schema. Each CSV filename must contain the name of the entity whose data the file holds and a date. That date must match the date of the dataset, which should be identical to the name of the date folder.
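The naming rule above can be checked before upload with a small pre-flight test. This is a sketch under stated assumptions, not the UDP's own validation: it assumes the date appears in the filename in `YYYY-MM-DD` form and that files end in `.csv`; the function name `csv_name_matches` and the entity name `person` are hypothetical.

```python
from datetime import date


def csv_name_matches(filename: str, entity: str, dataset_date: date) -> bool:
    """Return True if the filename mentions the entity and the dataset date.

    Assumptions: ISO-formatted date (YYYY-MM-DD) and a ".csv" extension.
    The exact filename pattern is defined by the integration documentation.
    """
    return (
        filename.endswith(".csv")
        and entity in filename
        and dataset_date.isoformat() in filename
    )


dataset_date = date(2020, 12, 28)
# "person" is a hypothetical entity name used for illustration.
print(csv_name_matches("person_2020-12-28.csv", "person", dataset_date))
```

A producer could run this over every file in a dataset and refuse to upload if any filename carries a date that differs from the date folder.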

Note that the manifest file follows its own naming convention. In the example below, the "sis" prefix in "sis_daily_2020-12-28.done" identifies that the dataset corresponds to the SIS loading schema.
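To illustrate how the manifest filename encodes the loading schema and the dataset date, the sketch below parses a name such as "sis_daily_2020-12-28.done". The pattern `<prefix>_daily_<YYYY-MM-DD>.done` is inferred from that single example and is an assumption; the function name `parse_manifest_name` is hypothetical.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Assumed pattern, inferred from the one example filename in this page.
MANIFEST_RE = re.compile(
    r"^(?P<schema>[a-z0-9]+)_daily_(?P<date>\d{4}-\d{2}-\d{2})\.done$"
)


def parse_manifest_name(filename: str) -> Optional[Tuple[str, date]]:
    """Split a manifest filename into (schema prefix, dataset date).

    Returns None when the name does not fit the assumed pattern or the
    date is not a real calendar date.
    """
    match = MANIFEST_RE.match(filename)
    if match is None:
        return None
    try:
        dataset_date = date.fromisoformat(match.group("date"))
    except ValueError:
        return None
    return match.group("schema"), dataset_date


print(parse_manifest_name("sis_daily_2020-12-28.done"))
```

Comparing the parsed date with the name of the enclosing date folder is one way to catch a manifest placed in the wrong folder.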
