The UDP Event store is the permanent archive of all behavioral data ever captured. The UDP Event data pipeline streams enriched events into the UDP Event store in real time.
Implementation in Google BigQuery
The UDP Event store is implemented in Google BigQuery as a date-partitioned table called expanded, located in the event_store dataset.
The UDP Event store in Google BigQuery.
The expanded table is partitioned on the Caliper event's event_time field, which is the event timestamp provided in the Caliper event payload itself. Consequently, the Event store partitions events by the time at which a learning tool reports that a behavior occurred, not by the time the event was written to the Event store. Every query against the expanded table must include a filter on event_time in its WHERE clause.
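As a rough illustration of partitioning by reported time rather than write time, the sketch below maps a timestamp to the calendar day it falls on. It assumes daily partitions with UTC day boundaries; the function name partition_date and the sample event are hypothetical and not part of the UDP.

```python
from datetime import date, datetime, timedelta, timezone


def partition_date(event_time: datetime) -> date:
    """Return the UTC calendar day that a timezone-aware timestamp falls on."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    # Assumption: partitions are daily and their boundaries are in UTC.
    return event_time.astimezone(timezone.utc).date()


# Hypothetical event: the tool reports the behavior late on June 1 (UTC),
# but the event only reaches the Event store two days later.
event_time = datetime(2022, 6, 1, 23, 59, 30, tzinfo=timezone.utc)
written_at = event_time + timedelta(days=2)

print(partition_date(event_time))  # day the event is partitioned under
print(partition_date(written_at))  # write day, which plays no role in partitioning
```

A query filtered to June 1 would therefore find this event, even though it arrived on June 3.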
Unizin enforces a 7 TB daily byte-scan limit in BigQuery for every consortium member. BigQuery charges very little for data storage, which makes it well suited to storing the multi-terabyte expanded table; it charges considerably more for computation, which is billed by the bytes a query scans. The current rate is $5 per 1 TB scanned. A 7 TB daily limit per school keeps BigQuery a powerful, useful tool while staying within the consortium's financial requirements. Because the expanded table is so large, the partition filter on event_time is required to prevent accidental scans of many terabytes of data.
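To make the arithmetic concrete, here is a minimal sketch that turns a byte count into an estimated scan cost and a remaining daily allowance. It uses the $5 per TB rate and 7 TB limit quoted above; the function names are hypothetical, and treating 1 TB as 2^40 bytes is an assumption.

```python
PRICE_PER_TB_USD = 5.0  # scan rate quoted in this document
DAILY_LIMIT_TB = 7.0    # daily byte-scan limit quoted in this document
BYTES_PER_TB = 2 ** 40  # assumption: 1 TB is counted as 2^40 bytes


def scan_cost_usd(bytes_scanned: int) -> float:
    """Estimated cost in USD of scanning the given number of bytes."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


def remaining_allowance_tb(bytes_scanned_today: int) -> float:
    """Terabytes left under the daily limit, never below zero."""
    used_tb = bytes_scanned_today / BYTES_PER_TB
    return max(0.0, DAILY_LIMIT_TB - used_tb)


print(scan_cost_usd(7 * BYTES_PER_TB))           # cost of using the full daily limit
print(remaining_allowance_tb(3 * BYTES_PER_TB))  # allowance left after scanning 3 TB
```

Under these assumptions, a school that uses its entire daily limit incurs about $35 of scan cost for that day.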
Here is an example framework for a query:
Query Framework Example - Expanded Table
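SELECT <COLUMNS TO RETURN>
FROM event_store.expanded
WHERE event_time >= '2022-06-01' AND event_time < '2022-06-11' -- Pulls all events from 6/1 through 6/10; the exclusive upper bound keeps every event on 6/10
<FURTHER AND/OR CONDITIONS TO FILTER RESULTS>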
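The same framework can be assembled programmatically so that the partition filter is never left out. The sketch below is one possible helper, not part of the UDP: build_expanded_query and its parameters are hypothetical. It uses an exclusive upper bound one day after the end date so that events at any time on the final day are included.

```python
from datetime import date, timedelta
from typing import Sequence


def build_expanded_query(
    start: date,
    end: date,
    columns: Sequence[str],
    extra_filter: str = "",
) -> str:
    """Build a query on event_store.expanded that always filters on event_time.

    Both start and end are inclusive calendar days. extra_filter is inserted
    verbatim, so it must come from a trusted source, never from user input.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    if not columns:
        raise ValueError("at least one column is required")
    upper = end + timedelta(days=1)  # exclusive bound covers all of the end day
    query = (
        f"SELECT {', '.join(columns)}\n"
        "FROM event_store.expanded\n"
        f"WHERE event_time >= '{start.isoformat()}'\n"
        f"  AND event_time < '{upper.isoformat()}'"
    )
    if extra_filter:
        query += f"\n  AND ({extra_filter})"
    return query


print(build_expanded_query(date(2022, 6, 1), date(2022, 6, 10), ["event_time"]))
```

Naming the columns explicitly, rather than selecting everything, also tends to reduce the bytes scanned, because BigQuery stores data by column.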
The UDP installation process will automatically create the event_store BigQuery dataset and expanded table. The UDP will configure the expanded table to be partitioned by event_time. The UDP will also automatically create the service accounts needed by the UDP Event enricher to stream insert event data into the expanded table.