Event store
The UDP Event store is the permanent archive for all captured behavioral data. The UDP Event data pipeline streams enriched events into the UDP Event store in real time.
Implementation in Google BigQuery
The UDP Event store is implemented in Google BigQuery using a date-partitioned table called expanded. The expanded table is located in the event_store dataset.
The expanded table is date-partitioned on the Caliper event's event_time variable, which corresponds to the event timestamp provided in the Caliper event payload itself. Consequently, the Event store partitions events by the time at which learning tools report that a behavior occurred (rather than, say, when the event was written to the Event store). Queries on the expanded table must include a filter on event_time in the WHERE clause.
Unizin enforces a 7 TB daily byte scan limit in BigQuery for all consortium members. BigQuery charges very little for data storage, which makes it well suited to holding the multi-terabyte expanded table; however, it charges more heavily for computation and usage. The current rate for a BigQuery byte scan is $5 per 1 TB scanned in a query. Enforcing a 7 TB daily limit per school keeps BigQuery a powerful, useful tool while staying within our financial requirements as a consortium. Because the expanded table is so large, the event_time partition filter is required in queries to prevent accidental scans of terabytes of data.
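As a rough illustration of the arithmetic behind these figures, the sketch below estimates the cost of a scan and the share of the daily allowance it consumes. It uses only the $5 per TB rate and the 7 TB daily limit quoted in this section. The constant and function names are hypothetical, and the rate may differ from what BigQuery currently bills.

```python
# Figures quoted in this section; treat them as assumptions, since rates can change.
USD_PER_TB_SCANNED = 5.0
DAILY_SCAN_LIMIT_TB = 7.0


def estimate_scan_cost_usd(tb_scanned: float) -> float:
    """Estimate the cost in USD of scanning the given number of terabytes."""
    if tb_scanned < 0:
        raise ValueError("tb_scanned must be non-negative")
    return tb_scanned * USD_PER_TB_SCANNED


def fraction_of_daily_limit(tb_scanned: float) -> float:
    """Return the share of the 7 TB daily limit that a scan would consume."""
    if tb_scanned < 0:
        raise ValueError("tb_scanned must be non-negative")
    return tb_scanned / DAILY_SCAN_LIMIT_TB


# Cost of using the full daily allowance.
print(estimate_scan_cost_usd(DAILY_SCAN_LIMIT_TB))  # 35.0
```

Under these assumptions, a school that uses its full daily allowance incurs at most $35 per day in scan charges, so a single unfiltered query over a multi-terabyte table could consume a large share of that allowance.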
Here is an example framework for a query:
Query Framework Example - Expanded Table
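A minimal sketch of such a query is shown here as a small Python helper that builds the SQL text. It assumes that event_time is a TIMESTAMP column and that the table is addressed as project.event_store.expanded. The project ID, the helper name, and the per-day count are illustrative placeholders rather than part of the UDP itself. The essential part is the WHERE clause, which bounds event_time so that BigQuery scans only the matching partitions.

```python
from datetime import date

# Hypothetical placeholder; replace with your institution's GCP project ID.
PROJECT_ID = "your-project-id"


def build_expanded_query(start: date, end: date, project_id: str = PROJECT_ID) -> str:
    """Build a query counting events per day for start <= event_time < end."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return (
        "SELECT DATE(event_time) AS event_date, COUNT(*) AS event_count\n"
        f"FROM `{project_id}.event_store.expanded`\n"
        # Partition filter: restricts the scan to the requested date range.
        f"WHERE event_time >= TIMESTAMP('{start.isoformat()}')\n"
        f"  AND event_time < TIMESTAMP('{end.isoformat()}')\n"
        "GROUP BY event_date\n"
        "ORDER BY event_date"
    )


# Example: one week of events.
print(build_expanded_query(date(2024, 1, 1), date(2024, 1, 8)))
```

The printed SQL can be pasted into the BigQuery console or passed to a BigQuery client. Keeping the date range as narrow as the analysis allows reduces the bytes scanned against the daily limit.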
The UDP installation process will automatically create the event_store BigQuery dataset and expanded table. The UDP will configure the expanded table to be partitioned by event_time. The UDP will also automatically create the service accounts needed by the UDP Event enricher to stream insert event data into the expanded table.
For a full explanation of the Event store data schema, please see our dedicated docs.