In the final phase of the UDP's Context data pipeline, the UDP Context store is updated to present newly imported context data. The UDP Context store serves as an aggregate, consolidated data store for all context data in a UDP instance. 

The context store database is the database resource to which institutions will connect to use context data, including the keymap. 

The purpose of the context_store database is to present the keymap and entity data that is the result of the context data ETL process. The context_store database must also be updated at regular, periodic intervals (usually once daily) without interrupting current connections to the database. Both of these tasks are accomplished through a small set of schemas.

The context store database has a relatively simple design to reflect its single function: to present the latest version of ingested context data to institutions.

However, there is some nuance in the design to enable the zero-downtime replication process that is part of the “publish” phase of the ETL.

Schemas

The two primary database schemas (namespaces) of the context store are entity and keymap. Each schema presents one of the two essential parts of the UDP Context store:

  1. The consolidated, normalized, entity-by-entity context data in a relational schema, which is housed in the entity schema.
  2. The entity-by-entity keymaps that unify source data identifiers with UDP identifiers, which are housed in the keymap schema.

In the sections below, we briefly describe the organization of views and data in these schemas. For a fuller description of how to get started using the UDP Context data, see our primer on using the UDP Context store.

Entity schema

The entity schema of the "context_store" database contains all context data from all data sources.

The schema is composed of views (not tables), where each view corresponds to a distinct entity of the Unizin Common Data Model (UCDM). For example, the "person" view corresponds to the Person entity; the "academic_organization" view corresponds to the Academic organization entity; and the "person__academic_degree" view corresponds to the Person-Academic degree entity.

Singular UCDM entities

Each view in the entity schema that corresponds to a singular entity provides a primary key, which corresponds to a UDP ID key in the keymap and is named after the entity. For example, the primary key for a Person record in the "person" view is "person_id"; the primary key for an Academic organization record in the "academic_organization" view is "academic_organization_id".

A singular entity has a primary key.

A brief description of the Person table definition, highlighting that singular Entities are defined by a primary key.

Composite UCDM entities

Views in the entity schema that correspond to a composite entity do not have primary keys. For example, Person-Academic degree records in the "person__academic_degree" view do not have a primary key. Instead, each record has both a "person_id" and an "academic_degree_id", which jointly define a unique record.

Composite entities are defined by several keys to singular entities.

A brief description of the Person-Academic degree table definition, highlighting that composite Entities are defined by two or more UDP IDs of distinct entities.
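
As an illustration of how singular and composite entities relate, the following minimal sketch joins through a composite entity to list one person's degrees. It uses an in-memory SQLite database with tables standing in for the entity views. The "academic_degree" stand-in, the non-key columns ("name", "title"), the sample rows, and the `degrees_for_person` helper are all assumptions made for this example and are not taken from the UCDM definitions.

```python
import sqlite3

# In-memory stand-in for the entity schema. Real UDP entity objects are views;
# plain tables are used here only so the example is self-contained.
# Hypothetical: the non-key columns (name, title) and all sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE academic_degree (academic_degree_id INTEGER PRIMARY KEY, title TEXT);
    -- Composite entity: no key of its own, only keys to singular entities,
    -- which jointly identify a record.
    CREATE TABLE person__academic_degree (
        person_id INTEGER NOT NULL,
        academic_degree_id INTEGER NOT NULL,
        UNIQUE (person_id, academic_degree_id)
    );
    INSERT INTO person VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO academic_degree VALUES (10, 'BS Mathematics'), (11, 'PhD Mathematics');
    INSERT INTO person__academic_degree VALUES (1, 10), (1, 11), (2, 11);
""")


def degrees_for_person(conn, person_id):
    """Return degree titles for one person by joining through the composite entity."""
    rows = conn.execute(
        """
        SELECT d.title
        FROM person__academic_degree AS pad
        JOIN academic_degree AS d
          ON d.academic_degree_id = pad.academic_degree_id
        WHERE pad.person_id = ?
        ORDER BY d.academic_degree_id
        """,
        (person_id,),
    ).fetchall()
    return [title for (title,) in rows]


ada_degrees = degrees_for_person(conn, 1)
```

The composite stand-in carries no identifier of its own; the pair of UDP IDs is what distinguishes one record from another.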

Keymap schema

The keymap schema of the "context_store" database contains all of the entity keymaps in a UDP instance.

The schema is composed of a set of views (not tables), where each view corresponds to a distinct, singular UCDM entity. For example, the "person" view corresponds to the Person entity. In contrast to the entity schema, where every UCDM entity is represented, the keymap schema only contains views for singular UCDM entities. Composite UCDM entities do not have keymap views, since no new UDP IDs are ever required to define them. Instead, composite UCDM entities are defined by the UDP IDs of their constituent singular entities.

All keymap views have the same definition. The first column in the view corresponds to the primary key for a singular UCDM entity record. Every subsequent column corresponds to a native identifier, from a source data system, that is associated with that UDP ID. In the diagram below, for example, we see a keymap view where native identifiers from the SIS, LMS, and a learning tool are associated with a particular primary key. As context data from different systems is added to a UDP instance, the keymap views widen to accommodate more native keys from each source data system.

An example keymap table definition, highlighting that a surrogate identifier (i.e., a UDP ID) maintains relationships with native identifiers from source data systems such as the SIS and LMS
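
To make the keymap layout concrete, here is a minimal sketch of resolving a native identifier to its UDP ID. It uses an in-memory SQLite table as a stand-in for the "person" keymap view. The stand-in name "keymap_person", the native identifier column names ("sis_id", "lms_id"), the sample values, and the `udp_id_for` helper are hypothetical; actual column names depend on the source systems loaded into a given UDP instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for a keymap view: the UDP ID comes first, followed by one
    -- column per source system. Hypothetical: sis_id, lms_id, and all rows.
    CREATE TABLE keymap_person (
        person_id INTEGER PRIMARY KEY,
        sis_id TEXT,
        lms_id TEXT
    );
    INSERT INTO keymap_person VALUES (1, 'S-1001', 'L-77'), (2, 'S-1002', NULL);
""")

# Column names cannot be bound as SQL parameters, so they are checked
# against a fixed allow-list before being placed in the query text.
ALLOWED_SOURCE_COLUMNS = {"sis_id", "lms_id"}


def udp_id_for(conn, source_column, native_id):
    """Return the UDP person_id mapped to a native identifier, or None."""
    if source_column not in ALLOWED_SOURCE_COLUMNS:
        raise ValueError(f"unknown source column: {source_column}")
    row = conn.execute(
        f"SELECT person_id FROM keymap_person WHERE {source_column} = ?",
        (native_id,),
    ).fetchone()
    return row[0] if row else None


person_id_from_lms = udp_id_for(conn, "lms_id", "L-77")
```

The returned UDP ID can then be used as the primary key when querying the corresponding view in the entity schema.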

For a comprehensive overview of the UDP Keymap concept, please review our keymap documentation.

Zero downtime updates (via "publisher" schemas)

One of the features of the UDP batch-ingest process is the zero-downtime update of the UDP Context store.

Zero downtime is achieved by maintaining two sets of context store tables behind the scenes: one set serves the views of the entity and keymap schemas, and the other is updated during the next run of the batch-ingest DAG. When an import process is complete, the views in the entity and keymap schemas are repointed to the newly updated tables.

The swap between the two sets is achieved through the "publisher" schemas of the UDP Context store: publisher_entity and publisher_keymap. These schemas contain tables whose definitions are equivalent to the views of the entity and keymap schemas of the "context_store" database. However, each entity has two tables in its publisher schema: an A table and a B table. For example, the Person entity tables in the “publisher_entity” schema are called “person__A” and “person__B”. Note that in both names, “person” and the A or B identifier are separated by a double underscore. This is part of the naming convention.

The UDP keeps track of which publisher table (A or B) currently backs each entity or keymap view. In the publish step of the context data ETL, the UDP replicates the newly ingested data into the publisher table that is not currently used by the entity and keymap schemas. Once the replication of an entity’s data into that publisher table is complete, the UDP redefines the corresponding view in the entity or keymap schema to point to the table (A or B) with the freshest data.
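
The A/B swap described above can be sketched as follows. This is an illustration under stated assumptions, not the UDP's actual implementation: it uses a single in-memory SQLite database instead of separate entity and publisher schemas, the `active_slot` dictionary is a hypothetical stand-in for however the UDP tracks the live table, and the `publish` function, the "name" column, and the sample rows are invented for the example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT statements below fully control the transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    -- Two definitionally identical publisher tables for the Person entity.
    CREATE TABLE person__A (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person__B (person_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO person__A VALUES (1, 'Ada');
    -- The view that consumers query; it currently points at the A table.
    CREATE VIEW person AS SELECT * FROM person__A;
""")

# Hypothetical bookkeeping of which publisher table backs each view.
active_slot = {"person": "A"}


def publish(conn, entity, fresh_rows):
    """Load fresh rows into the idle publisher table, then repoint the view."""
    idle = "B" if active_slot[entity] == "A" else "A"
    idle_table = f"{entity}__{idle}"

    # Step 1: replicate into the table the view is NOT reading from.
    # Readers of the view are unaffected while this runs.
    conn.execute("BEGIN")
    conn.execute(f"DELETE FROM {idle_table}")
    conn.executemany(f"INSERT INTO {idle_table} VALUES (?, ?)", fresh_rows)
    conn.execute("COMMIT")

    # Step 2: redefine the view in one transaction so it switches tables
    # atomically once the fresh data is fully loaded.
    conn.execute("BEGIN")
    conn.execute(f"DROP VIEW {entity}")
    conn.execute(f"CREATE VIEW {entity} AS SELECT * FROM {idle_table}")
    conn.execute("COMMIT")

    active_slot[entity] = idle


publish(conn, "person", [(1, "Ada"), (2, "Grace")])
names_after_publish = [n for (n,) in conn.execute("SELECT name FROM person ORDER BY person_id")]
```

After the call, the "person" view reads from "person__B", while "person__A" still holds the previous data and becomes the target of the next publish run.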
