Unizin Product Documentation
ProductsSupport and TrainingPolicies
  • Unizin Product Documentation
  • Products
    • Content
      • Unizin Engage
        • eReader User Guide
          • Notes, Highlights, and Citations
          • Appearance Settings
          • Download for Offline
          • eReader Layout
          • Keyboard Shortcuts
          • Navigating Your eBook
          • Print
          • Text to Speech
          • Copy and Paste
          • Creating Flashcards
          • Collaboration and Note Sharing
          • Pearson Titles
        • Institution Support
          • Disabled Student Services / Alt-Format
            • Best Practices for Republishing Course Content
            • Disabled Student Services
            • Requesting eTextbook Files for Accessibility Purposes
            • WCAG 2.0 AA evaluation for Engage
            • WCAG 2.0 AA evaluation for EPUB for Engage
          • Institution's Support Responsibilities
        • Caliper 1.1 sensor
        • Release Notes
          • 2.28.22
          • 2019-09-17
          • 2019-05-29
          • 2.26.8
          • 2.26.0
          • 2.25.0
          • 2.22.0
          • 2.21.6
          • 2.21.5
          • 2.20.8
          • 2.20.5
          • 2.20.3
          • 2.19.1
          • 2.18.0
          • 2.17.0
          • 2.14.0
          • 2.12.0
          • 2.11.0
          • 2.9.0
          • 2.8.3
          • 2016-03-17
          • 2016-02-11
          • 2016-01-28
        • Using Analytics (New)
      • Unizin Order Tool
        • Overview of the User Interface
        • Key Concepts
          • Profiles
          • Ordering periods
          • Coordinator permissions
          • Program administrator permissions
        • Courses & Ordering
          • Course filtering
          • Place an order
          • Add sections to a placed order
          • Edit a placed order
          • Cancel an order
          • Reordering
        • Order History
          • Instructor Order History
          • Coordinator and Program Administrator Order History
          • Order Activity
        • Student Choice
          • Student Choice (Program Administrators)
          • Student Choice (Students)
        • Entitlements
          • Entitlements (Program Administrators)
          • Entitlements (Students)
        • Catalog Tool
        • Schedule of Classes
        • Content Request Tool
        • Order Tool Dashboard
        • Vendor Sandbox Tenant
        • Institution Support
        • Implementation
          • SIS Data Integration
            • 1.0 - SIS Integration
            • 2.0 - SIS Integration
          • SSO integration
          • UI customizations
          • Order Feed
            • 1.0 - Order Feed
            • 2.0 - Order Feed
            • 3.0 - Order Feed
          • Publisher report
          • Final declined offers feed
          • Institutional (SIS) Catalog Import
          • Student Price
          • Historical Entitlements Import
        • Release Notes
          • Order Tool Bug Fixes and Enhancements
          • Order Tool Bug Fixes
          • Order Tool Accessibility Improvements
          • Order Feed Improvements
          • Content Request Form Update and Minor Bug Fix
          • Flat Markup Fee Update
          • Ordering Email Receipt Update & Minor Bug Fix
          • Bug Fix for Public Catalog Feature
          • Catalog Search Enhancements
          • Reordering Reminder Email Notifications
          • UX Improvements & Minor Bug Fixes
          • Historical Entitlements Import
          • Student Prices
          • Reordering Feature
          • Email Enhancements
          • Ordering Enhancements
          • Bug Fix for the Institutional Catalog Import
          • Bug Fix for the Final Declined Offers Feed (FDOF)
          • Order Activity Feature and Other Enhancements
          • Bug Fixes for Order History and Report an Issue Features
          • Public catalog feature
          • Minor Bug Fixes for Ordering and Student Choice
          • Entitlements Production Release, Bug fixes, and Minor updates
          • Minor Updates and Bug Fixes for Ordering Workflows
          • Catalog Search Optimization
          • Student Choice
          • Archive Terms Feature and Integration Improvements
          • Introduces the Program Administrator role, Catalog Tool, and Schedule
          • User interface updates and improvements
          • Order feed improvements
          • Order history, UI enhancements
          • Email notification upgrades, UI improvements
          • Order feed changes
          • New features for Course coordinators and upgrades to the UI
          • Changes to the Term, Course, and Section models; introduces a Session
          • Bug fixes, import improvements, and validation improvements
          • Tracking Order History
          • Publisher Reporting
          • Fixes the order feed, automates SIS data importing, and automates the generation of order feed repor
    • Data & Analytics
      • Unizin Data Platform
        • Key concepts
          • Platform overview
          • Data categories
          • Data models
          • Loading schemas
          • Keymap
        • Unizin Common Data Model
          • Academic structures (ERD)
          • Learners (ERD)
          • Course structures (ERD)
          • Course resources (ERD)
          • Learner activities (ERD)
          • Quizzes (ERD)
          • Social (ERD)
          • Course outcomes (ERD)
        • System overview
          • Context data pipeline
            • Context data ingress
            • Batch-ingest application
            • Batch-ingest db server
            • Context store
          • Event data pipeline
            • UDP Caliper endpoint
            • Approval process for implementing Caliper compliant tools
            • UDP Event enricher
            • Event store
        • Data stores
          • Data lake
            • UDP Context store
            • UDP Event store
              • Accessing the Event store
              • Expanded table
                • Expanded table: Canvas edApp mapping
            • Synthetic Data [beta]
              • Viewing Synthetic Data datasets within the BigQuery UI
              • Query Synthetic Data via client libraries
          • Data marts
            • UDP Distributions
            • Interaction sessions
            • Learning Environment Organization
            • File Interaction
            • Last Activity
            • Long Inactivity
            • Course Status
            • Daily Course Grade Record
            • LTI Tool Use
            • LMS Tool Use
            • Tool Usage Metrics
            • Links
            • Taskforce
              • Level 1 Aggregated
              • Level 2 Aggregated
              • Level 2 Course Weekly Distribution Summary
              • Student Term Profile
              • Course Profile
            • Student Activity Score
              • Student Course Metrics
              • Student Course Section Metrics
              • Final
              • Course Final
              • Course Section Final
        • Data integrations
          • Context data integration
            • Loading schema
            • Keymap support
            • Manifest file
            • File requirements
            • Integration mechanics
          • Event data integration
          • SIS data integration
          • LMS data integration
            • Instructure Canvas
        • Release Notes
          • UDP Marts Release Notes
            • 1.0.83
            • 1.0.80
            • 1.0.79
            • 1.0.78
            • 1.0.77
            • 1.0.72
            • 1.0.67
            • 1.0.58
            • 1.0.51
            • 1.0.44
            • 1.0.42
            • 1.0.32
            • 1.0.31
            • 1.0.0
            • Level 2 Taskforce data marts now available
          • 2.0.167
          • 2.0.152
          • 2.0.138
          • 2.0.137
          • 2.0.113
          • 2.0.112
          • 2.0.111
          • 2.0.110
          • 2.0.99
          • 2.0.98
          • 2.0.83
          • 2.0.80
          • 2.0.71
          • 2.0.66
          • 2.0.59
          • 2.0.58
          • 2.0.53
          • 2.0.47
          • 2.0.25
        • Miscellaneous
          • Canvas Data additions, ~Fall 2021
          • Canvas Live Events: from SQS to HTTPS
          • Canvas New Analytics vs. UDP
          • Course Section Enrollment Role Status Mappings
          • Migrating from UDW to UDP
      • Unizin Data Warehouse
        • Implementation Guide
        • Scope of Services
        • Access Provisioning
        • Access Revocation
        • Connecting to the UDW
      • Raw Canvas Data 2
        • Flat Files
        • BigQuery Datasets
    • Hosted Services
      • My Learning Analytics
        • Install MyLA via LTI 1.3
        • Custom configure MyLA
  • Support and Training
    • Professional Development
      • Stepping Stones: A Faculty Development Curriculum for Learning Analytics Use
      • Structured Conversations initiative
    • UDP Self-paced Training
    • Resources Site Broken Links
    • Status Pages
  • Policies
    • General policies
      • Sponsor Teams
      • Browser Support Policy
      • Opt-Out & Invoicing Policy (Order Tool)
    • Support Policy
      • Unizin Engage - SP
      • Unizin Order Tool - SP
      • Unizin Data Platform - SP
      • Unizin Data Warehouse - SP
      • Unizin Data Analysis - SP
      • Pressbooks Hosting - SP
    • Privacy Policy
      • Unizin Engage - PP
      • Unizin Order Tool - PP
      • Unizin Data Platform - PP
      • RStudio service - PP
    • End User License Agreements
      • Unizin Engage - EULA
      • Unizin Order Tool - EULA
    • Terms of Use
      • Unizin Data Platform - ToU
    • Incident Reports
Powered by GitBook
LogoLogo

Unizin Homepage

  • unizin.org

Data & Analytics

  • Unizin Data Platform
  • Unizin Data Warehouse

Content

  • Unizin Engage
  • Unizin Order Tool

Hosted Services

  • My Learning Analytics

Copyright © 2023, Unizin, Ltd.

On this page
  • Overview
  • Motivation for Change
  • Refresh Frequency
  • BigQuery Datasets
  • mart_helper Deep Dive
  • Code Repository and Visualized Dependencies
  • Interactive Dependency Charts
  • Questions and Requests For Changes/Additions
  1. Products
  2. Data & Analytics
  3. Unizin Data Platform
  4. Data stores

Data marts

PreviousQuery Synthetic Data via client librariesNextUDP Distributions

Last updated 4 months ago

Overview

Unizin has developed new data mart construction and delivery infrastructure that aligns more closely with the rest of the batch ingest process. Previously, we built the marts in a Unizin internal GCP project based on SQL queries scheduled directly in BigQuery for refresh. The marts were then “passed through” into each tenant’s prod GCP project in mart-prefixed datasets containing materialized views. Image 1 below shows the previous delivery of the marts:

Motivation for Change

Unizin’s data mart development began in 2021 as “proofs of concept” assets aiding use-case-driven analytics essential and shared across the consortium. Organically, the data marts have elevated into a first-party Unizin service core to the UDP and membership value.

The data marts have reached a point where convergence with the rest of Unizin’s cloud-based infrastructure and processes will enable more performant, scalable, and reliable data mart development moving forward.

Building the marts requires a series of dependencies and intermediary queries that we previously kept “behind the scenes” in favor of cleanliness in the prod GCP projects. However, Unizin has received feedback that seeing the “helper tables” has value on their own as well! The new infrastructure for building the marts will expose all dependencies and intermediary queries alongside the final mart presentations.

Refresh Frequency

The refresh frequency of the marts and helper tables depends mainly on the source UDP data that fuels the tables in scope.

  • Dependency on context_store data - refresh frequency is daily; the current context_store data update daily, so there is no need to refresh these tables more often than daily.

  • Dependency on event_store data - refresh frequency is hourly (usually); the event_store data refresh almost realtime. We approximate “realtime” with hourly refreshes to strike a balance with refresh scanning costs in BigQuery. Nearing realtime requires more frequent scans of data which becomes more costly. Hourly refreshes is our current optimization of data freshness against refresh costs.

  • Mixed dependencies - these tables often refresh on the most restrictive cadence. If a dependency refreshes hourly, these tables will refresh hourly too.

  • UDP Distributions - these tables refresh on a monthly cadence.

Each table in BigQuery has a Details tab that can be consulted for questions about refresh frequency; the field in scope is Last Modified:

BigQuery Datasets

The delivery of marts for user consumption live in datasets prefixed with mart. The newly surfaced dependency tables all live together in mart_helper. The distribution marts live in the udp_distributions dataset.

The final, use-case driven marts live in the other mart datasets:

  • mart_general

  • mart_course_section

  • mart_course_offering

  • mart_taskforce

  • mart_student_activity_score

These mart_* datasets are unchanged from the previous mart infrastructure to the new one. The table schemas, field names, datatypes, and everything else are unchanged, so existing scripts and applications do not break. The only technical difference in these datasets is that the assets inside the datasets are now native tables instead of materialized views. The previous infrastructure favored surfacing materialized views since the computation and construction all occurred in an internal Unizin GCP project. However, this is no longer a requirement, so native BQ tables are preferred for mart presentation.

mart_helper Deep Dive

The newly visible dataset in prod BQ environments is called mart_helper. The tables in here are all the accumulated dependencies that build the final tables in the other mart_ datasets. Since all the dependencies are compressed into one hierarchical level, a consistent naming convention is enforced on each asset in the mart_helper dataset. Double underscore characters are used as delimiters.

First Substring: context, event, or utility

The first part of a mart_helper table name is either “context,” “event,” or “utility.” These strings dictate the base data source on which the helper table depends:

  • Context - the data source is the UDP context_store, governed by the UCDM

  • Event - the data source is the UDP event_store, governed by the Caliper standard

  • Utility - the data source is static mappings for tool names and such. This is the only part of the udp_marts that is manually compiled and maintained. However, these data will rarely (if ever) update.

Second Substring: Scope

The middle substring can be vague by looking at the tables in BQ, but its original purpose is as an organizational pointer to where SQL code lives in the udp_marts code repository. Values for pivot scope could include “base,” “course_offering,” “course_section”, “taskforce,” “general,” or "student_activity_score." Any table with “context__base” will have its SQL definition in the context/base/ directory in the Gitlab repository.

Analytically, there is a loose interpretation of this substring, but we don’t recommend focusing on these reasons too strictly:

  • Base - these tables are the first layer on top of raw UCDM data

  • Course_offering - a useful pivot in reporting for these tables may be per-course offering.

  • Course_section - same interpretation as course_offering but at the more granular section level.

  • Taskforce - these assets have this specific string to denote work directly related to the learning analytics taskforces in 2022.

  • General - No direct pivot is specified or favored; appropriate usage can include looking across courses, terms, cohorts, etc

  • Student_activity_score - these assets have this specific string to denote work related to the student activity score metric.

Third Substring: Asset Name

This names the scope and purpose of the table. Some values are more straightforward (such as person, learner_activity, course_grade, etc.), which point to a direct UCDM entity in scope for aggregations or denormalization. Other values are more use-case descriptive (such as interaction_sessions, last_activity, weekly_discussions, etc.). These tables likely have multiple dependencies on UCDM tables or Caliper events and are less tied to a single entity in scope. Instead, these tables likely answer a targeted analytics question.

Code Repository and Visualized Dependencies

The directories in scope for viewing SQL logic defining the marts are the following:

  • Context - all helper tables that have UDP context_store dependencies

  • Distribution - tables that define categorical and continuous distributions of data in the UDP context_store

  • Event - all helper tables that have UDP event_store dependencies

  • Mart - highest level tables that combine both context and event dependencies together

The other Python scripts, directories, and configuration files are totally fine to see! However, their only purpose is internal to Unizin DevOps to maintain and deploy the marts at scale.

Interactive Dependency Charts

Below is an example of the mart_general.lms_tool dependency visualization:

Read left-to-right, this shows the transformation steps enforced from foundational UDP data to the final mart. The following color coding applies:

  • Blue - these correspond to SQL in the context directory of the repo; faded/light blue is a foundational table in the context_store. Light blue rectangles are not clickable; these are foundational UDP entities and not defined with SQL.

  • Red - these correspond to SQL in the event directory of the repo; faded/light red is the UDP’s foundational expanded table in BigQuery. Light red rectangles are not clickable; these are foundational UDP entities and not defined with SQL.

  • Green - these correspond to SQL in the utility directory of the repo

  • Gray - these are for all other queries. These may be referencing a BigQuery public dataset (e.g. bringing in census names instead of using real student names). Also, starter.sql files fall in this category. Starter files are helpful for partitioned tables, and these need to run once just to provision the table schemas in BigQuery correctly. No actual data get passed through these starter.sql files, however.

Clicking on a node in the dependency diagram will take users to the corresponding SQL file in the GitLab repository. Note that you may need to ctrl+click (for Windows) or cmd+click (for Mac) to open GitLab in a new tab.

A full list of charts can be found here.

Questions and Requests For Changes/Additions

All of the SQL code and deployment scripts used to build the marts are stored in this . For now, this repository is read-only and for logic reference purposes only; however, users are encouraged to against the repository for additions, bug fixes, and requests.

Each corresponding mart documentation page nested beneath this page on the resources site has an interactive dependency visualization (built using ) showing the sequence of helper tables built from foundational UDP data to the use-case driven marts.

.

Emailing is the best way to ask questions and request additions or changes to the marts. The Services and DSS team will follow up directly on each request.

Unizin Community Gitlab repository called Marts
create issues
Mermaid
Click here to open the interactive chart
mart_course_offering.file_interaction
mart_course_offering.status
mart_course_section.file_interaction
mart_course_section.status
mart_general.lms_tool
mart_general.lti_tool
mart_taskforce.course_profile
mart_taskforce.level1_aggregated
mart_taskforce.level2_aggregated
mart_taskforce.level2_course_week_files
mart_taskforce.level2_course_week_metrics
mart_taskforce.level2_course_week_tools
mart_taskforce.level2_course_weekly_distribution_summary
mart_taskforce.student_term_profile
mart_course_offering.interaction_sessions
mart_course_offering.last_activity
mart_course_offering.daily_course_grade_record
mart_course_section.interaction_sessions
mart_course_section.last_activity
mart_course_section.daily_course_grade_record
mart_student_activity_score.student_course_metrics
mart_student_activity_score.final
support@unizin.org
Image 1: Previous Mart Presentation in BigQuery
Image 2: Helpful Table Timestamp Information in BigQuery
mart_general.lms_tool