Unizin Product Documentation
Copyright © 2023, Unizin, Ltd.

Links



The links mart presents links extracted from content in the LMS. This mart is based on the 'Links Datamart' developed by Jason Heffner at Pennsylvania State University. It employs the Javascript libraries cheerio.js and URI.js. The cheerio.js library parses the HTML of the content; URLs included in the HTML are extracted, along with attributes and contextual information about the containing HTML element. The URI.js library parses each URL, extracting components such as the host or path.
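To make the URL-parsing step concrete, here is a minimal sketch of splitting a URL into the kinds of components this mart reports. It is an illustration only: it uses Node's built-in URL class as a stand-in for URI.js, the function name parseUrlComponents is hypothetical, and the exact formatting of the mart's values (for example, whether a path carries a leading slash) may differ.

```javascript
// Sketch only: Node's built-in URL class stands in for URI.js here.
// parseUrlComponents is a hypothetical helper, not part of the mart.
function parseUrlComponents(rawUrl) {
  const parsed = new URL(rawUrl);
  const path = parsed.pathname;

  // Take the extension from the last path segment, if it has one.
  const lastSegment = path.split('/').pop();
  const dotIndex = lastSegment.lastIndexOf('.');

  return {
    scheme: parsed.protocol.replace(/:$/, ''),      // 'https:' -> 'https'
    host: parsed.hostname,
    path: path,                                     // keeps the leading '/'
    query_string: parsed.search.replace(/^\?/, ''), // drop the leading '?'
    file_extension: dotIndex > 0 ? lastSegment.slice(dotIndex + 1) : null,
  };
}

const parts = parseUrlComponents(
  'https://example.com/path/to/resource.pdf?key=value&anotherKey=anotherValue'
);
console.log(parts);
```

The returned keys are named after the corresponding schema fields so the output can be compared with rows in the mart.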

BQ Prod Dataset Locations

  • mart_general

Interactive Mart Dependency Diagram

The interactive dependency diagram (not reproduced here) shows the construction of this data mart as defined in the UDP marts repository. More information on the repository and diagram can be found on the UDP marts page.

Schema

mart/general/links

| Field name | Type | Description |
|---|---|---|
| udp_link_id | STRING | A unique ID of the link, generated by combining the UDP content ID, the occurrence number, the content type, and the UDP course offering ID. |
| lms_link_id | STRING | A unique ID of the link, generated by combining the LMS content ID, the occurrence number, the content type, and the LMS course offering ID. |
| udp_content_id | INTEGER | The UDP ID of the content that contains the link. |
| lms_content_id | STRING | The LMS ID of the content that contains the link. |
| udp_parent_content_id | INTEGER | The UDP ID of the parent of the content. May be null if the content does not have a parent. |
| lms_parent_content_id | STRING | The LMS ID of the parent of the content. May be null if the content does not have a parent. |
| udp_course_offering_id | INTEGER | The UDP ID of the course offering. |
| lms_course_offering_id | STRING | The LMS ID of the course offering. |
| sis_course_offering_id | STRING | The SIS ID of the course offering. |
| udp_person_id | INTEGER | The UDP ID of the person if a person is associated with the content. |
| lms_person_id | STRING | The LMS ID of the person if a person is associated with the content. |
| sis_person_id | STRING | The SIS ID of the person if a person is associated with the content. |
| content_type | STRING | The type of the content associated with the link, e.g. 'learner_activity', 'discussion', etc. |
| parent_content_type | STRING | The type of the parent content, e.g. 'learner_activity', 'quiz', etc. May be null if the content does not have a parent. |
| content | STRING | The full HTML text of the content. |
| content_name | STRING | The name or title of the content. |
| status | STRING | The status of the content, e.g. 'active', 'deleted', 'published', etc. |
| is_active | BOOLEAN | A boolean field indicating if the status of the content is 'active', 'published', 'available', or 'post_delayed'. |
| created_date | DATETIME | The date the content was created. |
| updated_date | DATETIME | The date the content was last updated. |
| url | STRING | The URL extracted from the content. |
| context | STRING | The HTML element that contains the URL. |
| word_count | INTEGER | The number of words in the text content of the HTML element. |
| char_count | INTEGER | The number of characters in the text content of the HTML element. |
| usage | STRING | How the link is used in the content, e.g. 'hyperlink', 'embed', 'image', etc. |
| scheme | STRING | The scheme of the URL, e.g. 'http' or 'https'. |
| host | STRING | The host or domain name of the URL, e.g. 'example.com'. |
| path | STRING | The path of the URL, e.g. 'path/to/resource'. |
| query_string | STRING | The query or search string of the URL, e.g. 'key=value&anotherKey=anotherValue'. |
| file_extension | STRING | If the link contains a file name, the extension of the file, e.g. 'html' or 'pdf'. |
| tag | STRING | The tag of the HTML element, e.g. 'a', 'iframe', 'embed', etc. |
| attribute | STRING | The attribute of the HTML element, e.g. 'src' or 'href'. |
| classes | STRING | The class of the HTML element. |
| is_safe_link | BOOLEAN | A boolean field that indicates if the URL is a safe link. |
| is_canvas | BOOLEAN | A boolean field that indicates if the URL is a Canvas URL. |
| is_shortener | BOOLEAN | A boolean field that indicates if the URL is a shortened URL. |
| occurrence | INTEGER | The occurrence number of the link in the content. |
| service_name | STRING | The name of the service linked to in the URL. |
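As a small illustration of how a derived field in this schema relates to its source field, the sketch below reproduces the is_active rule from the description above: true when status is one of 'active', 'published', 'available', or 'post_delayed'. The names ACTIVE_STATUSES and isActive are hypothetical, and treating a missing status as inactive is an assumption, not something the schema states.

```javascript
// The four statuses listed in the is_active field description.
const ACTIVE_STATUSES = new Set(['active', 'published', 'available', 'post_delayed']);

// Hypothetical helper mirroring the is_active description.
// Assumption: a null or undefined status counts as not active.
function isActive(status) {
  return ACTIVE_STATUSES.has(status);
}

console.log(isActive('published')); // true
console.log(isActive('deleted'));   // false
```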

Javascript Usage

  • "gs://assets.public.unizin.org/udp-marts/links-datamart/cheerio.js"

  • "gs://assets.public.unizin.org/udp-marts/links-datamart/uri.js"

Using BigQuery SQL, the Javascript UDF is called on content from the LMS in order to extract links and additional contextual information.
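The kind of work such a UDF performs can be sketched in plain Javascript. The example below is not the mart's UDF: it swaps cheerio.js for a regular expression so that it runs without dependencies, which means it only copes with simple, well-formed markup and quoted attribute values. The function name extractLinks and the choice to number occurrences from 1 are assumptions made for illustration.

```javascript
// Sketch only: a regular expression stands in for cheerio.js.
// extractLinks is a hypothetical name; occurrence numbering from 1 is assumed.
function extractLinks(html) {
  // Capture the tag name, an href or src attribute, and its quoted value.
  const pattern =
    /<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?\s(href|src)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  const links = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    links.push({
      occurrence: links.length + 1,
      tag: match[1].toLowerCase(),
      attribute: match[2].toLowerCase(),
      url: match[3] !== undefined ? match[3] : match[4],
    });
  }
  return links;
}

const sample =
  '<p>See <a href="https://example.com/a">this page</a> and ' +
  "<iframe src='https://video.example.org/embed/1'></iframe></p>";
console.log(extractLinks(sample));
```

A real HTML parser such as cheerio.js is the more robust choice, since regular expressions miss unquoted attributes, nested quoting, and malformed markup.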

Content Sources

This mart extracts URLs from content in the LMS. Content is defined as any HTML text field associated with a tool in the LMS, and content types are defined by the LMS tool. The possible content types and the corresponding UDP fields are:

  • annotation

    • annotation.message

  • conversation_message

    • conversation_message.body

  • discussion

    • discussion.body

  • discussion_entry

    • discussion_entry.body

  • learner_activity

    • learner_activity.description

  • learner_activity_result

    • learner_activity_result.body

  • learning_outcome

    • learning_outcome.description

  • learning_outcome_group

    • learning_outcome_group.description

  • learning_outcome_rubric_criteria

    • learning_outcome_rubric_criteria.description

  • module_item

    • module_item.url

  • quiz

    • quiz.description

  • quiz_item

    • quiz_item.body

  • quiz_item_response

    • quiz_item_response.body

  • syllabus

    • course_offering.syllabus_content

  • wiki

    • wiki.front_page_url

  • wiki_page

    • wiki_page.body

Link IDs

The udp_link_id and lms_link_id fields do not correspond to any existing IDs in the UDP or Canvas. They are unique identifiers generated for the links and are defined based on other fields included in this mart. The udp_link_id field is defined as:

  CONCAT(CAST(cu.udp_content_id AS STRING),'-',CAST(cu.occurrence AS STRING),'-',cu.content_type,'-',cu.udp_course_offering_id) as udp_link_id,

That is, udp_link_id is the hyphen-separated combination of udp_content_id, occurrence, content_type, and udp_course_offering_id. Similarly, lms_link_id is defined as the combination of lms_content_id, occurrence, content_type, and lms_course_offering_id.
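For readers who want to rebuild or check these IDs outside BigQuery, the following sketch mirrors the CONCAT expression above in Javascript. The helper name buildLinkId is hypothetical. Returning null when any part is missing is intended to match BigQuery's CONCAT, which yields NULL if any argument is NULL.

```javascript
// Hypothetical helper mirroring the CONCAT expression for udp_link_id.
// Pass the lms_* values instead to build an lms_link_id the same way.
function buildLinkId(contentId, occurrence, contentType, courseOfferingId) {
  const parts = [contentId, occurrence, contentType, courseOfferingId];
  // Mirror SQL NULL propagation: any missing part gives a null ID.
  if (parts.some((part) => part === null || part === undefined)) {
    return null;
  }
  return parts.map(String).join('-');
}

// Example values are made up for illustration.
console.log(buildLinkId(12345, 2, 'discussion', 678)); // '12345-2-discussion-678'
```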

References

As mentioned in the introduction, this mart employs Javascript libraries to extract information from HTML and URLs. This is done using the Javascript UDFs functionality in BigQuery. The OPTIONS section of the UDF references the Javascript libraries. The libraries are publicly available resources:

  • Cheerio.js - Javascript library used to parse and extract elements of HTML.

  • URI.js - Javascript library used to parse and extract elements of URLs.

  • url-shorteners - Project with list of common URL shortener domains used to define the is_shortener field.

  • file-extensions-list - Project with list of common file extensions used to define the file_extension field.