Links
Last updated
Last updated
The links mart presents extracted links from content in the LMS. This mart is based on the 'Links Datamart' developed by Jason Heffner at Pennsylvania State University. This mart employs the Javascript libraries and . The cheerio.js library is used to parse the HTML of the content. URLs included in the HTML are extracted, as well as attributes and contextual information about the HTML element. The URI.js library is used to parse the URL, extracting components such as the host or path of the URL.
mart_general
The following visualization shows the construction of this data mart defined in the repository. More information on the repository and diagram can be found on .
udp_link_id
STRING
An unique ID of the link, generated by combining the UDP content ID, the occurrence number, the content type, and the UDP course offering ID.
lms_link_id
STRING
An unique ID of the link, generated by combining the LMS content ID, the occurrence number, the content type, and the LMS course offering ID.
udp_content_id
INTEGER
The UDP ID of the content that contains the link.
lms_content_id
STRING
The LMS ID of the content that contains the link.
udp_parent_content_id
INTEGER
The UDP ID of the parent of the content. May be null if the content does not have a parent.
lms_parent_content_id
STRING
The LMS ID of the parent of the content. May be null if the content does not have a parent.
udp_course_offering_id
INTEGER
The UDP ID of the course offering.
lms_course_offering_id
STRING
The LMS ID of the course offering.
sis_course_offering_id
STRING
The SIS ID of the course offering.
udp_person_id
INTEGER
The UDP ID of the person if a person is associated with the content.
lms_person_id
STRING
The LMS ID of the person if a person is associated with the content.
sis_person_id
STRING
The SIS ID of the person if a person is associated with the content.
content_type
STRING
The type of the content associated with the link, i.e. 'learner_activity', 'discussion', etc.
parent_content_type
STRING
The type of the parent content, i.e. 'learner_activity', 'quiz', etc. May be null if the content does not have a parent.
content
STRING
The full HTML text of the content.
content_name
STRING
The name or title of the content.
status
STRING
The status of the content, i.e. 'active', 'deleted', 'published', etc.
is_active
BOOLEAN
A boolean field indicating if the status of the content is 'active', 'published', 'available', or 'post_delayed'.
created_date
DATETIME
The date the content was created.
updated_date
DATETIME
The date the content was last updated.
url
STRING
The extracted URL from the content.
context
STRING
The HTML element that contains the URL.
word_count
INTEGER
The number of words in the text content of the HTML element.
char_count
INTEGER
The number of characters in the text content of the HTML element.
usage
STRING
How the link is used in the content, i.e. 'hyperlink', 'embed', 'image', etc.
scheme
STRING
The scheme of the URL, i.e. 'http' or 'https'.
host
STRING
The host or domain name of the URL, ie. 'example.com'.
path
STRING
The path of the URL, i.e. 'path/to/resource'.
query_string
STRING
The query or search string of the URL, i.e. 'key=value&anotherKey=anotherValue'.
file_extension
STRING
If the link contains a file name, the extension of the file, i.e. 'html' or 'pdf'.
tag
STRING
The tag of the HTML element, i.e. 'a', 'iframe', 'embed', etc.
attribute
`STRING
The attribute of the HTML element, i.e. 'src' or 'href'.
classes
STRING
The class of the HTML element.
is_safe_link
BOOLEAN
A boolean field that indicates if the URL is a safe link.
is_canvas
BOOLEAN
A boolean field that indicates if the URL is a Canvas URL.
is_shortener
BOOLEAN
A boolean field that indicates if the URL is a shortened URL.
occurrence
INTEGER
The occurrence number of the link in the content.
service_name
STRING
The name of the service linked to in the URL.
"gs://assets.public.unizin.org/udp-marts/links-datamart/cheerio.js"
"gs://assets.public.unizin.org/udp-marts/links-datamart/uri.js"
Using BigQuery SQL, the Javascript UDF is called on content from the LMS in order to extract links and additional contextual information.
This mart extracts URLs from content in the LMS. The content is defined as any HTML text fields associated with tools in the LMS. The content types are defined based on the LMS tool. The possible content types and the corresponding UDP fields of the content are:
annotation
annotation.message
conversation_message
conversation_message.body
discussion
discussion.body
discussion_entry
discussion_entry.body
learner_activity
learner_activity.description
learner_activity_result
learner_activity_result.body
learning_outcome
learning_outcome.description
learning_outcome_group
learning_outcome_group.description
learning_outcome_rubric_criteria
learning_outcome_rubric_criteria.description
module_item
module_item.url
quiz
quiz.description
quiz_item
quiz_item.body
quiz_item_response
quiz_item_response.body
syllabus
course_offering.syllabus_content
wiki
wiki.front_page_url
wiki_page
wiki_page.body
The udp_link_id and lms_link_id fields do not correspond to any existing IDs in the UDP or Canvas. They are unique identifiers generated for the links and are defined based on other fields included in this mart. The udp_link_id field is defined as:
or the combination of udp_content_id, occurrence, content_type, and udp_course_offering_id. Similarly, lms_link_id is defined as the combination of lms_content_id, occurrence, content_type, and lms_course_offering_id.
As mentioned in the introduction, this mart employs Javascript libraries to extract information from HTML and URLs. This is done using the functionality in BigQuery. The OPTIONS
section in the UDF references the Javascript libraries. The libraries are publicly available resources:
- Javascript library used to parse and extract elements of HTML.
- Javascript library used to parse and extract elements of URLs.
- Project with list of common URL shortener domains used to define the is_shortener field.
- Project with list of common file extensions used to define the file_extension field.