Manifest file

A manifest file is an important part of every context dataset pushed to the UDP. It describes the contents of a full context dataset in such a way that the UDP can validate the completeness and integrity of the dataset before importing it into a UDP instance.

Contents

A manifest file is a YAML file whose keys enable the UDP to ensure that a complete dataset is available and valid prior to ingestion.

Here is an example of the recommended version 2 (v2) manifest file for the SIS loading schema:

Manifest File Version 2

# The version of the manifest file used
# in this dataset.
manifest_version: "v2"

# The name of the system whose data is ingested,
# where values may include 'peoplesoft', 'banner',
# 'canvas-data', etc. depending on the application.
source: "my_sis"

# The UCDM version number this dataset's format
# corresponds to.
data_schema: "2.0"

# The ISO 8601 UTC datetime stamp describing how current this dump is.
# In most cases, simply providing when the dataset was produced works well.
datetime: "2021-12-10T19:11:23Z"

# A unique identifier for the dataset, such as a UUID
dump_id: "b4f8eec7-7adc-47a1-83a4-238f1032da00"

# An object of the files included in the dataset.
# For each file, an MD5 checksum is provided.
files:
  academic_term: 9c9e18230ff048f2837889f41e1faba2
  course_offering: 64388cb12350d966fbdc37b9e6c02014
  course_section: 4388cb129e18230ff048f2831e1fa14

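# Alternatively, the files section may be written as a list
# of single-entry mappings. Use only one of the two forms
# in a given manifest.
files:
  - academic_term: 9c9e18230ff048f2837889f41e1faba2
  - course_offering: 64388cb12350d966fbdc37b9e6c02014
  - course_section: 4388cb129e18230ff048f2831e1fa14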
...

Note: Batch ingest supports both formats of the files section shown above (a mapping of entity names to checksums, or a list of single-entry mappings). This flexibility applies only to manifest v2.
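Because either shape is accepted, a script that reads v2 manifests may want to normalize the files section before using it. The sketch below assumes the manifest has already been parsed into Python objects (for example with PyYAML's yaml.safe_load); normalize_files is a hypothetical helper name, not part of the UDP.

```python
from typing import Any, Dict


def normalize_files(files: Any) -> Dict[str, str]:
    """Return a v2 files section as a flat {entity: checksum} dict.

    Hypothetical helper. Accepts either supported form: a mapping of
    entity -> checksum, or a list of single-entry mappings.
    """
    if isinstance(files, dict):
        return {str(entity): str(checksum) for entity, checksum in files.items()}
    if isinstance(files, list):
        result: Dict[str, str] = {}
        for item in files:
            # Each list element must hold exactly one entity -> checksum pair.
            if not isinstance(item, dict) or len(item) != 1:
                raise ValueError(f"expected a single-entry mapping, got {item!r}")
            entity, checksum = next(iter(item.items()))
            key = str(entity)
            if key in result:
                raise ValueError(f"duplicate entity: {key!r}")
            result[key] = str(checksum)
        return result
    raise TypeError(f"unsupported files section: {type(files).__name__}")
```

Either form from the example above yields the same dictionary, so downstream code only has to handle one shape.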

The following keys must all be provided in a manifest file:

Key / Definition

manifest_version

The version of the manifest file format used in this manifest file. New implementations should use "v2".

source

The name of the system that generated the data conforming to a UDP loading schema. For example, "sis" is the source value for the SIS Loading schema.

data_schema

The version of the UDP loading schema to which this dataset conforms. New implementations should use "2.0".

datetime

The ISO 8601 UTC datetime describing when the dataset was produced.

dump_id

A unique identifier for this particular dataset dump.

files

An object with a key for each provided loading schema entity (such as person), whose value is the MD5 hash (a checksum) of the corresponding CSV file (such as person.csv).
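As a minimal sketch, assuming a local directory that holds one <entity>.csv file per loading schema entity, the following Python writes a v2 manifest containing every required key. The helper names md5_of_file and build_manifest_v2 are hypothetical; the dump_id is a random UUID (one option the example comments suggest), and the YAML is emitted as plain text to avoid a third-party dependency.

```python
import hashlib
import uuid
from datetime import datetime, timezone
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest_v2(csv_dir: Path, source: str, data_schema: str = "2.0") -> str:
    """Hypothetical helper: v2 manifest text for every *.csv file in csv_dir."""
    # ISO 8601 UTC timestamp of when this dataset was produced.
    produced = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        'manifest_version: "v2"',
        f'source: "{source}"',
        f'data_schema: "{data_schema}"',
        f'datetime: "{produced}"',
        f'dump_id: "{uuid.uuid4()}"',
        "files:",
    ]
    for csv_path in sorted(csv_dir.glob("*.csv")):
        # Entity name is the file name without ".csv"; the checksum is
        # quoted so a digest made only of digits is still read as a string.
        lines.append(f'  {csv_path.stem}: "{md5_of_file(csv_path)}"')
    return "\n".join(lines) + "\n"
```

The resulting text can be saved under a ".done" name alongside the CSV files.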

Prior to import, the UDP computes the MD5 hash of each entity's file listed in the manifest and compares it with the checksum recorded there. If every hash matches, the files were transferred intact and import can begin.
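The same comparison can be reproduced locally before upload. The sketch below assumes the files section has already been parsed into a Python dict of entity name to checksum, and that each CSV is named <entity>.csv; find_mismatches is a hypothetical helper and not the UDP's own implementation.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(csv_dir: Path, files: Dict[str, str]) -> List[str]:
    """Hypothetical helper: entities whose <entity>.csv is missing or whose
    MD5 differs from the checksum listed in the manifest."""
    problems: List[str] = []
    for entity, expected in files.items():
        csv_path = csv_dir / f"{entity}.csv"
        if not csv_path.is_file() or md5_of_file(csv_path) != expected.lower():
            problems.append(entity)
    return problems
```

An empty result means every listed file is present and matches its checksum.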

File Name Requirements

You may give the manifest file any name, provided that it is a valid Google Cloud Storage object name (in practice, an ordinary file name) and ends in ".done".
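A minimal pre-upload check might look like the following. It tests the ".done" suffix plus a few of the object-naming rules that Google Cloud Storage documents (1 to 1024 bytes when UTF-8 encoded, no carriage return or line feed, not "." or "..", no ".well-known/acme-challenge/" prefix). It is not a complete GCS validator, and is_valid_manifest_name is a hypothetical helper.

```python
def is_valid_manifest_name(name: str) -> bool:
    """Hypothetical, partial check of a manifest file name."""
    # GCS object names are limited to 1-1024 bytes of UTF-8.
    if not 1 <= len(name.encode("utf-8")) <= 1024:
        return False
    # GCS forbids carriage return and line feed characters in object names.
    if "\r" in name or "\n" in name:
        return False
    # Names that GCS reserves or rejects outright.
    if name in (".", "..") or name.startswith(".well-known/acme-challenge/"):
        return False
    # The UDP requires manifest files to end in ".done".
    return name.endswith(".done")
```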

Accompanying CSV data files should be named for their associated loading schema entity, with a ".csv" suffix.

Manifest file names may follow these naming conventions:

Version 2

Suppose that you are producing a dataset that conforms to the SIS Loading schema, whose loading schema name is sis. Because version 2 CSV file names omit the date suffix and the manifest's datetime key defines the dataset's date, the manifest file name does not need a date either. A valid manifest file name in this case is:

<source>_daily.done

sis_daily.done

Version 1

In these formulas, the <source> value is identical to the source value in the manifest file. For example, suppose that you are producing a dataset that conforms to the SIS Loading schema, whose loading schema name is sis. In version 1, the file name also includes a date in ISO 8601 format (date only, e.g., 2022-01-01). A valid manifest file name in this case is:

<source>_daily_<date>.done

sis_daily_2022-01-11.done
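Both conventions can be captured in a small helper. This is a sketch assuming the literal "daily" segment shown in the formulas above; manifest_name is a hypothetical function name.

```python
from datetime import date
from typing import Optional


def manifest_name(source: str, version: str = "v2", day: Optional[date] = None) -> str:
    """Hypothetical helper: manifest file name for the v2 or v1 convention."""
    if version == "v2":
        # v2: no date, because the manifest's datetime key carries it.
        return f"{source}_daily.done"
    if version == "v1":
        if day is None:
            raise ValueError("v1 manifest names require a date")
        # v1: ISO 8601 date only, e.g. 2022-01-11.
        return f"{source}_daily_{day.isoformat()}.done"
    raise ValueError(f"unknown manifest version: {version!r}")


print(manifest_name("sis"))                            # sis_daily.done
print(manifest_name("sis", "v1", date(2022, 1, 11)))   # sis_daily_2022-01-11.done
```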

Manifest File Version 1 (No longer implemented)

Here is an example of version 1 (v1) of a manifest file used for SIS loading schema. This version is no longer recommended or implemented, and will eventually be removed.

Note that in version 1, the files attribute is an array in which each institution must specify the full name of each associated CSV file along with its MD5 checksum. In addition, each referenced CSV file is expected to include the current date in its name. Version 2 simplifies both rules.

# The version of the manifest file used
# in this dataset.
manifest_version: "v1"

# The name of the system whose data is ingested,
# where values may include 'peoplesoft', 'banner',
# 'canvas-data', etc. depending on the application.
source: "my_sis"

# The UCDM version number this dataset's format
# corresponds to.
data_schema: "2.0"

# The ISO 8601 UTC datetime stamp describing how current this dump is.
# In most cases, simply providing when the dataset was produced works well.
datetime: "2021-12-10T19:11:23Z"

# A unique identifier for the dataset, such as a UUID
dump_id: "b4f8eec7-7adc-47a1-83a4-238f1032da00"

# An array of the files included in the dataset.
# For each file, an MD5 checksum is provided.
files:
- name: "academic_term_2019-02-04.csv"
  checksum: "9c9e18230ff048f2837889f41e1faba2"
- name: "course_offering_2019-02-04.csv"
  checksum: "64388cb12350d966fbdc37b9e6c02014"
- name: "course_section_2019-02-04.csv"
...
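When migrating from v1, the dated file names in the files array map onto the entity keys used in v2. The sketch below assumes every v1 name follows the <entity>_<YYYY-MM-DD>.csv pattern seen in the example; v1_files_to_v2 is a hypothetical helper. The CSV files themselves would also need renaming to <entity>.csv, and because renaming does not change file contents, their MD5 checksums stay the same.

```python
import re
from typing import Dict, List

# Matches "<entity>_<YYYY-MM-DD>.csv", the dated v1 naming convention.
V1_CSV_NAME = re.compile(r"^(?P<entity>.+)_\d{4}-\d{2}-\d{2}\.csv$")


def v1_files_to_v2(files: List[Dict[str, str]]) -> Dict[str, str]:
    """Hypothetical helper: convert a v1 files array to a v2 mapping."""
    result: Dict[str, str] = {}
    for entry in files:
        match = V1_CSV_NAME.match(entry["name"])
        if match is None:
            raise ValueError(f"not a dated v1 CSV name: {entry['name']!r}")
        # Drop the date suffix and extension; keep the checksum unchanged.
        result[match.group("entity")] = entry["checksum"]
    return result
```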

Copyright © 2023, Unizin, Ltd.