Manifest file
A manifest file is an important part of every context dataset pushed to the UDP. It describes the contents of a full context dataset in such a way that the UDP can validate the completeness and integrity of the dataset before importing it into a UDP instance.
Contents
A manifest file is a YAML file whose keys enable the UDP to ensure that a complete dataset is available and valid prior to ingestion.
Here is an example of the recommended version 2 (v2) manifest file used for the SIS loading schema:
Manifest File Version 2
Note: Both formats for the files section as listed above are supported for manifest v2 in batch ingest. This only applies to manifest v2.
The complete set of keys that must be provided in a manifest file is:
manifest_version
The version of the manifest file format used in this manifest file. New implementations should use "v2".
source
The name of the system from which data is generated to conform with a UDP loading schema. For example, "sis" is the source value for the SIS Loading schema.
data_schema
The version of the UDP loading schema to which this dataset conforms. New implementations should use "2.0".
datetime
The ISO 8601 UTC datetime describing when the dataset was produced.
dump_id
A unique identifier for this particular dataset dump.
files
An object with one key for each provided loading schema entity (such as person), whose value is the MD5 hash (a checksum) of the corresponding CSV file (such as person.csv).
Prior to import, the UDP computes the MD5 hash of each entity's file listed in the manifest file and compares it with the hash recorded there. If every computed hash matches its recorded value, the files have been transported accurately and import can begin.
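The check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the UDP's actual implementation: the function names md5_of_file, missing_keys, and find_checksum_mismatches are hypothetical, the manifest is assumed to be already parsed into a dictionary, and its files value is assumed to map entity names to MD5 hex digests.

```python
import hashlib
from pathlib import Path

# Keys the manifest must provide, as listed in this section.
REQUIRED_KEYS = (
    "manifest_version",
    "source",
    "data_schema",
    "datetime",
    "dump_id",
    "files",
)


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def missing_keys(manifest):
    """Return the required keys that are absent from a parsed manifest."""
    return [key for key in REQUIRED_KEYS if key not in manifest]


def find_checksum_mismatches(manifest, directory):
    """Return entities whose CSV is missing or whose MD5 differs from the manifest.

    Assumption: manifest["files"] maps entity names (e.g. "person") to MD5
    hex digests, and each entity's data is stored as "<entity>.csv" in
    `directory`.
    """
    mismatches = []
    for entity, expected in manifest["files"].items():
        csv_path = Path(directory) / f"{entity}.csv"
        if not csv_path.is_file() or md5_of_file(csv_path) != expected.lower():
            mismatches.append(entity)
    return mismatches
```

A producer could run the same functions before upload: an empty list from both missing_keys and find_checksum_mismatches suggests the dataset is at least internally consistent with its manifest.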
File Name Requirements
You may name the manifest file however you like, so long as it's a valid Google Cloud Storage object name (basically meaning "a typical file name"), and ends in ".done".
Accompanying CSV data files should be named for their associated loading schema entity, with a ".csv" suffix.
The filenames of manifest files may follow these naming conventions:
Version 2
Suppose that you are producing a dataset that conforms to the SIS Loading schema. The loading schema name to be used is sis. It is not necessary to include a date in the manifest filename, provided that the date suffix is removed from your .csv file names, because the manifest defines the datetime.
Therefore, a valid filename for the UDP loading schema in this case is:
<source>_daily.done
sis_daily.done
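As a small sketch, the version 2 pattern can be produced with a helper like the one below. The function name v2_manifest_filename and its input check are hypothetical and only illustrate the <source>_daily.done convention.

```python
def v2_manifest_filename(source):
    """Build a version 2 manifest filename of the form <source>_daily.done."""
    if not source or "/" in source:
        raise ValueError("source must be a non-empty name without path separators")
    return f"{source}_daily.done"


print(v2_manifest_filename("sis"))  # sis_daily.done
```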
Version 1
In these formulas, the source value will be identical to the source value in the manifest file. For example, suppose that you are producing a dataset that conforms to the SIS Loading schema. The loading schema name to be used is sis. The date will be in ISO 8601 format and refer to the date only; e.g., 2022-01-01. Using the preceding example, a valid filename for the UDP loading schema is:
<source>_daily_<date>.done
sis_daily_2022-01-11.done
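For illustration only, the version 1 pattern with its ISO 8601 date can be built as follows. The helper name v1_manifest_filename is hypothetical; it relies on the standard library's date.isoformat(), which yields a date-only string such as 2022-01-11.

```python
from datetime import date


def v1_manifest_filename(source, day):
    """Build a version 1 manifest filename of the form <source>_daily_<date>.done."""
    return f"{source}_daily_{day.isoformat()}.done"


print(v1_manifest_filename("sis", date(2022, 1, 11)))  # sis_daily_2022-01-11.done
```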
Manifest File Version 1 (No longer implemented)
Here is an example of version 1 (v1) of a manifest file used for the SIS loading schema. This version is no longer recommended or implemented, and will eventually be removed.
Note that in version 1 the files attribute is an array, in which each institution must specify the full name of each associated CSV file along with its MD5 checksum. In addition, each referenced CSV file is expected to include the current date in its name. These rules were simplified for version 2.